diff --git a/.nojekyll b/.nojekyll new file mode 100644 index 0000000000..e69de29bb2 diff --git a/404.html b/404.html new file mode 100644 index 0000000000..fbe3c2047a --- /dev/null +++ b/404.html @@ -0,0 +1,6641 @@ + + + + + + + + + + + + + + + + + + + ALCF User Guides + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+ +
+
+ +
+ + + + + + + + + +
+ + +
+ + +
+ +
+

Argonne Leadership Computing Facility

+
+ + + + + + + + + +
+
+ + +
+ + + + + + + + + + + + + + + + + +
+ + + + +
+ + +
+ + + + + + +
+
+ + + +
+
+
+ + + + + +
+
+
+ + + +
+
+
+ + + +
+
+
+ + + +
+
+ +

404 - Not found

+ +
+
+ + + +
+ +
+ + + + + + +
+
+
+
+ + + + + + + + + + + + \ No newline at end of file diff --git a/account-project-management/accounts-and-access/accounts-and-access-faqs/index.html b/account-project-management/accounts-and-access/accounts-and-access-faqs/index.html new file mode 100644 index 0000000000..325c5c98e5 --- /dev/null +++ b/account-project-management/accounts-and-access/accounts-and-access-faqs/index.html @@ -0,0 +1,6875 @@ + + + + + + + + + + + + + + + + + + + + + + + + + Accounts and Access FAQs - ALCF User Guides + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+ + + + Skip to content + + +
+
+ +
+ + + + + + + + + +
+ + +
+ + +
+ +
+

Argonne Leadership Computing Facility

+
+ + + + + + + + + +
+
+ + +
+ + + + + + + + + + + + + + + + + +
+ + + + +
+ + +
+ + + + + + +
+
+ + + +
+
+
+ + + + + +
+
+
+ + + + + + + +
+
+ + + + + + + +

Accounts and Access FAQ

+

How do I request a new project/allocation?

+

There are 3 allocation opportunities at ALCF. Please see How to Get an Allocation on how to get time on our systems.

+

Who do I contact if my Discretionary Project Allocation expires or if I need to request additional hours?

+

To request an extension of your existing discretionary allocation or to request additional hours, please email support@alcf.anl.gov with answers to the following, or fill out the form at request an extension/additional hours:
+- What have you accomplished with your original allocation?
+  - Please include a brief description of any publications or major presentations that were (or will be) generated in full or in part because of this allocation.
+- What will you do with the extra time?
+- What new expiration date are you requesting?
+- How many additional hours are you requesting?

+

How do I join a project?

+

To join a project, please go to https://accounts.alcf.anl.gov, then click "join a project". Once there, scroll down to the project you want to join and click on it. At the bottom of the next page, please click on the "Request Membership" button. Once we receive approval from the PI regarding your membership request, we will provide you with access to the necessary resources.

+

How do I request a reservation?

+

Reservation requests must include information detailed here:

+ +

How do I apply for a new account?

+

Note: All ALCF accounts must be associated with an allocated project.

+ +

What do I do when my ALCF account expires?

+

Please forward your account expiry email to your Sponsor. As soon as we receive an approval email from your Sponsor, we'll proceed with your account renewal process as needed.

+

What do I do when I receive a warning that my 593 has expired / is about to expire?

+

If you are planning to extend this assignment/computer user account, please let us know so that a new 593 (Foreign Visit & Assignment Request form) can be filed for you using your previous information. If any other documents are needed from you, we will contact you as necessary. To allow sufficient time for an indices check, please respond as soon as possible.

+

If you are not planning to extend your account, also let us know so that we may close out your records.

+ + + + + + + + + + + + + +
+
+ + + +
+ +
+ + + + + + +
+
+
+
+ + + + + + + + + + + + \ No newline at end of file diff --git a/account-project-management/accounts-and-access/alcf-passcode-tokens/index.html b/account-project-management/accounts-and-access/alcf-passcode-tokens/index.html new file mode 100644 index 0000000000..dcc9780ae4 --- /dev/null +++ b/account-project-management/accounts-and-access/alcf-passcode-tokens/index.html @@ -0,0 +1,7099 @@ + + + + + + + + + + + + + + + + + + + + + + + + + Passcode Tokens - ALCF User Guides + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+ + + + Skip to content + + +
+
+ +
+ + + + + + + + + +
+ + +
+ + +
+ +
+

Argonne Leadership Computing Facility

+
+ + + + + + + + + +
+
+ + +
+ + + + + + + + + + + + + + + + + +
+ + + + +
+ + +
+ + + + + + +
+
+ + + +
+
+
+ + + + + +
+
+
+ + + + + + + +
+
+ + + + + + + +

ALCF Passcode Tokens

+

Please note: An account can be associated with a single token only (Mobile or Physical token). Please contact accounts@alcf.anl.gov to change your token preference.

+

Mobile Token

+

The SafeNet MobilePass+ Mobile Token allows access to ALCF systems. This security mobile token uses one-time passwords combined with your PIN for controlled access to the login systems. The mobile token is an app, installed on your Android, iPhone, or Windows mobile device, that is keyed to your user account and for which you are responsible. Please safeguard your phone as you would your credit cards or house keys: Do not store username, PIN, or other account-related records with the token. Sharing of mobile tokens is strictly forbidden. A mobile token can be associated with a single device only.

+

Step 1. Download the SafeNet MobilePass+ app for your device:

+

The SafeNet MobilePASS+ app turns your mobile phone into a two-factor authentication device, removing the need to carry an additional hardware token. As a SafeNet MobilePASS+ user, you can generate passcodes on your mobile device and use those passcodes to authenticate on ALCF computing resources. See supported OS and platforms for more information.

+

SafeNet MobilePass+ for Android can be found here: https://play.google.com/store/apps/details?id=com.gemalto.mpassplus

+

SafeNet MobilePass+ for iPhone can be found here: https://itunes.apple.com/us/app/safenet-mobilepass/id1056481326?mt=8

+

SafeNet MobilePass+ for Windows can be found here: https://www.microsoft.com/en-us/p/safenet-mobilepass/9nblggh10pdq?activetab=pivot%3Aoverviewtab

+

Step 2. Enroll your MobilePass+ mobile token:

+

After you’ve been provisioned a mobile token, you will receive a notification email with the subject line "ALCF Mobile Token Self-Enrollment" which you must access from your mobile phone.

+

Auto-Enrollment (to enroll SafeNet MobilePass+ token automatically):

+
  1. Click on the http:// link in the email. The SafeNet Authentication Service Self-Enrollment will open.
  2. Click enroll your SafeNet MobilePass+ token.
  3. When prompted to open in MobilePass+, tap Open.
  4. You will now be prompted to enter a 6-digit, all-numeric PIN.
  5. Enter your PIN in the Token PIN field and repeat it in the Confirm PIN field.
  6. You will be taken to the Enrollment Complete screen to name the token.
  7. Insert the desired name in the Token Name field or leave it as is. This name is not utilized by the server; it is for you only.
  8. The newly enrolled SafeNet MobilePass+ token is now displayed in the SafeNet MobilePass+ app.
+

Manual Enrollment:

+
  1. Copy the activation string from the SafeNet provision email.
  2. Open the SafeNet MobilePass+ app and tap the manual option.
  3. Paste the enrollment string into the field provided and tap the Enroll button.
  4. You will now be prompted to enter a 6-digit, all-numeric PIN.
  5. Enter your PIN in the Token PIN field and repeat it in the Confirm PIN field.
  6. You will be taken to the Enrollment Complete screen to name the token.
  7. Insert the desired name in the Token Name field or leave it as is. This name is not utilized by the server; it is for you only.
+

Logging in to an ALCF System using a Mobile Token

+
  1. Open the MobilePASS+ (MobilePASS for Windows) app on your device. Then initiate an SSH session and type the following:

     ssh <ALCF username>@<system_name>.alcf.anl.gov

  2. When prompted for a password, open the SafeNet MobilePASS+ app on your phone, click on the token name listed within the app, and enter your PIN.

  3. The app will display your passcode immediately. Enter the passcode as the login password for the system within the SSH session. Please Note: You do NOT have to enter the PIN on the SSH screen when logging into a resource. The PIN is only needed to access the passcode within the SafeNet MobilePASS+ (MobilePASS for Windows) app.

  4. Each generated passcode remains valid in the SafeNet MobilePass+ app window until your mobile device screen times out.
+
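Putting these steps together, a login session might look like the following sketch (the username and system name are placeholders; the passcode comes from the MobilePASS+ app and is entered without your PIN):

```
$ ssh <ALCF username>@<system_name>.alcf.anl.gov
Password:   <passcode currently shown in the MobilePASS+ app (input is not echoed; do not prefix it with your PIN)>
```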

Troubleshooting your Mobile Token

+

Case 1: Forgotten PIN: If you enter an invalid PIN for your mobile token, you will be asked to re-enter your PIN. After 6 failed attempts, your token will be deleted and you will need to call the ALCF Help Desk or send an email to ALCF support to have a new mobile token provisioned.

+

Case 2: Account Lockout: If you fail to enter the correct password 6 times, you will get a permission denied error on the SSH screen. Upon 4 more failed attempts, your IP will be blocked. You will need to call the ALCF help desk and submit a ticket to have the IP unblocked.

+

Case 3: PIN Change: While logged in to the mobile token, click on token settings then tap change PIN. Enter the current PIN followed by the new PIN and confirm.

+

Case 4: Re-Sync: If you are unable to log in to a resource after entering the correct PIN and passcode, your token may be out of sync with the server. Please email the ALCF Service Desk at support at alcf.anl.gov for assistance.

+

Case 5: New Mobile Device: If you have a new mobile device, please email the ALCF Service Desk at support at alcf.anl.gov to have a new mobile token provisioned.

+

Physical Token

+

The physical token allows access to the ALCF systems. This security token uses one-time passwords combined with your PIN for controlled access to the login systems. The physical token is a tracked asset for which you are responsible and is keyed to your use. Please safeguard your token as you would your credit cards or house keys: Do not store username, PIN, or other account-related records with the token. Sharing of tokens is strictly forbidden. Please do not mark on the token or alter it in any way.

+

Enabling Your ALCF Physical Token

+

Upon receipt of your CRYPTOCard token, contact support@alcf.anl.gov so that we can verify your identity and activate the token. If this step is not performed, you will not be able to use the CRYPTOCard token to log on to ALCF resources.

+

ALCF Support Desk Info +Hours: Monday-Friday 9 a.m. - 5 p.m. (Central time); +Email: support@alcf.anl.gov

+

Logging in to an ALCF System using a Physical Token

+

When the physical token is activated, an initial PIN will be provided. This is a four-digit number that you will prepend to the one-time password string generated by the token.

+

Upon INITIAL login (to one of the ALCF machines), a prompt to change the PIN will appear. PINs must be at least four characters long and must only contain numbers.

+
  1. Initiate an SSH session using:

     ssh <ALCF username>@<system_name>.alcf.anl.gov

  2. A password prompt will be received. At this point, push the button on the physical token once.

  3. An eight-character, one-time password made up of letters and numbers will appear on the token’s display. This one-time password is case-sensitive.

  4. Type your PIN followed immediately by the one-time password at the SSH password prompt.
+

For example, if your PIN is 1234 and you received the one-time password string ABCD9876, you would type 1234ABCD9876 at the password prompt.
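A minimal sketch of that login, using the made-up values from the example above (a real password prompt will not echo what you type):

```
$ ssh <ALCF username>@<system_name>.alcf.anl.gov
Password:   1234ABCD9876    <PIN 1234 immediately followed by the one-time password ABCD9876>
```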

+

Troubleshooting Your Physical Token

+

Case 1: It says "locked": The physical token may be locked due to too many failed attempts. Please contact the ALCF Help Desk to return the locked token so that a replacement can be sent.

+

Case 2: You have a PIN for your physical token: Once a PIN has been set for your physical token, you will need to prepend your PIN to the token password. Otherwise you will not be able to log in. If you do not remember your PIN, please email us so we can verify your identity and reset your Initial PIN.

+

Case 3: It does not say "locked" but still does not work: It is likely that your token has fallen out of sync with the server. If you have pushed the button on your physical token more than 10 times without successfully logging in, it will fail to authenticate because it has lost synchronization with the server. Please try connecting to Theta first. If it still fails, please follow the re-sync instructions below.

+

Re-Sync Instructions

+

If you have pushed the button on your physical token more than 10 +times, it will fail to authenticate because it has lost synchronization +with the server. You can re-synchronize your token using the following procedure:

+
1. Have your physical token ready.
+
+2. Obtain a challenge sequence:
+    - Initiate an SSH session to a host that allows token
+      authentication (such as theta.alcf.anl.gov). At the password
+      prompt, just hit 'Enter'. This will cause the Cryptocard service
+      to produce a challenge string consisting of 8 numbers.
+
+3. Hold down the button on your token for a few seconds until the
+    display says "Init", then let go.
+
+4. The token will scroll through a series of menu options. When it
+    displays "ReSync", hit the button again.
+
+5. The display will say
+
+     Resync?0
+
+6. The number at the end will start cycling from 0 to 9, over and over.
+
+7. Look at the numbers in your challenge string. When the number
+    displayed on your token changes to the first number of the challenge
+    string, press the button. The display will now show this number, and
+    the second digit will start cycling.
+
+8. Enter each of the numbers from your challenge string in the same
+    manner, until the display on your token matches the entire challenge string.
+    Choose the "<" to backspace and re-enter the previous number if
+    necessary.
+
+9. Once you've entered all 8 digits, re-check to make sure they're
+    accurate. Then, while all 8 digits are displayed on the token, press
+    the button to generate a new password.
+
+10. Enter your PIN followed by the new password, and hit 'Enter'. 
+     If successful, you will be logged in to the resource. You're now back 
+     in sync with the authentication server.
+
+If you are unsuccessful, you will be presented with another challenge string. 
+At this point, you may need to perform the re-sync instructions again.
+
+

If there are still problems after completing the re-synchronization procedures, please email us at support@alcf.anl.gov so we can run a test on the physical token to determine if it is defective.

+

If it is found to be defective we will promptly replace it. Physical tokens are the property of Argonne National Laboratory.

+

Please return them to us at:

+
ALCF Help Desk
+Argonne National Laboratory
+9700 S. Cass Ave.
+Bldg. 240, Rm. 2129
+Lemont, IL 60439
+
+

Resetting the Physical Token PIN

+

Please email us at support at alcf.anl.gov for PIN resets. Once your identity has been verified, we will provide you with a new PIN for your CRYPTOcard token.

+

Returning a Physical Token

+

If you no longer need your physical token, please return it to this address:

+
ALCF Help Desk
+Argonne National Laboratory
+9700 S. Cass Ave.
+Bldg. 240, Rm. 2129
+Lemont, IL 60439
+
+ + + + + + + + + + + + + +
+
+ + + +
+ +
+ + + + + + +
+
+
+
+ + + + + + + + + + + + \ No newline at end of file diff --git a/account-project-management/accounts-and-access/user-account-overview/index.html b/account-project-management/accounts-and-access/user-account-overview/index.html new file mode 100644 index 0000000000..5414e332a4 --- /dev/null +++ b/account-project-management/accounts-and-access/user-account-overview/index.html @@ -0,0 +1,6815 @@ + + + + + + + + + + + + + + + + + + + + + + + + + Accounts and Access - ALCF User Guides + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+ + + + Skip to content + + +
+
+ +
+ + + + + + + + + +
+ + +
+ + +
+ +
+

Argonne Leadership Computing Facility

+
+ + + + + + + + + +
+
+ + +
+ + + + + + + + + + + + + + + + + +
+ + + + +
+ + +
+ + + + + + +
+
+ + + +
+
+
+ + + + + +
+
+
+ + + +
+
+
+ + + +
+
+
+ + + +
+
+ + + + + + + +

ALCF User Account Overview

+

All computing carried out on the ALCF systems is associated with a user "account." This account is used to log on to the login servers and run jobs on the resources. Anyone with a user account has a login name that is recorded in the user database. This page describes what users need to know to manage their account details, including the relevant policies and procedures.

+

If you need an account, visit the Accounts and Project Management website: Request an account

+

If you want to learn how to get started, visit the Get Started Guide: Get Started Guide

+

Who Can Get an Account

+

Those who are interested in having an account on an ALCF resource must first request an allocation and provide a detailed description of the work, including computational requirements and coding capabilities for the Blue Gene platform. Another means of acquiring an allocation on an ALCF system is to be part of a project team that already has an active allocation. Once an allocation has been granted, new users should complete an account request. A project’s Principal Investigator (PI) must sponsor these accounts; if the PI is the user, an ALCF staff member must serve as sponsor. Sponsors are asked annually to evaluate the accounts they have sponsored to determine whether or not these accounts should be kept active.

+

Account Abilities

+

A user with an active account can log in to the ALCF login servers (e.g., theta.alcf.anl.gov or cooley.alcf.anl.gov). The account includes home directory space; files can be transferred to and from that space via the login nodes, and development activities, such as editing and compiling, can also take place there.
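For example, assuming a hypothetical username jdoe, logging in and moving a file into the home directory could look like this sketch:

```
# Log in to an ALCF login server
ssh jdoe@theta.alcf.anl.gov

# Copy a file from your local machine into your ALCF home directory via the login node
scp input.tar.gz jdoe@theta.alcf.anl.gov:~
```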

+

Account States

+

Accounts are classified in one of the following categories:

+
  • Pending: An account that has been requested but has not yet been created.
  • Active: An account that can be used to interact with the ALCF Login Servers. This is the normal state for all accounts.
  • Inactive: An account that still exists on the system (that is, the account continues to be registered in the database and the user's files exist on disk) but the user cannot interact with the ALCF Login Servers. An account might be disabled due to misuse, security concerns, or because it is no longer allocated.
  • Deleted: An account that existed on the system and is thus in the records and backups, but whose user no longer has access to the systems or files on disk.
+

More Information

+ + + + + + + + + + + + + + +
+
+ + + +
+ +
+ + + + + + +
+
+
+
+ + + + + + + + + + + + \ No newline at end of file diff --git a/account-project-management/allocation-management/allocation-management/index.html b/account-project-management/allocation-management/allocation-management/index.html new file mode 100644 index 0000000000..d324a9319d --- /dev/null +++ b/account-project-management/allocation-management/allocation-management/index.html @@ -0,0 +1,6866 @@ + + + + + + + + + + + + + + + + + + + + + + + + + Managing Your Allocations - ALCF User Guides + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+ + + + Skip to content + + +
+
+ +
+ + + + + + + + + +
+ + +
+ + +
+ +
+

Argonne Leadership Computing Facility

+
+ + + + + + + + + +
+
+ + +
+ + + + + + + + + + + + + + + + + +
+ + + + +
+ + +
+ + + + + + +
+
+ + + +
+
+
+ + + + + +
+
+
+ + + + + + + +
+
+ + + + + + + +

Managing Your Allocations

+

Allocations require management – balance checks, resource allocation, requesting more time, etc.

+

Checking for an Active Allocation

+

To determine if there is an active allocation, check Job Submission.

+

For information on how to run the query, see our documentation on the sbank Allocations Accounting System, or email support@alcf.anl.gov and ask for all active allocations.
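As a minimal sketch (ProjectX is a placeholder project name), the query looks like:

```
# List your active allocations for ProjectX across all resources
sbank-list-allocations -p ProjectX -r all
```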

+

Using sbank to Determine the Balance of an Allocation

+

To determine which platforms have an active balance, check our allocation accounting system sbank.

+
  • To obtain the allocation balance, use the sbank command sbank-list-allocations.
  • DD projects with a negative balance will not be able to run jobs until they have requested additional time; see Getting More Time below.
  • INCITE and ALCC PIs are automatically emailed a summary of project usage. If this is a DD project, please email support@alcf.anl.gov.
+

Allocation Expiration

+

Projects and allocations at the ALCF are different. A particular project might have multiple allocations of time. For example, a discretionary project that has been approved three times will have three allocations (two of which are probably expired) but just one project. Projects do not expire; allocations do. If an allocation has expired, or has no hours left, jobs will not be able to run against it. Use the two sections above (Checking for an Active Allocation and Using sbank to Determine the Balance of an Allocation) to determine which allocations are active.
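To see whether an allocation has expired, inactive allocations can be included in the listing. The sketch below assumes sbank-list-allocations accepts the same -I/--get-inactive filter documented for the sbank detail commands elsewhere in this guide; ProjectX is a placeholder project name:

```
# List both active and expired (inactive) allocations for ProjectX
sbank-list-allocations -p ProjectX -r all -I
```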

+

Getting More Time

+

To request an extension of your existing discretionary allocation or to request additional hours, please email support@alcf.anl.gov with answers to the following:

+
  • What have you accomplished with your original allocation?
  • Please include a brief description of any publications or major presentations that were (or will be) generated in full or in part because of this allocation.
  • What will you do with the extra time?
  • What new expiration date are you requesting?
  • How many additional hours are you requesting?
+

Sub-allocations

+

Suballocations let PIs control who in their team can run jobs, how much they are allowed to consume (allocation amount), and when they are allowed to run jobs (start and end dates).

+

Step 1: Create Suballocations (Project PI):

+

PI creates suballocations

+

sbank new sub <allocationid> --name <nameofsuballoc>

+

Tip: see sbank new suballocation -h for all the options.

+

Step 2: Manage Suballocations (Project PI)

+

PI adds users to suballocations

+

sbank e sub <projectname>::<nameofsuballoc> --add-user="<username1> <username2> ..."

+

PI can change the name of a suballocation

+

sbank e sub <suballocationID> --name=<new_name_of_suballocation>

+

By default, the primary suballocation (the default suballocation created when the allocation is created by ALCF) is unrestricted, i.e., enabled for all project members. That means all project members can submit jobs against the primary suballocation by default. All other suballocations are restricted by default, and users have to be added to each of them.

+

To change the default for the primary suballocation to restrict usage, PI must first edit the suballocation:

+

sbank-edit-suballocation --restrict <primary suballocation id>

+

Then add users with this command:

+

sbank e sub <primary suballocation id> --add-user="<username1> <username2> ..."

+

PI changes start and end dates for a suballocation:

+

sbank e sub <suballocationID> -S <start_date> -E <end_date>

+

PI adds hours to a suballocation:

+

sbank e sub <projectname>::<nameofsuballoc> --hours-to-move <hours> --to-suballocation <projectname>::<nameofsuballoc2>

+

Note: the number of hours to move must be less than or equal to the available balance of the suballocation nameofsuballoc

+

Tip: see sbank e suballocation -h for all the options

+

Step 3: Submit Jobs (Project team)

+

Submit jobs to a suballocation. Note that the user should be on the suballocation’s user list

+

E.g.: qsub -l select=10,walltime=30:00,filesystems=grand:home -A <suballocationID> -q demand test.sh

+

Note: Once submanagement is enabled for a project allocation, all job submissions must specify the suballocationID
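Equivalently, the suballocation charge can be carried in a PBS batch script instead of on the qsub command line. The sketch below simply mirrors the qsub options shown above as #PBS directives; the script body and suballocation ID are placeholders:

```
#!/bin/bash
#PBS -l select=10
#PBS -l walltime=30:00
#PBS -l filesystems=grand:home
#PBS -A <suballocationID>
#PBS -q demand

# Run from the directory the job was submitted from
cd ${PBS_O_WORKDIR}
./test.sh
```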

+

Useful commands:
+List all suballocations for a project, showing the number of jobs run, charges, allocation balance, suballocation name, and list of users:

+

sbank-list-allocations -r polaris -p <projectname> -f "+subname users_list"

+

Tip: see sbank l a -h for all the options and sbank -f \? for the list of fields that can be displayed

+ + + + + + + + + + + + + +
+
+ + + +
+ +
+ + + + + + +
+
+
+
+ + + + + + + + + + + + \ No newline at end of file diff --git a/account-project-management/allocation-management/files/allocation.png b/account-project-management/allocation-management/files/allocation.png new file mode 100644 index 0000000000..e54b2bd9fe Binary files /dev/null and b/account-project-management/allocation-management/files/allocation.png differ diff --git a/account-project-management/allocation-management/files/request-allocation.png b/account-project-management/allocation-management/files/request-allocation.png new file mode 100644 index 0000000000..2568ce8a2d Binary files /dev/null and b/account-project-management/allocation-management/files/request-allocation.png differ diff --git a/account-project-management/allocation-management/not_in_nav/sbank-detail-allocations/index.html b/account-project-management/allocation-management/not_in_nav/sbank-detail-allocations/index.html new file mode 100644 index 0000000000..dfc9b801cc --- /dev/null +++ b/account-project-management/allocation-management/not_in_nav/sbank-detail-allocations/index.html @@ -0,0 +1,7147 @@ + + + + + + + + + + + + + + + + + + + + + Manpage for sbank-detail-allocations - ALCF User Guides + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+ + + + Skip to content + + +
+
+ +
+ + + + + + + + + +
+ + +
+ + +
+ +
+

Argonne Leadership Computing Facility

+
+ + + + + + + + + +
+
+ + +
+ + + + + + + + + + + + + + + + + +
+ + + + +
+ + +
+ + + + + + +
+
+ + + +
+
+
+ + + + + +
+
+
+ + + +
+
+
+ + + +
+
+
+ + + +
+
+ + + + + + + +

Manpage for sbank-detail-allocations

+

sbank-detail-allocations [options] [ ... ]

+

Detail allocation information.

+

NOTE:
+ 1. The list of arguments is optional.
+ 2. You can also enter the list by using the -a option multiple times.
+ 3. Regardless, both are optional, and you can get detailed allocation info using the option filters below.
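A usage sketch built from the options documented below (the project and resource names are placeholders):

```
# Detail allocations for ProjectX on polaris, including inactive allocations
sbank-detail-allocations -p ProjectX -r polaris -I
```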

+

OPTIONS

+

--version

+

show program's version number and exit

+

-h, --help

+

show this help message and exit

+

-a ALLOCATION_ID, --allocation-id=ALLOCATION_ID

+

filter on allocation id

+

-e EVENT_ID, --event-id=EVENT_ID

+

filter on event id

+

-f FIELD_INFO, --field-to-display=FIELD_INFO

+

FIELD_INFO is [:], for available fields enter -f? or -f "?", to add fields enter -f "+ [:] ..."

+

-j JOBID, --jobid=JOBID

+

filter on jobid

+

-n NUM_FIELDS_TO_DISPLAY, --num-fields-to-display=NUM_FIELDS_TO_DISPLAY

+

set number of fields to display

+

-p PROJECT, --project=PROJECT

+

filter on name or id, DO NOT MIX, enter 'all' to get all, wild cards '*' is allowed but only on names

+

-r RESOURCE, --resource=RESOURCE

+

filter on name or id, DO NOT MIX, enter 'all' to get all, wild cards '*' is allowed but only on names

+

-t TRANSACTION_ID, --transaction-id=TRANSACTION_ID

+

filter on transaction id

+

-u USER, --user=USER

+

filter on name or id, DO NOT MIX, enter 'all' to get all, wild cards '*' is allowed but only on names

+

-w "FIELD_INFO", --field-width=

+

"FIELD_INFO" FIELD_INFO is :, for available fields enter -w? or -w "?"

+

-E END, --end=END

+

[OPER1][...[OPER2]], where the operators OPER1 and OPER2 can be one of the following:

+
    +
  • ge, gt, le, lt, eq or >=, >, <=, <, ==.
  • +
+

Operator Defaults:

+
    +
  • OPER1 is 'ge' for single date entry
  • +
  • OPER1 and OPER2 are 'ge' and 'lt', respectively, for range date entry.
  • +
+

Date Parsing Precedence:

+
    +
  • YEAR then MONTH then DAY, i.e., 121101 is parsed as YYMMDD, hence Nov 1, 2012
  • +
+

-H, --human-readable

+

abbreviate numbers and use unit suffixes: K (thousands), M (millions), G (billions), T (trillions), ...

+

-I, --get-inactive

+

also get inactive allocations

+

-O, --get-only-inactive

+

only inactive allocations

+

-S START, --start=START

+

[OPER1][...[OPER2]], where the operators OPER1 and OPER2 can be one of the following:

+
    +
  • ge, gt, le, lt, eq or >=, >, <=, <, ==.
  • +
+

Operator Defaults:

+
    +
  • OPER1 is 'ge' for single date entry
  • +
  • OPER1 and OPER2 are 'ge' and 'lt', respectively, for range date entry.
  • +
+

Date Parsing Precedence:

+
    +
  • YEAR then MONTH then DAY, i.e., 121101 is parsed as YYMMDD, hence Nov 1, 2012
  • +
+

-T TRANSACTION_TYPE, --transaction-type=TRANSACTION_TYPE

+

transaction types: CHARGE, REFUND, PULLBACK, DEPOSIT, VOID

+

--award-type-name=AWARD_TYPE_NAME

+

filter on award type name

+

--award-category=AWARD_CATEGORY

+

filter on award category

+

--cbank-ref=CBANK_REF

+

filter on Clusterbank reference id

+

--created=CREATED_TIMESTAMP

+

[OPER1][...[OPER2]], where the operators OPER1 and OPER2 can be one of the following:

+
    +
  • ge, gt, le, lt, eq or >=, >, <=, <, ==.
  • +
+

Operator Defaults:

+
    +
  • OPER1 is 'ge' for single date entry
  • +
  • OPER1 and OPER2 are 'ge' and 'lt', respectively, for range date entry.
  • +
+

Date Parsing Precedence:

+
    +
  • YEAR then MONTH then DAY, i.e., 121101 is parsed as YYMMDD, hence Nov 1, 2012
  • +
+

--debug=DEBUG_LEVEL

+

SILENT, MUCH_LESS, LESS, MORE, VERBOSE, DEBUG, DEBUG1, DEBUG2

+

--get-deleted

+

also get deleted objects

+

--get-only-deleted

+

only deleted objects

+

--all-charges

+

only show list info that have charges regardless of project/user relationship

+

--history-date-range=END

+

[OPER1][...[OPER2]], where the operators OPER1 and OPER2 can be one of the following:

+
    +
  • ge, gt, le, lt, eq or >=, >, <=, <, ==.
  • +
+

Operator Defaults:

+
    +
  • OPER1 is 'ge' for single date entry
  • +
  • OPER1 and OPER2 are 'ge' and 'lt', respectively, for range date entry.
  • +
+

Date Parsing Precedence:

+
    +
  • YEAR then MONTH then DAY, i.e., 121101 is parsed as YYMMDD, hence Nov 1, 2012
  • +
+

--last-updated=LAST_UPDATED_TIMESTAMP

+

[OPER1][...[OPER2]], where the operators OPER1 and OPER2 can be one of the following:

+
    +
  • ge, gt, le, lt, eq or >=, >, <=, <, ==.
  • +
+

Operator Defaults:

+
    +
  • OPER1 is 'ge' for single date entry
  • +
  • OPER1 and OPER2 are 'ge' and 'lt', respectively, for range date entry.
  • +
+

Date Parsing Precedence:

+
    +
  • YEAR then MONTH then DAY, i.e., 121101 is parsed as YYMMDD, hence Nov 1, 2012
  • +
+

--no-commas

+

remove commas from comma separated thousands

+

--no-header

+

do not display the header

+

--no-history

+

do not show history information

+

--no-rows

+

do not display the row data

+

--no-sys-msg

+

do not display system message

+

--no-totals

+

do not display the totals

+ + + + + + + + + + + + + +
+
+ + + +
+ +
+ + + + + + +
+
+
+
+ + + + + + + + + + + + \ No newline at end of file diff --git a/account-project-management/allocation-management/not_in_nav/sbank-detail-jobs/index.html b/account-project-management/allocation-management/not_in_nav/sbank-detail-jobs/index.html new file mode 100644 index 0000000000..72a278c769 --- /dev/null +++ b/account-project-management/allocation-management/not_in_nav/sbank-detail-jobs/index.html @@ -0,0 +1,6771 @@ + + + + + + + + + + + + + + + + + + + + + sbank-detail-jobs - ALCF User Guides + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+ + + + Skip to content + + +
+
+ +
+ + + + + + + + + +
+ + +
+ + +
+ +
+

Argonne Leadership Computing Facility

+
+ + + + + + + + + +
+
+ + +
+ + + + + + + + + + + + + + + + + +
+ + + + +
+ + +
+ + + + + + +
+
+ + + +
+
+
+ + + + + +
+
+
+ + + +
+
+
+ + + +
+
+
+ + + +
+
+ + + + + + + +

sbank-detail-jobs

+

sbank-detail-jobs [options] [ | ... | ]

+

Detail job information. +NOTE:

+
  1. The arguments or are NOT REQUIRED;
  2. event_id is the JOB DATABASE ID;
  3. is the SCHEDULER CREATED ID, such as Cobalt;
  4. can also be entered using option -j ;
  5. can also be entered using option -e ;
  6. can also be entered using option -r ;
  7. regardless, you can use options or arguments to get detail job information
+
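A usage sketch built from the options documented below (the jobid, user, project, and resource values are placeholders):

```
# Detail a single job by its scheduler jobid
sbank-detail-jobs -j 123456 -r polaris

# Detail a user's jobs on ProjectX that started on or after Jan 1, 2024 (dates parse as YYMMDD)
sbank-detail-jobs -u username -p ProjectX -S 240101
```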

OPTIONS

+

--version

+

show program's version number and exit

+

-h, --help

+

show this help message and exit

+

-a ALLOCATION_ID, --allocation-id=ALLOCATION_ID

+

filter on allocation id

+

-e EVENT_ID, --event-id=EVENT_ID

+

filter on event id

+

-f FIELD_INFO, --field-to-display=FIELD_INFO

+

FIELD_INFO is [:], for available fields enter -f? or -f "?", to add fields enter -f "+ [:] ..."

+

-j JOBID, --jobid=JOBID

+

filter on jobid

+

-n NUM_FIELDS_TO_DISPLAY, --num-fields-to-display=NUM_FIELDS_TO_DISPLAY

+

set number of fields to display

+

-p PROJECT, --project=PROJECT

+

filter on name or id, DO NOT MIX, enter 'all' to get all, wild cards '*' is allowed but only on names

+

-r RESOURCE, --resource=RESOURCE

+

filter on name or id, DO NOT MIX, enter 'all' to get all, wild cards '*' is allowed but only on names

+

-t TRANSACTION_ID, --transaction-id=TRANSACTION_ID

+

filter on transaction id

+

-u USER, --user=USER

+

filter on name or id, DO NOT MIX, enter 'all' to get all, wild cards '*' is allowed but only on names

+

-w "FIELD_INFO", --field-width

+

"FIELD_INFO" FIELD_INFO is :, for available fields enter -w? or -w "?"

+

-E END, --end=END

+

[OPER1][...[OPER2]], where the operators OPER1 and OPER2 can be one of the following: ge, gt, le, lt, eq or >=, >, <=, <, ==. Operator Defaults: OPER1 is 'lt' for single date entry, OPER1 and OPER2 are 'ge' and 'lt', respectively, for range date entry. Date Parsing Precedence: YEAR then MONTH then DAY, i.e., 121101 is parsed as YYMMDD, hence Nov. 1, 2012

+

-H, --human-readable

+

abbreviate numbers and use unit suffixes: K (thousands), M (millions), G (billions), T (trillions) ...

+

-S START, --start=START

+

[OPER1][...[OPER2]], where the operators OPER1 and OPER2 can be one of the following: ge, gt, le, lt, eq or >=, >,<=, <, == . Operator Defaults: OPER1 is 'ge' for single date entry, OPER1 and OPER2 are 'ge' and 'lt', respectively, for range date entry. Date Parsing Precedence: YEAR then MONTH then DAY, i.e., 121101 is parsed as YYMMDD, hence Nov. 1, 2012

+

-T TRANSACTION_TYPE, --transaction-type=TRANSACTION_TYPE

+

transaction types: CHARGE, REFUND, PULLBACK, DEPOSIT, VOID

+

--created=CREATED_TIMESTAMP

+

[OPER1][...[OPER2]], where the operators OPER1 and OPER2 can be one of the following: ge, gt, le, lt, eq or >=, >, <=, <, ==. Operator Defaults: OPER1 is 'ge' for single date entry, OPER1 and OPER2 are 'ge' and 'lt', respectively, for range date entry. Date Parsing Precedence: YEAR then MONTH then DAY, i.e., 121101 is parsed as YYMMDD, hence Nov. 1, 2012

+

--debug=DEBUG_LEVEL

+

SILENT, MUCH_LESS, LESS, MORE, VERBOSE, DEBUG, DEBUG1, DEBUG2

+

--eligible=ELIGIBLE_TIMESTAMP

+

[OPER1][...[OPER2]], where the operators OPER1 and OPER2 can be one of the following: ge, gt, le, lt, eq or >=, >, <=, <, ==. Operator Defaults: OPER1 is 'ge' for single date entry, OPER1 and OPER2 are 'ge' and 'lt', respectively, for range date entry. Date Parsing Precedence: YEAR then MONTH then DAY, i.e., 121101 is parsed as YYMMDD, hence Nov. 1, 2012

+

--get-not-charged

+

only un-charged jobs

+

--history-date-range=END

+

[OPER1][...[OPER2]], where the operators OPER1 and OPER2 can be one of the following: ge, gt, le, lt, eq or >=, >, <=, <, ==. Operator Defaults: OPER1 is 'ge' for single date entry, OPER1 and OPER2 are 'ge' and 'lt', respectively, for range date entry. Date Parsing Precedence: YEAR then MONTH then DAY, i.e., 121101 is parsed as YYMMDD, hence Nov. 1, 2012

+

--last-updated=LAST_UPDATED_TIMESTAMP

+

[OPER1][...[OPER2]], where the operators OPER1 and OPER2 can be one of the following: ge, gt, le, lt, eq or >=, >, <=, <, ==. Operator Defaults: OPER1 is 'gt' for single date entry, OPER1 and OPER2 are 'ge' and 'lt', respectively, for range date entry. Date Parsing Precedence: YEAR then MONTH then DAY, i.e., 121101 is parsed as YYMMDD, hence Nov. 1, 2012

+

--no-commas

+

remove commas from comma separated thousands

+

--no-header

+

do not display the header

+

--no-history

+

do not show history information

+

--no-rows

+

do not display the row data

+

--no-sys-msg

+

do not display system message

+

--no-totals

+

do not display the totals

+

--queued=QUEUED_TIMESTAMP

+

[OPER1][...[OPER2]], where the operators OPER1 and OPER2 can be one of the following: ge, gt, le, lt, eq or >=, >, <=, <, ==. Operator Defaults: OPER1 is 'ge' for single date entry, OPER1 and OPER2 are 'ge' and 'lt', respectively, for range date entry. Date Parsing Precedence: YEAR then MONTH then DAY, i.e., 121101 is parsed as YYMMDD, hence Nov. 1, 2012

+ + + + + + + + + + + + + +
+
+ + + +
+ +
+ + + + + + +
+
+
+
+ + + + + + + + + + + + \ No newline at end of file diff --git a/account-project-management/allocation-management/not_in_nav/sbank-detail-projects/index.html b/account-project-management/allocation-management/not_in_nav/sbank-detail-projects/index.html new file mode 100644 index 0000000000..bf0c0ae137 --- /dev/null +++ b/account-project-management/allocation-management/not_in_nav/sbank-detail-projects/index.html @@ -0,0 +1,6953 @@ + + + + + + + + + + + + + + + + + + + + + Manpage for sbank-detail-projects - ALCF User Guides + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+ + + + Skip to content + + +
+
+ +
+ + + + + + + + + +
+ + +
+ + +
+ +
+

Argonne Leadership Computing Facility

+
+ + + + + + + + + +
+
+ + +
+ + + + + + + + + + + + + + + + + +
+ + + + +
+ + +
+ + + + + + +
+
+ + + +
+
+
+ + + + + +
+
+
+ + + + + + + +
+
+ + + + + + + +

Manpage for sbank-detail-projects

+

sbank-detail-projects [options] [ ... ]

+

Detail project information.

+

NOTE:
+ 1. The list of arguments is optional.
+ 2. You can also enter the list by using the -p option multiple times.
+ 3. Regardless, both are optional, and you can get detailed project info using the option filters below.

+

OPTIONS

+

--version

+

show program's version number and exit

+

-h, --help

+

show this help message and exit

+

-a ALLOCATION_ID, --allocation-id=ALLOCATION_ID

+

filter on allocation id

+

-f FIELD_INFO, --field-to-display=FIELD_INFO

+

FIELD_INFO is [:], for available fields enter -f? or -f "?", to add fields enter -f "+ [:] ..."

+

-n NUM_FIELDS_TO_DISPLAY, --num-fields-to-display=NUM_FIELDS_TO_DISPLAY

+

set number of fields to display

+

-p PROJECT, --project=PROJECT

+

filter on name or id, DO NOT MIX, enter 'all' to get all, wild cards '*' is allowed but only on names

+

-r RESOURCE, --resource=RESOURCE

+

filter on name or id, DO NOT MIX, enter 'all' to get all, wild cards '*' is allowed but only on names

+

-u USER, --user=USER

+

filter on name or id, DO NOT MIX, enter 'all' to get all, wild cards '*' is allowed but only on names

+

-w "FIELD_INFO", --field-width

+

"FIELD_INFO" FIELD_INFO is :, for available fields enter -w? or -w "?"

+

-E END, --end=END

+

[OPER1][...[OPER2]], where the operators OPER1 and OPER2 can be one of the following:

+
    +
  • ge, gt, le, lt, eq or >=, >, <=, <, ==.
  • +
+

Operator Defaults:

+
    +
  • OPER1 is 'lt' for single date entry, OPER1 and OPER2 are 'ge' and 'lt', respectively, for range date entry.
  • +
+

Date Parsing Precedence: + - YEAR then MONTH then DAY, i.e., 121101 is parsed as YYMMDD, hence Nov. 1, 2012

+

-H, --human-readable

+

abbreviate numbers and use unit suffixes: K (thousands), M (millions), G (billions), T (trillions) ...

+

-I, --get-inactive

+

get inactive allocations

+

-S START, --start=START

+

[OPER1][...[OPER2]], where the operators OPER1 and OPER2 can be one of the following:

+
    +
  • ge, gt, le, lt, eq or >=, >, <=, <, ==.
  • +
+

Operator Defaults:

+
    +
  • OPER1 is 'ge' for single date entry, OPER1 and OPER2 are 'ge' and 'lt', respectively, for range date entry.
  • +
+

Date Parsing Precedence:

+
    +
  • YEAR then MONTH then DAY, i.e., 121101 is parsed as YYMMDD, hence Nov. 1, 2012
  • +
+

--debug=DEBUG_LEVEL

+

SILENT, MUCH_LESS, LESS, MORE, VERBOSE, DEBUG, DEBUG1, DEBUG2

+

--all-charges

+

only show list info that have charges regardless of project/user relationship

+

--no-commas

+

remove commas from comma separated thousands

+

--no-header

+

do not display the header

+

--no-rows

+

do not display the row data

+

--no-sys-msg

+

do not display system message

+

--no-totals

+

do not display the totals

+ + + + + + + + + + + + + +
+
+ + + +
+ +
+ + + + + + +
+
+
+
+ + + + + + + + + + + + \ No newline at end of file diff --git a/account-project-management/allocation-management/not_in_nav/sbank-detail-transactions/index.html b/account-project-management/allocation-management/not_in_nav/sbank-detail-transactions/index.html new file mode 100644 index 0000000000..399f80ff5f --- /dev/null +++ b/account-project-management/allocation-management/not_in_nav/sbank-detail-transactions/index.html @@ -0,0 +1,7070 @@ + + + + + + + + + + + + + + + + + + + + + Manpage for sbank-detail-transactions - ALCF User Guides + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+ + + + Skip to content + + +
+
+ +
+ + + + + + + + + +
+ + +
+ + +
+ +
+

Argonne Leadership Computing Facility

+
+ + + + + + + + + +
+
+ + +
+ + + + + + + + + + + + + + + + + +
+ + + + +
+ + +
+ + + + + + +
+
+ + + +
+
+
+ + + + + +
+
+
+ + + + + + + +
+
+ + + + + + + +

Manpage for sbank-detail-transactions

+

sbank-detail-transactions [options] [ ... ]

+

Detail transaction information.

+

NOTE:
+ 1. The list of arguments is optional.
+ 2. You can also enter the list by using the -t option multiple times.
+ 3. Regardless, both are optional, and you can get detailed transaction info using the option filters below.
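A usage sketch built from the options documented below (the project name and dates are placeholders):

```
# Detail REFUND transactions for ProjectX created during January 2024 (YYMMDD range)
sbank-detail-transactions -p ProjectX -T REFUND --created 240101...240201
```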

+

OPTIONS

+

--version

+

show program's version number and exit

+

-h, --help

+

show this help message and exit

+

-a ALLOCATION_ID, --allocation-id=ALLOCATION_ID

+

filter on allocation id

+

-c, --comment

+

display comment

+

-e EVENT_ID, --event-id=EVENT_ID

+

filter on event id

+

-f FIELD_INFO, --field-to-display=FIELD_INFO

+

FIELD_INFO is [:] for available fields enter -f? or -f "?", to add fields enter -f "+ [:] ..."

+

-j JOBID, --jobid=JOBID

+

filter on jobid

+

-n NUM_FIELDS_TO_DISPLAY, --num-fields-to-display=NUM_FIELDS_TO_DISPLAY

+

set number of fields to display

+

-p PROJECT, --project=PROJECT

+

filter on name or id, DO NOT MIX, enter 'all' to get all, wild cards '*' is allowed but only on names

+

-r RESOURCE, --resource=RESOURCE

+

filter on name or id, DO NOT MIX, enter 'all' to get all, wild cards '*' is allowed but only on names

+

-t TRANSACTION_ID, --transaction-id=TRANSACTION_ID

+

filter on transaction id

+

-u USER, --user=USER

+

filter on name or id, DO NOT MIX, enter 'all' to get all, wild cards '*' is allowed but only on names

+

-w "FIELD_INFO", --field-width=

+

"FIELD_INFO" FIELD_INFO is :, for available fields enter -w? or -w "?"

+

-E JOB_END, --end=JOB_END

+

[OPER1][...[OPER2]], where the operators OPER1 and OPER2 can be one of the following:

+
    +
  • ge, gt, le, lt, eq or >=, >, <=, <, ==.
  • +
+

Operator Defaults:

+
    +
  • OPER1 is 'ge' for single date entry
  • +
  • OPER1 and OPER2 are 'ge' and 'lt', respectively, for range date entry.
  • +
+

Date Parsing Precedence:

+
    +
  • YEAR then MONTH then DAY, i.e., 121101 is parsed as YYMMDD, hence Nov 1, 2012
  • +
+

-H, --human-readable

+

abbreviate numbers and use unit suffixes: K (thousands), M (millions), G (billions), T (trillions) ...

+

-S JOB_START, --start=JOB_START

+

[OPER1][...[OPER2]], where the operators OPER1 and OPER2 can be one of the following:

+
    +
  • ge, gt, le, lt, eq or >=, >, <=, <, ==.
  • +
+

Operator Defaults:

+
    +
  • OPER1 is 'ge' for single date entry
  • +
  • OPER1 and OPER2 are 'ge' and 'lt', respectively, for range date entry.
  • +
+

Date Parsing Precedence:

+
    +
  • YEAR then MONTH then DAY, i.e., 121101 is parsed as YYMMDD, hence Nov 1, 2012
  • +
+

-T TRANSACTION_TYPE, --transaction-type=TRANSACTION_TYPE

+

transaction types: CHARGE, REFUND, PULLBACK, DEPOSIT, VOID

+

--at=TRANSACTION_AT_TIMESTAMP

+

[OPER1][...[OPER2]], where the operators OPER1 and OPER2 can be one of the following:

+
    +
  • ge, gt, le, lt, eq or >=, >, <=, <, ==.
  • +
+

Operator Defaults:

+
    +
  • OPER1 is 'ge' for single date entry
  • +
  • OPER1 and OPER2 are 'ge' and 'lt', respectively, for range date entry.
  • +
+

Date Parsing Precedence:

+
    +
  • YEAR then MONTH then DAY, i.e., 121101 is parsed as YYMMDD, hence Nov 1, 2012
  • +
+

--cbank-ref=CBANK_REF

+

filter on Clusterbank reference id

+

--created=JOB_CREATED_TIMESTAMP

+

[OPER1][...[OPER2]], where the operators OPER1 and OPER2 can be one of the following:

+
    +
  • ge, gt, le, lt, eq or >=, >, <=, <, ==.
  • +
+

Operator Defaults:

+
    +
  • OPER1 is 'ge' for single date entry
  • +
  • OPER1 and OPER2 are 'ge' and 'lt', respectively, for range date entry.
  • +
+

Date Parsing Precedence:

+
    +
  • YEAR then MONTH then DAY, i.e., 121101 is parsed as YYMMDD, hence Nov 1, 2012
  • +
+

--debug=DEBUG_LEVEL

+

SILENT, MUCH_LESS, LESS, MORE, VERBOSE, DEBUG, DEBUG1, DEBUG2

+

--no-commas

+

remove commas from comma separated thousands

+

--no-header

+

do not display the header

+

--no-rows

+

do not display the row data

+

--no-sys-msg

+

do not display system message

+

--no-totals

+

do not display the totals

+

--queued=JOB_QUEUED_TIMESTAMP

+

[OPER1][...[OPER2]], where the operators OPER1 and OPER2 can be one of the following:

+
    +
  • ge, gt, le, lt, eq or >=, >, <=, <, ==.
  • +
+

Operator Defaults:

+
    +
  • OPER1 is 'ge' for single date entry
  • +
  • OPER1 and OPER2 are 'ge' and 'lt', respectively, for range date entry.
  • +
+

Date Parsing Precedence:

+
    +
  • YEAR then MONTH then DAY, i.e., 121101 is parsed as YYMMDD, hence Nov 1, 2012
  • +
+ + + + + + + + + + + + + +
+
+ + + +
+ +
+ + + + + + +
+
+
+
+ + + + + + + + + + + + \ No newline at end of file diff --git a/account-project-management/allocation-management/not_in_nav/sbank-detail-users/index.html b/account-project-management/allocation-management/not_in_nav/sbank-detail-users/index.html new file mode 100644 index 0000000000..69d63daf91 --- /dev/null +++ b/account-project-management/allocation-management/not_in_nav/sbank-detail-users/index.html @@ -0,0 +1,6952 @@ + + + + + + + + + + + + + + + + + + + + + Manpage for sbank-detail-users - ALCF User Guides + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+ + + + Skip to content + + +
+
+ +
+ + + + + + + + + +
+ + +
+ + +
+ +
+

Argonne Leadership Computing Facility

+
+ + + + + + + + + +
+
+ + +
+ + + + + + + + + + + + + + + + + +
+ + + + +
+ + +
+ + + + + + +
+
+ + + +
+
+
+ + + + + +
+
+
+ + + + + + + +
+
+ + + + + + + +

Manpage for sbank-detail-users

+

sbank-detail-users [options] [ ... ]

+

Detail user information.

+

NOTE:
+ 1. Use -I to include inactive allocations.
+ 2. The list of arguments is optional.
+ 3. You can also enter the list by using the -u option multiple times.
+ 4. Regardless, both are optional, and you can get detailed user info using the option filters below.
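A usage sketch built from the options documented below (the user and project names are placeholders):

```
# Detail charges for one user on ProjectX, including inactive allocations
sbank-detail-users -u username -p ProjectX -I
```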

+

OPTIONS

+

--version

+

show program's version number and exit

+

-h, --help

+

show this help message and exit

+

-a ALLOCATION_ID, --allocation-id=ALLOCATION_ID

+

filter on allocation id

+

-f FIELD_INFO, --field-to-display=FIELD_INFO

+

FIELD_INFO is [:], for available fields enter -f? or -f "?", to add fields enter -f "+ [:] ..."

+

-n NUM_FIELDS_TO_DISPLAY, --num-fields-to-display=NUM_FIELDS_TO_DISPLAY

+

set number of fields to display

+

-p PROJECT, --project=PROJECT

+

filter on name or id, DO NOT MIX, enter 'all' to get all, wild cards '*' is allowed but only on names

+

-r RESOURCE, --resource=RESOURCE

+

filter on name or id, DO NOT MIX, enter 'all' to get all, wild cards '*' is allowed but only on names

+

-u USER, --user=USER

+

filter on name or id, DO NOT MIX, enter 'all' to get all, wild cards '*' is allowed but only on names

+

-w "FIELD_INFO", --field-width

+

"FIELD_INFO" FIELD_INFO is :, for available fields enter -w? or -w "?"

+

-E END, --end=END

+

[OPER1][...[OPER2]], where the operators OPER1 and OPER2 can be one of the following: + - ge, gt, le, lt, eq or >=, >, <=, <, ==.

+

Operator Defaults:

+
    +
  • OPER1 is 'lt' for single date entry, OPER1 and OPER2 are 'ge' and 'lt', respectively, for range date entry.
  • +
+

Date Parsing Precedence:

+
    +
  • YEAR then MONTH then DAY, i.e., 121101 is parsed as YYMMDD, hence Nov. 1, 2012
  • +
+

-H, --human-readable

+

abbreviate numbers and use unit suffixes: K (thousands), M (millions), G (billions), T (trillions) ...

+

-I, --get-inactive

+

get inactive allocations

+

-S START, --start=START

+

[OPER1][...[OPER2]], where the operators OPER1 and OPER2 can be one of the following: + - ge, gt, le, lt, eq or >=, >, <=, <, ==.

+

Operator Defaults:

+
    +
  • OPER1 is 'lt' for single date entry, OPER1 and OPER2 are 'ge' and 'lt', respectively, for range date entry.
  • +
+

Date Parsing Precedence:

+
    +
  • YEAR then MONTH then DAY, i.e., 121101 is parsed as YYMMDD, hence Nov. 1, 2012
  • +
+

--debug=DEBUG_LEVEL

+

SILENT, MUCH_LESS, LESS, MORE, VERBOSE, DEBUG, DEBUG1, DEBUG2

+

--all-charges

+

only show list info that have charges regardless of project/user relationship

+

--no-commas

+

remove commas from comma separated thousands

+

--no-header

+

do not display the header

+

--no-rows

+

do not display the row data

+

--no-sys-msg

+

do not display system message

+

--no-totals

+

do not display the totals

+ + + + + + + + + + + + + +
+
+ + + +
+ +
+ + + + + + +
+
+
+
+ + + + + + + + + + + + \ No newline at end of file diff --git a/account-project-management/allocation-management/not_in_nav/sbank-detail/index.html b/account-project-management/allocation-management/not_in_nav/sbank-detail/index.html new file mode 100644 index 0000000000..7af70f283c --- /dev/null +++ b/account-project-management/allocation-management/not_in_nav/sbank-detail/index.html @@ -0,0 +1,7135 @@ + + + + + + + + + + + + + + + + + + + + + Manpage for sbank-detail - ALCF User Guides + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+ + + + Skip to content + + +
+
+ +
+ + + + + + + + + +
+ + +
+ + +
+ +
+

Argonne Leadership Computing Facility

+
+ + + + + + + + + +
+
+ + +
+ + + + + + + + + + + + + + + + + +
+ + + + +
+ + +
+ + + + + + +
+
+ + + +
+
+
+ + + + + +
+
+
+ + + + + + + +
+
+ + + + + + + +

Manpage for sbank-detail

+

sbank-detail [options]

+

Detail Meta Command

+

COMMANDS

+
  • allocations [-a|-e|-f|-j|-n|-p|-r|-t|-u|-w|-E|-H|-I|-O|-S|-T|...] (DEFAULT)
  • categories [-f|-n|-w|...]
  • messages [-f|-n|-w|...]
  • names [-f|-n|-w|...]
  • jobs [-a|-e|-f|-j|-n|-p|-r|-t|-u|-w|-E|-H|-S|-T|...]
  • projects [-a|-f|-n|-p|-r|-u|-w|-E|-H|-I|-S|...]
  • transactions [-a|-e|-f|-j|-n|-p|-r|-t|-u|-w|-E|-H|-S|-T|...]
  • users [-a|-f|-n|-p|-r|-u|-w|-E|-H|-S|...]
+

OPTIONS

+

-a --allocation

+

enter allocation id

+

-c --comment

+

enter comment for new or edit commands, display comment for list commands

+

-e --event-id

+

enter event db id; event db id is an internal id created by the charging system

+

-f --field

+

enter [:], width is optional; enter -f? or -f "?" for available fields, + to add fields

+

-h --help

+

command line help

+

-j --jobid

+

enter jobid; jobid is created by the scheduler and is not unique

+

-n --num-field

+

enter number of fields to display

+

-p --project

+

enter name or id, DO NOT MIX, enter 'all' to get all, wild cards '*' is allowed, but only on names

+

-r --resource

+

enter name or id, DO NOT MIX, enter 'all' to get all, wild cards '*' is allowed but only on names

+

-s --suballocation

+

enter suballocation id

+

-t --transaction

+

enter transaction id

+

-u --user

+

enter name or id, DO NOT MIX, enter 'all' to get all, wild cards '*' is allowed, but only on names

+

-w --field-width

+

enter the field width as follows: :, enter -w? or -w "?" for available fields

+

-E --end

+

enter end datetime filter

+

-H --human-readable

+

abbreviate numbers and use unit suffixes: K (thousands), M (millions), G (billions), T (trillions) ...

+

-I --get-inactive

+

include inactive allocations

+

-O --get-only-inactive

+

get only inactive allocations

+

-S --start

+

enter start datetime filter

+

-T --Type

+

enter type of transaction

+

--all-charges

+

for list allocations | projects | users, only show info with charges

+

--at

+

enter transaction created datetime filter

+

--award-category

+

enter allocation award category

+

--award-type-name

+

enter allocation award-type name

+

--created

+

enter created datetime filter

+

--debug

+

enter debug level

+

--get-deleted

+

get deleted objects

+

--get-not-charged

+

get jobs that have not been charged

+

--get-only-deleted

+

get only deleted objects

+

--history-date-range

+

enter history datetime filter

+

--last-updated

+

enter last updated datetime filter

+

--no-commas

+

remove commas from comma-separated thousands

+

--no-header

+

do not display header

+

--no-history

+

do not display history information

+

--no-rows

+

do not display rows

+

--no-sys-msg

+

do not display system message

+

--no-totals

+

do not display totals

+

--queued

+

enter queued datetime filter

+ + + + + + + + + + + + + +
+
+ + + +
+ +
+ + + + + + +
+
+
+
+ + + + + + + + + + + + \ No newline at end of file diff --git a/account-project-management/allocation-management/not_in_nav/sbank-examples/index.html b/account-project-management/allocation-management/not_in_nav/sbank-examples/index.html new file mode 100644 index 0000000000..32331a47c7 --- /dev/null +++ b/account-project-management/allocation-management/not_in_nav/sbank-examples/index.html @@ -0,0 +1,6940 @@ + + + + + + + + + + + + + + + + + + + + + sbank Example Commands - ALCF User Guides + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+ + + + Skip to content + + +
+
+ +
+ + + + + + + + + +
+ + +
+ + +
+ +
+

Argonne Leadership Computing Facility

+
+ + + + + + + + + +
+
+ + +
+ + + + + + + + + + + + + + + + + +
+ + + + +
+ + +
+ + + + + + +
+
+ + + +
+
+
+ + + + + +
+
+
+ + + + + + + +
+
+ + + + + + + +

sbank Example Commands

+

Below is a set of helpful commands to help you better manage the projects you have running at the ALCF.

+

View your project's allocations

+

Command: sbank-list-allocations

+

Use this command to list all of your active allocations for a specific project [Project-X]. This is useful when you need to provide this information in a report. +

> sbank-list-allocations -p ProjectX -r all
+ Id         Start       End         Resource   Project          Jobs        Charged          Available Balance 
+ ---------  ----------  ----------  ---------  ---------------  ----------  ---------------  ----------------- 
+ 2106       2016-01-04  2017-01-01  cooley     ProjectX              1,139          6,032.8           43,967.2 
+ 2146       2016-01-14  2017-01-10  theta      ProjectX                983      1,084,770.3       25,483,927.5
+ 6438       2020-09-22  2022-01-01  thetagpu   ProjectX                  3              0.0            2,000.0 
+
+
+Totals:
+  Rows: 3
+  Cooley:
+    Available Balance: 43,967.2 node hours
+    Charged          : 6,032.8 node hours
+    Jobs             : 1,139 
+ Theta:
+    Available Balance: 25,483,927.5 node hours 
+    Charged          : 1,084,770.3 node hours 
+    Jobs             : 983 
+ Thetagpu:
+    Available Balance: 2,000.0 node hours
+    Charged          : 0.0 node hours
+    Jobs             : 3 
+

+

List your project's quota on Grand and/or Eagle File system

+
> sbank-list-allocations -p ProjectX -r grand
+ Allocation  Suballocation  Start       End         Resource  Project      Quota
+ ----------  -------------  ----------  ----------  --------  -----------  -----
+ 6687        6555           2020-12-16  2022-01-01  grand     ProjectX    1.0
+
+Totals:
+  Rows: 1
+  Grand:
+    Quota: 1.0 TB
+
+> sbank-list-allocations -p ProjectX -r eagle
+ Allocation  Suballocation  Start       End         Resource  Project      Quota
+ ----------  -------------  ----------  ----------  --------  -----------  -----
+ 6688        6556           2020-12-16  2022-01-01  eagle     ProjectX    1.0
+
+Totals:
+  Rows: 1
+  Eagle:
+    Quota: 1.0 TB
+
+

List only the created timestamp field for all allocations that were created before 01-01-2015 for ProjectX across all resources

+
> sbank-list-allocations  --created "<20150101" -r all -p ProjectX "-f created"
+ Created    
+ ---------- 
+ 2016-01-04 
+ 2016-01-14 
+ 2016-01-15 
+
+Totals:
+  Rows: 3
+Date  filters (UTC): created < "2015-01-01 00:00:00",  
+
+

List all active allocations for all resources for project ProjectX and add the field Created to the display list

+
shrubbery~ > sbank-list-allocations -r all  -p ProjectX -f "+created"
+ Id         Start       End         Resource   Project          Jobs        Charged          Available Balance  Created    
+ ---------  ----------  ----------  ---------  ---------------  ----------  ---------------  -----------------  ---------- 
+ 279        2011-08-30  2020-01-01  theta      ProjectX              6,361     12,332,699.9      -12,332,699.9  2013-02-22 
+ 2106       2016-01-04  2017-01-01  cooley     ProjectX              1,150          6,080.9           43,919.1  2016-01-04  
+
+Totals:
+  Rows: 2
+  Theta:
+    Available Balance: -12,332,699.9 node hours
+    Charged          : 12,332,699.9 node hours
+    Jobs             : 6,361 
+  Cooley:
+    Available Balance: 43,919.1 node hours
+    Charged          : 6,080.9 node hours
+    Jobs             : 1,150 
+
+

List all available fields for the sbank-list-allocations command

+
> sbank-list-allocations  -f "?"
+available fields:
+ id
+ start_timestamp
+ end_timestamp
+ resource
+ project_name
+ jobs_count
+ charged_sum
+ available_balance_sum
+ created_timestamp
+ award_category
+ award_type_name
+ admin_name
+ cbank_ref
+ comment
+
+

View your project's users

+

Command: sbank-list-users

+

List all charges for userx on theta on project ProjectX +

> sbank-list-users -p ProjectX -r theta -u userx
+ User             Jobs        Charged         
+ ---------------  ----------  --------------- 
+ userx                 1,814          9,884.5
+
+Totals:
+  Rows: 1
+  Resources: theta
+  Charged: 9,884.5 node hours
+  Jobs   : 1,814 
+  ```
+
### List charges for all users in ProjectX on Theta
This works for project leads (i.e., PIs, Co-PIs, Proxies), since they can see everything in their own projects.
+

+
+

> sbank-list-users -p ProjectX -r theta
 User             Jobs        Charged
 ---------------  ----------  ---------------
 user1                   120          4,243.7
 user2                     0              0.0
 user3                     0              0.0
 user4                   181          1,195.5
 user5                     0              0.0
 user6                 2,560         10,868.7
 user7                     0              0.0
 user8                     0              0.0
 user9                     0              0.0
 user10                    7              3.5
 user11                    0              0.0

Totals:
  Rows: 11
  Resources: theta
  Charged: 16,311.4 node hours
  Jobs   : 2,868

+

## View your project's jobs

List jobs for user "userx" that started in the range 2016-02-15 <= started < 2016-02-29, and add the transactions related to each job.

### Command: sbank-list-jobs

Note: The transaction_ids_list field can be shortened all the way to "t", as in -f "+ t".

shrubbery~ > sbank-list-jobs -u userx -f "+ t" -S "2016-02-15...2016-02-29"
 Id       Jobid   Resource  Project   Allocation  User   Duration  Charged   Transaction Ids

+
+

1013857 730417 theta ProjectX 1740 userx 1:53:07 61,776.8 CHARGE-1011230
+ 1013860 730558 theta ProjectX 1740 userx 1:53:07 61,776.8 CHARGE-1011233
+ 1014168 730668 theta ProjectX 1740 userx 1:53:25 61,940.6 CHARGE-1011541

+

Totals:
  Rows: 3
  Theta:
    Charged  : 185,494.2 node hours
    Duration : 6:44:00
Date filters (UTC): "2016-02-15 00:00:00" <= start < "2016-02-29 00:00:00",

### List the nodes used, runtime, and start timestamp for Theta job 50576

Note: To display both the date and the time, the width of start_timestamp is increased to 19 characters.

catapult~ > sbank l j -r theta -j 50576 -f "jobid nodes_used runtime start_timestamp:19"
 Jobid      Nodes Used  Runtime    Start
 ---------  ----------  ---------  -------------------
 50576      512         1:00:49    2013-01-16 21:49:30

Totals:
  Rows: 1
## View your project's transactions

### Command: sbank-list-transactions

List transactions that were at or after 2016-02-29 for ProjectX, and add the fields job_duration, nodes_used, and hosts.

Note:
- job_duration, nodes_used, and hosts are shortened, but they still uniquely identify the fields
- host has a left-justified width of 20, specified as "h:-20"

catapult~ > sbank-list-transactions -p ProjectX --at "ge 2016-02-29" -f "+ job_d nodes_u h:-20" -r theta
 Id       Resource  Project   Allocation  At          User   Transaction Type  Amount       Jobid   Job Duration  Nodes Used  Hosts

+
+

 1025426  theta     ProjectX  2147        2016-02-29  userx  CHARGE                48,005.1  740587  1:27:54       2048        MIR-00800-33BF1-2048
 1028046  theta     ProjectX  2147        2016-03-01  userx  CHARGE               147,647.1  742090  4:30:21       2048        MIR-40000-733F1-2048
 1028755  theta     ProjectX  2147        2016-03-02  userx  CHARGE             1,576,068.0  742126  6:00:44       16384       MIR-04000-77FF1-1638

+

Totals:
  Rows: 3
  Theta:
    Charges Amount: 1,771,720.2 node hours
    Job Duration  : 11:58:98
Date filters (UTC): at >= "2016-02-29 00:00:00",

+ + + + + + + + + + + + + +
+
+ + + +
+ +
+ + + + + + +
+
+
+
+ + + + + + + + + + + + \ No newline at end of file diff --git a/account-project-management/allocation-management/not_in_nav/sbank-list-allocations/index.html b/account-project-management/allocation-management/not_in_nav/sbank-list-allocations/index.html new file mode 100644 index 0000000000..dd148149b1 --- /dev/null +++ b/account-project-management/allocation-management/not_in_nav/sbank-list-allocations/index.html @@ -0,0 +1,7111 @@ + + + + + + + + + + + + + + + + + + + + + Manpage for sbank-list-allocations - ALCF User Guides + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+ + + + Skip to content + + +
+
+ +
+ + + + + + + + + +
+ + +
+ + +
+ +
+

Argonne Leadership Computing Facility

+
+ + + + + + + + + +
+
+ + +
+ + + + + + + + + + + + + + + + + +
+ + + + +
+ + +
+ + + + + + +
+
+ + + +
+
+
+ + + + + +
+
+
+ + + +
+
+
+ + + +
+
+
+ + + +
+
+ + + + + + + +

Manpage for sbank-list-allocations

+

sbank-list-allocations [options]

+

Generate allocation list report.

+

Notes:
1. Use -I to include inactive allocations
2. Enter "-r all" to get information for all resources

+
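For example, a minimal invocation might look like the following sketch, where ProjectX is a placeholder project name and -I and "-r all" are the options described in the notes above:

> sbank-list-allocations -p ProjectX -r all -I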

OPTIONS

+

--version

+

show program's version number and exit

+

-h, --help

+

show this help message and exit

+

-a ALLOCATION_ID, --allocation-id=ALLOCATION_ID

+

filter on allocation id

+

-c, --comment

+

display comment

+

-e EVENT_ID, --event-id=EVENT_ID

+

filter on event id

+

-f FIELD_INFO, --field-to-display=FIELD_INFO

+

FIELD_INFO is FIELD[:WIDTH], for available fields enter -f? or -f "?", to add fields enter -f "+ FIELD[:WIDTH] ..."

+

-j JOBID, --jobid=JOBID

+

filter on jobid

+

-n NUM_FIELDS_TO_DISPLAY, --num-fields-to-display=NUM_FIELDS_TO_DISPLAY

+

set number of fields to display

+

-p PROJECT, --project=PROJECT

+

filter on name or id, DO NOT MIX, enter 'all' to get all, wild cards '*' is allowed, but only on names

+

-r RESOURCE, --resource=RESOURCE

+

filter on name or id, DO NOT MIX, enter 'all' to get all, wild cards '*' is allowed, but only on names

+

-t TRANSACTION_ID, --transaction-id=TRANSACTION_ID

+

filter on transaction id

+

-u USER, --user=USER

+

filter on name or id, DO NOT MIX, enter 'all' to get all, wild cards '*' is allowed, but only on names

+

-w "FIELD_INFO", --field-width

+

"FIELD_INFO" FIELD_INFO is :, for available fields enter -w? or -w "?"

+

-E END, --end=END

+

[OPER1][...[OPER2]], where the operators OPER1 and OPER2 can be one of the following: + - ge, gt, le, lt, eq or >=, >, <=, <, ==.

+

Operator Defaults:

+
    +
  • OPER1 is 'lt' for single date entry, OPER1 and OPER2 are 'ge' and 'lt', respectively, for range date entry.
  • +
+

Date Parsing Precedence:

+
    +
  • YEAR then MONTH then DAY, i.e., 121101 is parsed as YYMMDD, hence Nov. 1, 2012
  • +
+

-H, --human-readable

+

abbreviate numbers and use unit suffixes: K (thousands), M (millions), G (billions), T (trillions) ...

+

-I, --get-inactive

+

get inactive allocations

+

-O, --get-only-inactive

+

get only inactive allocations

+

-S START, --start=START

+

[OPER1][...[OPER2]], where the operators OPER1 and OPER2 can be one of the following: + - ge, gt, le, lt, eq or >=, >, <=, <, ==.

+

Operator Defaults:

+
    +
  • OPER1 is 'lt' for single date entry, OPER1 and OPER2 are 'ge' and 'lt', respectively, for range date entry.
  • +
+

Date Parsing Precedence:

+
    +
  • YEAR then MONTH then DAY, i.e., 121101 is parsed as YYMMDD, hence Nov. 1, 2012
  • +
+

-T TRANSACTION_TYPE, --transaction-type=TRANSACTION_TYPE

+

transaction types: CHARGE, REFUND, PULLBACK, DEPOSIT, VOID

+

--award-type-name=AWARD_TYPE_NAME

+

filter on award-type name

+

--award-category=AWARD_CATEGORY

+

filter on award category

+

--cbank-ref=CBANK_REF

+

filter on Clusterbank reference id

+

--created=CREATED_TIMESTAMP

+

[OPER1][...[OPER2]], where the operators OPER1 and OPER2 can be one of the following: + - ge, gt, le, lt, eq or >=, >, <=, <, ==.

+

Operator Defaults:

+
    +
  • OPER1 is 'lt' for single date entry, OPER1 and OPER2 are 'ge' and 'lt', respectively, for range date entry.
  • +
+

Date Parsing Precedence:

+
    +
  • YEAR then MONTH then DAY, i.e., 121101 is parsed as YYMMDD, hence Nov. 1, 2012
  • +
+

--debug=DEBUG_LEVEL

+

SILENT, MUCH_LESS, LESS, MORE, VERBOSE, DEBUG, DEBUG1, DEBUG2

+

--get-deleted

+

get deleted objects

+

--get-only-deleted

+

get only deleted objects

+

--all-charges

+

only show list info that have charges regardless of project/user relationship

+

--last-updated=LAST_UPDATED_TIMESTAMP

+

[OPER1][...[OPER2]], where the operators OPER1 and OPER2 can be one of the following: + - ge, gt, le, lt, eq or >=, >, <=, <, ==.

+

Operator Defaults:

+
    +
  • OPER1 is 'lt' for single date entry, OPER1 and OPER2 are 'ge' and 'lt', respectively, for range date entry.
  • +
+

Date Parsing Precedence:

+
    +
  • YEAR then MONTH then DAY, i.e., 121101 is parsed as YYMMDD, hence Nov. 1, 2012
  • +
+

--no-commas

+

remove commas from comma-separated thousands

+

--no-header

+

do not display the header

+

--no-rows

+

do not display the row data

+

--no-sys-msg

+

do not display system message

+

--no-totals

+

do not display the totals

+ + + + + + + + + + + + + +
+
+ + + +
+ +
+ + + + + + +
+
+
+
+ + + + + + + + + + + + \ No newline at end of file diff --git a/account-project-management/allocation-management/not_in_nav/sbank-list-jobs/index.html b/account-project-management/allocation-management/not_in_nav/sbank-list-jobs/index.html new file mode 100644 index 0000000000..1e3dbe0e0a --- /dev/null +++ b/account-project-management/allocation-management/not_in_nav/sbank-list-jobs/index.html @@ -0,0 +1,7060 @@ + + + + + + + + + + + + + + + + + + + + + Manpage for sbank-list-jobs - ALCF User Guides + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+ + + + Skip to content + + +
+
+ +
+ + + + + + + + + +
+ + +
+ + +
+ +
+

Argonne Leadership Computing Facility

+
+ + + + + + + + + +
+
+ + +
+ + + + + + + + + + + + + + + + + +
+ + + + +
+ + +
+ + + + + + +
+
+ + + +
+
+
+ + + + + +
+
+
+ + + + + + + +
+
+ + + + + + + +

Manpage for sbank-list-jobs

+

sbank-list-jobs [options]

+

Generate job list report Note: To get information for all resources, enter "-r all".

+
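An illustrative usage sketch (userx and ProjectX are placeholder names; the -S date-range syntax is described under OPTIONS below):

> sbank-list-jobs -p ProjectX -r all -u userx -S "2016-02-15...2016-02-29"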

OPTIONS

+

--version

+

show program's version number and exit

+

-h, --help

+

show this help message and exit

+

-a ALLOCATION_ID, --allocation-id=ALLOCATION_ID

+

filter on allocation id

+

-e EVENT_ID, --event-id=EVENT_ID

+

filter on event id

+

-f FIELD_INFO, --field-to-display=FIELD_INFO

+

FIELD_INFO is FIELD[:WIDTH], for available fields enter -f? or -f "?", to add fields enter -f "+ FIELD[:WIDTH] ..."

+

-j JOBID, --jobid=JOBID

+

filter on jobid

+

-n NUM_FIELDS_TO_DISPLAY, --num-fields-to-display=NUM_FIELDS_TO_DISPLAY

+

set number of fields to display

+

-p PROJECT, --project=PROJECT

+

filter on name or id, DO NOT MIX, enter 'all' to get all, wild cards '*' is allowed but only on names

+

-r RESOURCE, --resource=RESOURCE

+

filter on name or id, DO NOT MIX, enter 'all' to get all, wild cards '*' is allowed but only on names

+

-t TRANSACTION_ID, --transaction-id=TRANSACTION_ID

+

filter on transaction id

+

-u USER, --user=USER

+

filter on name or id, DO NOT MIX, enter 'all' to get all, wild cards '*' is allowed but only on names

+

-w "FIELD_INFO", --field-width

+

"FIELD_INFO" FIELD_INFO is :, for available fields enter -w? or -w "?"

+

-E END, --end=END

+

[OPER1][...[OPER2]], where the operators OPER1 and OPER2 can be one of the following: + - ge, gt, le, lt, eq or >=, >, <=, <, ==.

+

Operator Defaults:

+
    +
  • OPER1 is 'lt' for single date entry, OPER1 and OPER2 are 'ge' and 'lt', respectively, for range date entry.
  • +
+

Date Parsing Precedence:

+
    +
  • YEAR then MONTH then DAY, i.e., 121101 is parsed as YYMMDD, hence Nov. 1, 2012
  • +
+

-H, --human-readable

+

abbreviate numbers and use unit suffixes: K (thousands), M (millions), G (billions), T (trillions) ...

+

-S START, --start=START

+

[OPER1][...[OPER2]], where the operators OPER1 and OPER2 can be one of the following: + - ge, gt, le, lt, eq or >=, >, <=, <, ==.

+

Operator Defaults:

+
    +
  • OPER1 is 'lt' for single date entry, OPER1 and OPER2 are 'ge' and 'lt', respectively, for range date entry.
  • +
+

Date Parsing Precedence:

+
    +
  • YEAR then MONTH then DAY, i.e., 121101 is parsed as YYMMDD, hence Nov. 1, 2012
  • +
+

-T TRANSACTION_TYPE, --transaction-type=TRANSACTION_TYPE

+

transaction types: CHARGE, REFUND, PULLBACK, DEPOSIT, VOID

+

--created=CREATED_TIMESTAMP

+

[OPER1][...[OPER2]], where the operators OPER1 and OPER2 can be one of the following: + - ge, gt, le, lt, eq or >=, >, <=, <, ==.

+

Operator Defaults:

+
    +
  • OPER1 is 'lt' for single date entry, OPER1 and OPER2 are 'ge' and 'lt', respectively, for range date entry.
  • +
+

Date Parsing Precedence:

+
    +
  • YEAR then MONTH then DAY, i.e., 121101 is parsed as YYMMDD, hence Nov. 1, 2012
  • +
+

--debug=DEBUG_LEVEL

+

SILENT, MUCH_LESS, LESS, MORE, VERBOSE, DEBUG, DEBUG1, DEBUG2

+

--eligible=ELIGIBLE_TIMESTAMP

+

[OPER1][...[OPER2]], where the operators OPER1 and OPER2 can be one of the following: + - ge, gt, le, lt, eq or >=, >, <=, <, ==.

+

Operator Defaults:

+
    +
  • OPER1 is 'lt' for single date entry, OPER1 and OPER2 are 'ge' and 'lt', respectively, for range date entry.
  • +
+

Date Parsing Precedence:

+
    +
  • YEAR then MONTH then DAY, i.e., 121101 is parsed as YYMMDD, hence Nov. 1, 2012
  • +
+

--get-not-charged

+

get only jobs that have not been charged

+

--last-updated=LAST_UPDATED_TIMESTAMP

+

[OPER1][...[OPER2]], where the operators OPER1 and OPER2 can be one of the following: + - ge, gt, le, lt, eq or >=, >, <=, <, ==.

+

Operator Defaults:

+
    +
  • OPER1 is 'lt' for single date entry, OPER1 and OPER2 are 'ge' and 'lt', respectively, for range date entry.
  • +
+

Date Parsing Precedence:

+
    +
  • YEAR then MONTH then DAY, i.e., 121101 is parsed as YYMMDD, hence Nov. 1, 2012
  • +
+

--no-commas

+

remove commas from comma-separated thousands

+

--no-header

+

do not display the header

+

--no-rows

+

do not display the row data

+

--no-sys-msg

+

do not display system message

+

--no-totals

+

do not display the totals

+

--queued=QUEUED_TIMESTAMP

+

[OPER1][...[OPER2]], where the operators OPER1 and OPER2 can be one of the following: + - ge, gt, le, lt, eq or >=, >, <=, <, ==.

+

Operator Defaults:

+
    +
  • OPER1 is 'lt' for single date entry, OPER1 and OPER2 are 'ge' and 'lt', respectively, for range date entry.
  • +
+

Date Parsing Precedence:

+
    +
  • YEAR then MONTH then DAY, i.e., 121101 is parsed as YYMMDD, hence Nov. 1, 2012
  • +
+ + + + + + + + + + + + + +
+
+ + + +
+ +
+ + + + + + +
+
+
+
+ + + + + + + + + + + + \ No newline at end of file diff --git a/account-project-management/allocation-management/not_in_nav/sbank-list-projects/index.html b/account-project-management/allocation-management/not_in_nav/sbank-list-projects/index.html new file mode 100644 index 0000000000..03a8d1c150 --- /dev/null +++ b/account-project-management/allocation-management/not_in_nav/sbank-list-projects/index.html @@ -0,0 +1,6946 @@ + + + + + + + + + + + + + + + + + + + + + Manpage for sbank-list-projects - ALCF User Guides + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+ + + + Skip to content + + +
+
+ +
+ + + + + + + + + +
+ + +
+ + +
+ +
+

Argonne Leadership Computing Facility

+
+ + + + + + + + + +
+
+ + +
+ + + + + + + + + + + + + + + + + +
+ + + + +
+ + +
+ + + + + + +
+
+ + + +
+
+
+ + + + + +
+
+
+ + + + + + + +
+
+ + + + + + + +

Manpage for sbank-list-projects

+

sbank-list-projects [options]

+

Generate project list report.

+

Notes:

+
  1. Use -I to include inactive allocations
  2. To get information for all resources, enter "-r all"
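A minimal example invocation (ProjectX is a placeholder project name; -I and "-r all" are described in the notes above):

> sbank-list-projects -p ProjectX -r all -I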

OPTIONS

+

--version

+

show program's version number and exit

+

-h, --help

+

show this help message and exit

+

-a ALLOCATION_ID, --allocation-id=ALLOCATION_ID

+

filter on allocation id

+

-f FIELD_INFO, --field-to-display=FIELD_INFO

+

FIELD_INFO is FIELD[:WIDTH], for available fields enter -f? or -f "?", to add fields enter -f "+ FIELD[:WIDTH] ..."

+

-n NUM_FIELDS_TO_DISPLAY, --num-fields-to-display=NUM_FIELDS_TO_DISPLAY

+

set number of fields to display

+

-p PROJECT, --project=PROJECT

+

filter on name or id, DO NOT MIX, enter 'all' to get all, wild cards '*' is allowed but only on names

+

-r RESOURCE, --resource=RESOURCE

+

filter on name or id, DO NOT MIX, enter 'all' to get all, wild cards '*' is allowed, but only on names

+

-u USER, --user=USER

+

filter on name or id, DO NOT MIX, enter 'all' to get all, wild cards '*' is allowed, but only on names

+

-w "FIELD_INFO", --field-width

+

"FIELD_INFO" FIELD_INFO is :, for available fields enter -w? or -w "?"

+

[OPER1][...[OPER2]], where the operators OPER1 and OPER2 can be one of the following: + - ge, gt, le, lt, eq or >=, >, <=, <, ==.

+

Operator Defaults:

+
    +
  • OPER1 is 'lt' for single date entry, OPER1 and OPER2 are 'ge' and 'lt', respectively, for range date entry.
  • +
+

Date Parsing Precedence:

+
    +
  • YEAR then MONTH then DAY, i.e., 121101 is parsed as YYMMDD, hence Nov. 1, 2012
  • +
+

-H, --human-readable

+

abbreviate numbers and use unit suffixes: K (thousands), M (millions), G (billions), T (trillions) ...

+

-I, --get-inactive

+

get inactive allocations

+

-S START, --start=START

+

[OPER1][...[OPER2]], where the operators OPER1 and OPER2 can be one of the following: + - ge, gt, le, lt, eq or >=, >, <=, <, ==.

+

Operator Defaults:

+
    +
  • OPER1 is 'lt' for single date entry, OPER1 and OPER2 are 'ge' and 'lt', respectively, for range date entry.
  • +
+

Date Parsing Precedence:

+
    +
  • YEAR then MONTH then DAY, i.e., 121101 is parsed as YYMMDD, hence Nov. 1, 2012
  • +
+

--debug=DEBUG_LEVEL

+

SILENT, MUCH_LESS, LESS, MORE, VERBOSE, DEBUG, DEBUG1, DEBUG2

+

--all-charges

+

only show list info that have charges regardless of project/user relationship

+

--no-commas

+

remove commas from comma-separated thousands

+

--no-header

+

do not display the header

+

--no-rows

+

do not display the row data

+

--no-sys-msg

+

do not display system message

+

--no-totals

+

do not display the totals

+ + + + + + + + + + + + + +
+
+ + + +
+ +
+ + + + + + +
+
+
+
+ + + + + + + + + + + + \ No newline at end of file diff --git a/account-project-management/allocation-management/not_in_nav/sbank-list-transactions/index.html b/account-project-management/allocation-management/not_in_nav/sbank-list-transactions/index.html new file mode 100644 index 0000000000..9cddc98c6a --- /dev/null +++ b/account-project-management/allocation-management/not_in_nav/sbank-list-transactions/index.html @@ -0,0 +1,7052 @@ + + + + + + + + + + + + + + + + + + + + + Manpage for sbank-list-transactions - ALCF User Guides + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+ + + + Skip to content + + +
+
+ +
+ + + + + + + + + +
+ + +
+ + +
+ +
+

Argonne Leadership Computing Facility

+
+ + + + + + + + + +
+
+ + +
+ + + + + + + + + + + + + + + + + +
+ + + + +
+ + +
+ + + + + + +
+
+ + + +
+
+
+ + + + + +
+
+
+ + + + + + + +
+
+ + + + + + + +

Manpage for sbank-list-transactions

+

sbank-list-transactions [options]

+

Generate transaction list report.

+

Note: To get information for all resources, enter "-r all".

+
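For example, the following sketch (ProjectX is a placeholder project name) lists only job charges, using the -T transaction-type filter described under OPTIONS:

> sbank-list-transactions -p ProjectX -r all -T CHARGE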

OPTIONS

+

--version

+

show program's version number and exit

+

-h, --help

+

show this help message and exit

+

-a ALLOCATION_ID, --allocation-id=ALLOCATION_ID

+

filter on allocation id

+

-c, --comment

+

display comment

+

-e EVENT_ID, --event-id=EVENT_ID

+

filter on event id

+

-f FIELD_INFO, --field-to-display=FIELD_INFO

+

FIELD_INFO is FIELD[:WIDTH], for available fields enter -f? or -f "?", to add fields enter -f "+ FIELD[:WIDTH] ..."

+

-j JOBID, --jobid=JOBID

+

filter on jobid

+

-n NUM_FIELDS_TO_DISPLAY, --num-fields-to-display=NUM_FIELDS_TO_DISPLAY

+

set number of fields to display

+

-p PROJECT, --project=PROJECT

+

filter on name or id, DO NOT MIX, enter 'all' to get all, wild cards '*' is allowed but only on names

+

-r RESOURCE, --resource=RESOURCE

+

filter on name or id, DO NOT MIX, enter 'all' to get all, wild cards '*' is allowed but only on names

+

-t TRANSACTION_ID, --transaction-id=TRANSACTION_ID

+

filter on transaction id

+

-u USER, --user=USER

+

filter on name or id, DO NOT MIX, enter 'all' to get all, wild cards '*' is allowed but only on names

+

-w "FIELD_INFO", --field-width

+

"FIELD_INFO" FIELD_INFO is :, for available fields enter -w? or -w "?"

+

-E JOB_END, --end=JOB_END

+

[OPER1][...[OPER2]], where the operators OPER1 and OPER2 can be one of the following: + - ge, gt, le, lt, eq or >=, >, <=, <, ==.

+

Operator Defaults:

+
    +
  • OPER1 is 'lt' for single date entry, OPER1 and OPER2 are 'ge' and 'lt', respectively, for range date entry.
  • +
+

Date Parsing Precedence:

+
    +
  • YEAR then MONTH then DAY, i.e., 121101 is parsed as YYMMDD, hence Nov. 1, 2012
  • +
+

-H, --human-readable

+

abbreviate numbers and use unit suffixes: K (thousands), M (millions), G (billions), T (trillions) ...

+

-S JOB_START, --start=JOB_START

+

[OPER1][...[OPER2]], where the operators OPER1 and OPER2 can be one of the following: + - ge, gt, le, lt, eq or >=, >, <=, <, ==.

+

Operator Defaults:

+
    +
  • OPER1 is 'lt' for single date entry, OPER1 and OPER2 are 'ge' and 'lt', respectively, for range date entry.
  • +
+

Date Parsing Precedence:

+
    +
  • YEAR then MONTH then DAY, i.e., 121101 is parsed as YYMMDD, hence Nov. 1, 2012
  • +
+

-T TRANSACTION_TYPE, --transaction-type=TRANSACTION_TYPE

+

transaction types: CHARGE, REFUND, PULLBACK, DEPOSIT, VOID

+

--at=TRANSACTION_AT_TIMESTAMP

+

[OPER1][...[OPER2]], where the operators OPER1 and OPER2 can be one of the following: + - ge, gt, le, lt, eq or >=, >, <=, <, ==.

+

Operator Defaults:

+
    +
  • OPER1 is 'lt' for single date entry, OPER1 and OPER2 are 'ge' and 'lt', respectively, for range date entry.
  • +
+

Date Parsing Precedence:

+
    +
  • YEAR then MONTH then DAY, i.e., 121101 is parsed as YYMMDD, hence Nov. 1, 2012
  • +
+

--cbank-ref=CBANK_REF

+

filter on Clusterbank reference id

+

--created=JOB_CREATED_TIMESTAMP

+

[OPER1][...[OPER2]], where the operators OPER1 and OPER2 can be one of the following: + - ge, gt, le, lt, eq or >=, >, <=, <, ==.

+

Operator Defaults:

+
    +
  • OPER1 is 'lt' for single date entry, OPER1 and OPER2 are 'ge' and 'lt', respectively, for range date entry.
  • +
+

Date Parsing Precedence:

+
    +
  • YEAR then MONTH then DAY, i.e., 121101 is parsed as YYMMDD, hence Nov. 1, 2012
  • +
+

--debug=DEBUG_LEVEL

+

SILENT, MUCH_LESS, LESS, MORE, VERBOSE, DEBUG, DEBUG1, DEBUG2

+

--no-commas

+

remove commas from comma-separated thousands

+

--no-header

+

do not display the header

+

--no-rows

+

do not display the row data

+

--no-sys-msg

+

do not display system message

+

--no-totals

+

do not display the totals

+

--queued=JOB_QUEUED_TIMESTAMP

+

[OPER1][...[OPER2]], where the operators OPER1 and OPER2 can be one of the following: + - ge, gt, le, lt, eq or >=, >, <=, <, ==.

+

Operator Defaults:

+
    +
  • OPER1 is 'lt' for single date entry, OPER1 and OPER2 are 'ge' and 'lt', respectively, for range date entry.
  • +
+

Date Parsing Precedence:

+
    +
  • YEAR then MONTH then DAY, i.e., 121101 is parsed as YYMMDD, hence Nov. 1, 2012
  • +
+ + + + + + + + + + + + + +
+
+ + + +
+ +
+ + + + + + +
+
+
+
+ + + + + + + + + + + + \ No newline at end of file diff --git a/account-project-management/allocation-management/not_in_nav/sbank-list-users/index.html b/account-project-management/allocation-management/not_in_nav/sbank-list-users/index.html new file mode 100644 index 0000000000..c68379e844 --- /dev/null +++ b/account-project-management/allocation-management/not_in_nav/sbank-list-users/index.html @@ -0,0 +1,6956 @@ + + + + + + + + + + + + + + + + + + + + + Manpage for sbank-list-users - ALCF User Guides + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+ + + + Skip to content + + +
+
+ +
+ + + + + + + + + +
+ + +
+ + +
+ +
+

Argonne Leadership Computing Facility

+
+ + + + + + + + + +
+
+ + +
+ + + + + + + + + + + + + + + + + +
+ + + + +
+ + +
+ + + + + + +
+
+ + + +
+
+
+ + + + + +
+
+
+ + + + + + + +
+
+ + + + + + + +

Manpage for sbank-list-users

+

sbank-list-users [options]

+

Generate user list report.

+

Notes:

+
  1. Use -I to include inactive allocations
  2. For information for all resources, use "-r all"
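A minimal usage sketch, with ProjectX standing in for a real project name:

> sbank-list-users -p ProjectX -r all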

OPTIONS

+

--version

+

show program's version number and exit

+

-h, --help

+

show this help message and exit

+

-a ALLOCATION_ID, --allocation-id=ALLOCATION_ID

+

filter on allocation id

+

-f FIELD_INFO, --field-to-display=FIELD_INFO

+

FIELD_INFO is FIELD[:WIDTH], for available fields enter -f? or -f "?", to add fields enter -f "+ FIELD[:WIDTH] ..."

+

-n NUM_FIELDS_TO_DISPLAY, --num-fields-to-display=NUM_FIELDS_TO_DISPLAY

+

set number of fields to display

+

-p PROJECT, --project=PROJECT

+

filter on name or id, DO NOT MIX, enter 'all' to get all, wild cards '*' is allowed, but only on names

+

-r RESOURCE, --resource=RESOURCE

+

filter on name or id, DO NOT MIX, enter 'all' to get all, wild cards '*' is allowed, but only on names

+

-u USER, --user=USER

+

filter on name or id, DO NOT MIX, enter 'all' to get all, wild cards '*' is allowed, but only on names

+

-w "FIELD_INFO", --field-width

+

"FIELD_INFO" FIELD_INFO is :, for available fields enter -w? or -w "?"

+

-E END, --end=END

+

[OPER1][...[OPER2]], where the operators OPER1 and OPER2 can be one of the following: + - ge, gt, le, lt, eq or >=, >, <=, <, ==.

+

Operator Defaults:

+
    +
  • OPER1 is 'lt' for single date entry, OPER1 and OPER2 are 'ge' and 'lt', respectively, for range date entry.
  • +
+

Date Parsing Precedence:

+
    +
  • YEAR then MONTH then DAY, i.e., 121101 is parsed as YYMMDD, hence Nov. 1, 2012
  • +
+

-H, --human-readable

+

abbreviate numbers and use unit suffixes: K (thousands), M (millions), G (billions), T (trillions) ...

+

-I, --get-inactive

+

also get inactive allocations

+

-S START, --start=START

+

[OPER1][...[OPER2]], where the operators OPER1 and OPER2 can be one of the following: + - ge, gt, le, lt, eq or >=, >, <=, <, ==.

+

Operator Defaults:

+
    +
  • OPER1 is 'lt' for single date entry, OPER1 and OPER2 are 'ge' and 'lt', respectively, for range date entry.
  • +
+

Date Parsing Precedence:

+
    +
  • YEAR then MONTH then DAY, i.e., 121101 is parsed as YYMMDD, hence Nov. 1, 2012
  • +
+

--debug=DEBUG_LEVEL

+

SILENT, MUCH_LESS, LESS, MORE, VERBOSE, DEBUG, DEBUG1, DEBUG2

+

--all-charges

+

only show list info that have charges regardless of project/user relationship

+

--no-commas

+

remove commas from comma-separated thousands

+

--no-header

+

do not display the header

+

--no-rows

+

do not display the row data

+

--no-sys-msg

+

do not display system message

+

--no-totals

+

do not display the totals

+ + + + + + + + + + + + + +
+
+ + + +
+ +
+ + + + + + +
+
+
+
+ + + + + + + + + + + + \ No newline at end of file diff --git a/account-project-management/allocation-management/not_in_nav/sbank-list/index.html b/account-project-management/allocation-management/not_in_nav/sbank-list/index.html new file mode 100644 index 0000000000..9d3b293e6e --- /dev/null +++ b/account-project-management/allocation-management/not_in_nav/sbank-list/index.html @@ -0,0 +1,7134 @@ + + + + + + + + + + + + + + + + + + + + + Manpage for sbank-list - ALCF User Guides + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+ + + + Skip to content + + +
+
+ +
+ + + + + + + + + +
+ + +
+ + +
+ +
+

Argonne Leadership Computing Facility

+
+ + + + + + + + + +
+
+ + +
+ + + + + + + + + + + + + + + + + +
+ + + + +
+ + +
+ + + + + + +
+
+ + + +
+
+
+ + + + + +
+
+
+ + + + + + + +
+
+ + + + + + + +

Manpage for sbank-list

+

sbank-list [options]

+

List Meta Command

+

COMMANDS

+
  • allocations [-a|-c|-e|-f|-j|-n|-p|-r|-t|-u|-w|-E|-H|-I|-O|-S|-T|...] (DEFAULT)
  • categories [-f|-n|-w|...]
  • messages [-f|-n|-w|...]
  • names [-f|-n|-w|...]
  • jobs [-a|-e|-f|-j|-n|-p|-r|-t|-u|-w|-E|-H|-S|-T|...]
  • projects [-a|-f|-n|-p|-r|-u|-w|-E|-H|-I|-S|...]
  • transactions [-a|-c|-e|-f|-j|-n|-p|-r|-t|-u|-w|-E|-H|-S|-T|...]
  • users [-a|-f|-n|-p|-r|-u|-w|-E|-H|-S|...]
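As an illustration (ProjectX is a placeholder project name), the meta form and the full command name below are equivalent invocations:

> sbank list jobs -p ProjectX -r all
> sbank-list-jobs -p ProjectX -r all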

OPTIONS

+

-a --allocation

+

enter allocation id

+

-c --comment

+

enter comment for new or edit commands, display comment for list commands

+

-e --event-id

+

enter event db id; event db id is an internal id created by the charging system

+

-f --field

+

enter FIELD[:WIDTH], width is optional; enter -f? or -f "?" for available fields, + to add fields

+

-h --help

+

command line help

+

-j --jobid

+

enter jobid; jobid is created by the scheduler and is not unique

+

-n --num-field

+

enter number of fields to display

+

-p --project

+

enter name or id, DO NOT MIX, enter 'all' to get all, wild cards '*' is allowed, but only on names

+

-r --resource

+

enter name or id, DO NOT MIX, enter 'all' to get all, wild cards '*' is allowed, but only on names

+

-s --suballocation

+

enter suballocation id

+

-t --transaction

+

enter transaction id

+

-u --user

+

enter name or id, DO NOT MIX, enter 'all' to get all, wild cards '*' is allowed, but only on names

+

-w --field-width

+

enter the field width as follows: FIELD:WIDTH, enter -w? or -w "?" for available fields

+

-E --end

+

enter end datetime filter

+

-H --human-readable

+

abbreviate numbers and use unit suffixes: K (thousands), M (millions), G (billions), T (trillions) ...

+

-I --get-inactive

+

include inactive allocations

+

-O --get-only-inactive

+

get only inactive allocations

+

-S --start

+

enter start datetime filter

+

-T --Type

+

enter type of transaction

+

--all-charges

+

for list allocations | projects | users, only show info with charges

+

--at

+

enter transaction-created datetime filter

+

--award-category

+

enter allocation award category

+

--award-type-name

+

enter allocation award-type name

+

--created

+

enter created datetime filter

+

--debug

+

enter debug level

+

--get-deleted

+

get deleted objects

+

--get-not-charged

+

get jobs that have not been charged

+

--get-only-deleted

+

get only deleted objects

+

--history-date-range

+

enter history datetime filter

+

--last-updated

+

enter last updated datetime filter

+

--no-commas

+

remove commas from comma-separated thousands

+

--no-header

+

do not display header

+

--no-history

+

do not display history information

+

--no-rows

+

do not display rows

+

--no-sys-msg

+

do not display system message

+

--no-totals

+

do not display totals

+

--queued

+

enter queued datetime filter

+ + + + + + + + + + + + + +
+
+ + + +
+ +
+ + + + + + +
+
+
+
+ + + + + + + + + + + + \ No newline at end of file diff --git a/account-project-management/allocation-management/not_in_nav/sbank-manpage/index.html b/account-project-management/allocation-management/not_in_nav/sbank-manpage/index.html new file mode 100644 index 0000000000..7b3fb17717 --- /dev/null +++ b/account-project-management/allocation-management/not_in_nav/sbank-manpage/index.html @@ -0,0 +1,7511 @@ + + + + + + + + + + + + + + + + + + + + + Manpage for sbank Commands - ALCF User Guides + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+ + + + Skip to content + + +
+
+ +
+ + + + + + + + + +
+ + +
+ + +
+ +
+

Argonne Leadership Computing Facility

+
+ + + + + + + + + +
+
+ + +
+ + + + + + + + + + + + + + + + + +
+ + + + +
+ + +
+ + + + + + +
+
+ + + +
+
+
+ + + + + +
+
+
+ + + +
+
+
+ + + +
+
+
+ + + +
+
+ + + + + + + +

Manpage for sbank Commands

+

sbank [options]

+

DESCRIPTION

+

HPC Accounting System Command Line Interface

+

detail meta command

+

"detail" meta command displays information in a long format with history updates, where appropriate.

+

list meta command

+

"list" meta command displays information in a table format, but no history updates are displayed.

+

IMPORTANT NOTES
1. All dates entered shall be interpreted as UTC
2. Non-admin users will only be able to see their content (jobs, charges, etc.)
3. Project admin users will be able to see all of the content for their projects
4. Staff admin users will be able to see all the content
5. --help and -h are the help options

+

META COMMANDS

+

- detail [options]

+

- list [options] (DEFAULT)

+

DETAIL COMMANDS
* allocations [-a|-e|-f|-j|-n|-p|-r|-t|-u|-w|-E|-H|-I|-O|-S|-T|...] [ ... ] (DEFAULT)
* jobs [-a|-e|-f|-j|-n|-p|-r|-t|-u|-w|-E|-H|-S|-T|...] [ ... ]
* projects [-a|-f|-n|-p|-r|-u|-w|-E|-H|-I|-S|...] [ ... ]
* transactions [-a|-e|-f|-j|-n|-p|-r|-t|-u|-w|-E|-H|-S|-T|...] [ ... ]
* users [-a|-f|-n|-p|-r|-u|-w|-E|-H|-S|...] [ ... ]

+

LIST COMMANDS
* allocations [-a|-c|-e|-f|-j|-n|-p|-r|-t|-u|-w|-E|-H|-I|-O|-S|-T|...] (DEFAULT)
* jobs [-a|-e|-f|-j|-n|-p|-r|-t|-u|-w|-E|-H|-S|-T|...]
* projects [-a|-f|-n|-p|-r|-u|-w|-E|-H|-I|-S|...]
* transactions [-a|-c|-e|-f|-j|-n|-p|-r|-t|-u|-w|-E|-H|-S|-T|...]
* users [-a|-f|-n|-p|-r|-u|-w|-E|-H|-S|...]

+

OPTIONS

+

-a --allocation

+

enter allocation id

+

-c --comment

+

enter comment for new or edit commands, display comment for list commands

+

-e --event-id

+

enter event db id; event db id is an internal id created by the charging system

+

-f --field

+

enter FIELD[:WIDTH], width is optional; enter -f? or -f "?" for available fields, + to add fields

+

-h --help

+

command line help

+

-j --jobid

+

enter jobid; jobid is created by the scheduler and is not unique

+

-n --num-field

+

enter number of fields to display

+

-p --project

+

enter name or id, DO NOT MIX, enter 'all' to get all, wild cards '*' is allowed, but only on names

+

-r --resource

+

enter name or id, DO NOT MIX, enter 'all' to get all, wild cards '*' is allowed, but only on names

+

-s --suballocation

+

enter suballocation id

+

-t --transaction

+

enter transaction id

+

-u --user

+

enter name or id, DO NOT MIX, enter 'all' to get all, wild cards '*' is allowed, but only on names

+

-w --field-width

+

enter the field width as follows: FIELD:WIDTH, enter -w? or -w "?" for available fields

+

-E --end

+

enter end datetime filter

+

-H --human-readable

+

abbreviate numbers and use unit suffixes: K (thousands), M (millions), G (billions), T (trillions) ...

+

-I --get-inactive

+

include inactive allocations

+

-O --get-only-inactive

+

get only inactive allocations

+

-S --start

+

enter start datetime filter

+

-T --Type

+

enter type of transaction

+

--all-charges

+

for list allocations | projects | users, only show info with charges

+

--at

+

enter transaction-created datetime filter

+

--award-category

+

enter allocation award category

+

--award-type-name

+

enter allocation award-type name

+

--created

+

enter created datetime filter

+

--debug

+

enter debug level

+

--get-deleted

+

get deleted objects

+

--get-not-charged

+

get jobs that have not been charged

+

--get-only-deleted

+

get only deleted objects

+

--history-date-range

+

enter history datetime filter

+

--home-dir

+

enter the directory to store the pbs meta file

+

--ignore-pbs-files

+

all new pbs files will be ignored and marked as processed

+

--last-updated

+

enter last updated datetime filter

+

--no-commas

+

remove commas from comma-separated thousands

+

--no-header

+

do not display header

+

--no-history

+

do not display history information

+

--no-rows

+

do not display rows

+

--no-sys-msg

+

do not display system message

+

--no-totals

+

do not display totals

+

--queued

+

enter queued datetime filter

+

MORE OPTION EXPLANATIONS

+

For -a, -e, -f, -w, -j, -p, -r, -t, -u, -T, --award-categories, --award_type_names, --cbank_refs options:

+

These options can be entered multiple times for different values or entered once for multiple values.

+

Examples:

+
  1. > sbank-list-allocations -u "pershey rojas allcock"
     or
     > sbank-list-allocations -u pershey -u rojas -u allcock

  2. > sbank-list-allocations -f "id p avail"
     or
     > sbank-list-allocations -f id -f p -f avail

For -u, -p, and -r, the use of the wild card "*" is allowed, but only on names, not ids:

Examples:

+
  1. The following command will find allocations for users whose names start with "pers" and also users rojas and allcock. > sbank-list-allocations -u "pers* rojas allcock"
  2. The following command will find allocations for projects that contain "ratio" in the name. > sbank-list-allocations -p *ratio*
  3. The following command will find allocations for projects that end with "tion" in the name. > sbank-list-allocations -p *tion
  4. The following command will find allocations for projects that start with "ab" and end with "ng" in the name. > sbank-list-allocations -p ab*ng

For -f option:
This option is the display field option.

+

To get the available fields enter -f? or -f "?". Default fields columns will be displayed if no field option is specified.

+

To replace the current fields to display, enter: +

> sbank-list-allocations ... -f "FIELD[:WIDTH]...FIELD[:WIDTH]" or > sbank-list-allocations ... -f FIELD[:WIDTH] ... -f FIELD[:WIDTH] 
+

+

If you wish to add fields to the default fields, enter one + symbol anywhere in the quoted string: +

> sbank-list-allocations ... -f "+ FIELD[:WIDTH]...FIELD[:WIDTH]", only one + symbol is needed.
+

+

The fields will be displayed in table format and in the order entered in the command line. You can specify the field width, where WIDTH can be positive or negative value. Left alignment use -, right alignment use + or nothing.

+
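For instance, a sketch that replaces the default fields with id, project_name, and a 25-character-wide available_balance_sum (field names as reported by -f "?"; ProjectX is a placeholder project name):

> sbank-list-allocations -p ProjectX -f "id project_name available_balance_sum:25"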

For -w option:

+

FIELD:WIDTH, if the field is displayed it will change the width for the specified field.

+

NOTE: This will not add the field as in -f option, only change the width. To get available fields you can also use -w? or -w "?" as in -f option.

+
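For example, assuming the available_balance_sum field is already among the displayed fields (ProjectX is a placeholder project name), this sketch only widens that column:

> sbank-list-allocations -p ProjectX -w available_balance_sum:25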

For -S, -E, --created, --queued, --last-updated, --history-date-range options:

+

These are the date filter options. All dates are treated as UTC.

+

You can use any reasonable date string that resembles a date. Ambiguous dates will be parsed with the following parsing precedence: YEAR then MONTH then DAY.

+

For example, 10-11-12 or 101112 will be parsed as the following date: Oct. 11, 2012 (not Nov. 12, 2010 or Nov. 10, 2012).

+

Or you can specify a single date as follows:

"[OPER]UTC_DATE"

You can specify a date range as follows:

"[OPER1]UTC_DATE1...[OPER2]UTC_DATE2"

Where OPER can be one of the following operators: "==", ">=", "<=", ">", "<" or "eq", "ge", "le", "gt", "lt"
+

+

Note: The following are the defaults for OPER, OPER1, and OPER2 for these options:

Options          OPER  OPER1  OPER2
---------------  ----  -----  -----
-E               <     >=     <
-S               >=    >=     <
--at             >=    >=     <
--created        >=    >=     <
--eligible       >=    >=     <
--last-updated   >=    >=     <
--queued         >=    >=     <
+

+

You can also use the following key letters "n", "t", "d", "w", "y" as follows: +

KEY SYNTAX   DEFINITIONS
----------   -----------
n[ow]        now, where "now" is current-date current-time UTC
t[oday]      today, where "today" is current-date 00:00:00 UTC
[+/-]Nd      specified number N of +/- days from "today" in UTC
[+/-]Nw      specified number N of +/- weeks from "today" in UTC
[+/-]Ny      specified number N of +/- years from "today" in UTC
+

+

For -T option:

+

Transaction type option. The following are the valid transaction types and their explanations:

CHARGE    filter on job charges
PULLBACK  filter on allocation pullbacks
DEPOSIT   filter on allocation deposits
REFUND    filter on job refunds
VOID      filter on void transactions

+
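For example, to list only refund transactions for a project (ProjectX is a placeholder project name):

> sbank-list-transactions -p ProjectX -T REFUND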

INVOCATION

+

sbank                      sbank                      sbank
sbank-detail               sbank detail               sbank d
sbank-detail-allocations   sbank detail allocations   sbank d a
sbank-detail-jobs          sbank detail jobs          sbank d j
sbank-detail-projects      sbank detail projects      sbank d p
sbank-detail-transactions  sbank detail transactions  sbank d t
sbank-detail-users         sbank detail users         sbank d u
sbank-list                 sbank list                 sbank l
sbank-list-allocations     sbank list allocations     sbank l a
sbank-list-jobs            sbank list jobs            sbank l j
sbank-list-projects        sbank list projects        sbank l p
sbank-list-transactions    sbank list transactions    sbank l t
sbank-list-users           sbank list users           sbank l u

+

ENVIRONMENT VARIABLES

+

Command line default options: Define the following environment variables as you would in the command line. Once the environment variable is defined, it will be used as the default options and arguments for the specific command. Command line options will take precedence.

+
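A minimal sketch, assuming a bash-like shell; the variable name comes from the list below and its value is an ordinary option string (ProjectX is a placeholder project name). Command line options still take precedence over these defaults:

> export sbank_LIST_ALLOCATIONS_ARGS="-r all -H"   # defaults: all resources, human-readable numbers
> sbank-list-allocations -p ProjectX               # picks up the defaults set above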

sbank_DETAIL_ALLOCATIONS_ARGS

+

Default arguments and options for sbank-detail-allocations.

+

sbank_DETAIL_CATEGORIES_ARGS

+

Default arguments and options for sbank-detail-categories.

+

sbank_DETAIL_NAMES_ARGS

+

Default arguments and options for sbank-detail-names.

+

sbank_DETAIL_MESSAGES_ARGS

+

Default arguments and options for sbank-detail-messages.

+

sbank_DETAIL_JOBS_ARGS

+

Default arguments and options for sbank-detail-jobs.

+

sbank_DETAIL_PROJECTS_ARGS

+

Default arguments and options for sbank-detail-projects.

+

sbank_DETAIL_TRANSACTIONS_ARGS

+

Default arguments and options for sbank-detail-transactions.

+

sbank_DETAIL_USERS_ARGS

+

Default arguments and options for sbank-detail-users.

+

sbank_LIST_ALLOCATIONS_ARGS

+

Default arguments and options for sbank-list-allocations.

+

sbank_LIST_JOBS_ARGS

+

Default arguments and options for sbank-list-jobs.

+

sbank_LIST_PROJECTS_ARGS

+

Default arguments and options for sbank-list-projects.

+

sbank_LIST_TRANSACTIONS_ARGS

+

Default arguments and options for sbank-list-transactions.

+

sbank_LIST_USERS_ARGS

+

Default arguments and options for sbank-list-users.

+

EXAMPLES

+

Example 1: -f, --field +

> sbank-list-transactions ... -f field1:-20 -f field2:20 -f field3 or > sbank-list-transactions ... -f "field1:-20 field2:20 field3" 
+
+Explanation: Fields will be displayed in order of appearance, where field1:-20 means 20 characters long, left align; where field2:20 means 20 characters long, right align; where field3 uses default sizes. Number fields default to right aligned. Text fields default to left aligned.

+

Example 2: -S, -E, --created, --queued, --last-updated, --history-start, --history-end

+

Single date-string examples:

+
    +
  • +
    +

    sbank-list-allocations -S ">=Oct 11, 2014" start dates that are >= "2014-10-11 00:00:00"

    +
    +
  • +
  • +
    +

    sbank-list-allocations -S "<=2014-11-10" start dates that are <= "2014-11-10 00:00:00"

    +
    +
  • +
  • +
    +

    sbank-list-allocations -E "<20141110" end dates that are < "2014-11-10 00:00:00"

    +
    +
  • +
  • +
    +

    sbank-list-allocations -E "22:30:10" end dates that are < " 22:30:10"

    +
    +
  • +
  • +
    +

    sbank-list-allocations -S ">today" start dates that are > " 00:00:00"

    +
    +
  • +
  • +
    +

    sbank-list-allocations -E t end dates that are < " 00:00:00"

    +
    +
  • +
  • +
    +

    sbank-list-allocations -S gtnow start dates that are > " "

    +
    +
  • +
  • +
    +

    sbank-list-allocations -E len end dates that are <= " "

    +
    +
  • +
  • +
    +

    sbank-list-allocations -S "1d" start dates that are >= "today +1 day"

    +
    +
  • +
  • +
    +

    sbank-list-allocations -E "-2w" end dates that are < "today -2 weeks"

    +
    +
  • +
  • +
    +

    sbank-list-allocations -S ">=1y" start dates that are >= "today +1 year"

    +
    +
  • +
  • +
    +

    sbank-list-allocations -S ">2012" start dates that are > "2012-- 00:00:00"

    +
    +
  • +
+

Range date-string examples:

+
    +
  • +
    +

    sbank-list-allocations -S "2013-01-01...2014-01-01" "2013-01-01" <= DATES < "2014-01-01"

    +
    +
  • +
  • +
    +

    sbank-list-allocations -S "-1y...t" "today -1 year" <= DATES < "today"

    +
    +
  • +
  • +
    +

    sbank-list-allocations -E "2013...t"" "2013--" <= DATES < "today"

    +
    +
  • +
  • +
    +

    sbank-list-allocations -E ">2013...<=t"" "2013--" < DATES <= "today"

    +
    +
  • +
+

Example 3: Command invocation examples

+
    +
  • +
    +

    sbank-list-projects list projects full command invocation

    +
    +
  • +
  • +
    +

    sbank list projects list projects meta command invocation

    +
    +
  • +
  • +
    +

    sbank s p list projects partial meta command invocation

    +
    +
  • +
  • +
    +

    sbank p list projects where "list" is the default

    +
    +
  • +
  • +
    +

    sbank list allocations is the default

    +
    +
  • +
  • +
    +

    sbank a list allocations "list" is the default

    +
    +
  • +
  • +
    +

    sbank s a list allocations partial meta command invocation

    +
    +
  • +
+

Example 4: -h, --help

+
    +
  • +
    +

    sbank -h will give you help summary on all of sbank

    +
    +
  • +
  • +
    +

    sbank list --help will give you help on all the "list" commands

    +
    +
  • +
  • +
    +

    sbank list allocations -h will give you help on the "list allocations" command

    +
    +
  • +
  • +
    +

    sbank-list-allocations -h will give you help on the "list allocations" command

    +
    +
  • +
  • +
    +

    sbank l a --help will give you help on the "list allocations" command

    +
    +
  • +
+ + + + + + + + + + + + + +
+
+ + + +
+ +
+ + + + + + +
+
+
+
+ + + + + + + + + + + + \ No newline at end of file diff --git a/account-project-management/allocation-management/overview/index.html b/account-project-management/allocation-management/overview/index.html new file mode 100644 index 0000000000..d7219d3a0e --- /dev/null +++ b/account-project-management/allocation-management/overview/index.html @@ -0,0 +1,6889 @@ + + + + + + + + + + + + + + + + + + + + + + + + + Allocations on ALCF Computing Resources - ALCF User Guides + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+ + + + Skip to content + + +
+
+ +
+ + + + + + + + + +
+ + +
+ + +
+ +
+

Argonne Leadership Computing Facility

+
+ + + + + + + + + +
+
+ + +
+ + + + + + + + + + + + + + + + + +
+ + + + +
+ + +
+ + + + + + +
+
+ + + +
+
+
+ + + + + +
+
+
+ + + + + + + +
+
+ + + + + + + +

Allocations on ALCF Computing Resources

+

Getting an Allocation Award

+

INCITE, ALCC, and ADSP

+

Researchers gain access to ALCF systems for computational science and engineering projects—typically with awards of millions of core-hours—through competitive, peer-reviewed allocation programs supported by the DOE and Argonne. Our peer-reviewed award programs consist of the INCITE, ALCC, and ADSP programs. More information about the programs, including dates for our CFPs, can be found on their web pages.

+

Director's Discretionary

+

Alternatively, ALCF offers a Director's Discretionary allocation award program that supports leadership computing preparation, INCITE and ALCC scaling, and application performance work aimed at maximizing scientific application efficiency and productivity on leadership computing platforms. See the Director's Discretionary (DD) Program page for more information.

+

Initializing Your Awarded Allocation

+

Projects with INCITE, ALCC, and ADSP awards will be contacted directly by the ALCF staff with information on creating accounts.

+

Director's Discretionary awards will receive information in the award confirmation email.

+

Allocation Resources

+

While requesting an allocation, users can choose from:

+

Compute:
- Polaris
- Theta (KNL nodes)
- ThetaGPU (GPU nodes)
- Cooley

+

File System:
- Grand
- Eagle (Community Sharing)

+ +

Pullback Policy

+

Requesting Additional Allocation Hours

+

If you are a PI of a Director's Discretionary project that has an active allocation, you can request additional time or an extension using the allocation request form.

+
+

Project Management +

+
To request more hours, renew your project using the allocation request form.
+
+ + + + + + + + + + + + + +
+
+ + + +
+ +
+ + + + + + +
+
+
+
+ + + + + + + + + + + + \ No newline at end of file diff --git a/account-project-management/allocation-management/sbank-allocation-accounting-system/index.html b/account-project-management/allocation-management/sbank-allocation-accounting-system/index.html new file mode 100644 index 0000000000..c98c755de3 --- /dev/null +++ b/account-project-management/allocation-management/sbank-allocation-accounting-system/index.html @@ -0,0 +1,6781 @@ + + + + + + + + + + + + + + + + + + + + + + + + + sbank Allocation Accounting System - ALCF User Guides + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+ + + + Skip to content + + +
+
+ +
+ + + + + + + + + +
+ + +
+ + +
+ +
+

Argonne Leadership Computing Facility

+
+ + + + + + + + + +
+
+ + +
+ + + + + + + + + + + + + + + + + +
+ + + + +
+ + +
+ + + + + + +
+
+ + + +
+
+
+ + + + + +
+
+
+ + + +
+
+
+ + + +
+
+
+ + + +
+
+ + + + + + + +

sbank Allocation Accounting System

+

sbank is the accounting system used within the ALCF. It tracks project allocations, usage charges, and refunds. sbank allows queries about the balance and expiration of project allocations, and has replaced the outdated cbank accounting system.

+

The sbank accounting system helps users manage their allocations and usage per job. It gives the PIs the ability to monitor their allocation usage by user, job, and machine. It also allows the user to monitor their usage per allocation and provides insight on how many hours are left on the project.

+
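For example, a PI might check a project's remaining balance and per-user usage with commands like the following sketch (ProjectX is a placeholder project name; see the example commands and man pages linked below for details):

> sbank-list-allocations -p ProjectX -r all
> sbank-list-users -p ProjectX -r all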

Getting Started with sbank

+

sbank Example Commands provides a set of example commands on how to use the most common commands.

+

sbank Man Pages

+

Use these sbank man pages to get information on how to use the commands.

+ + + + + + + + + + + + + + +
+
+ + + +
+ +
+ + + + + + +
+
+
+
+ + + + + + + + + + + + \ No newline at end of file diff --git a/account-project-management/project-management/files/PINAME_ALLOCATION_YEARS_EOP.docx b/account-project-management/project-management/files/PINAME_ALLOCATION_YEARS_EOP.docx new file mode 100644 index 0000000000..7d2a245fff Binary files /dev/null and b/account-project-management/project-management/files/PINAME_ALLOCATION_YEARS_EOP.docx differ diff --git a/account-project-management/project-management/files/PINAME_ALLOCATION_YEAR_EOY.docx b/account-project-management/project-management/files/PINAME_ALLOCATION_YEAR_EOY.docx new file mode 100644 index 0000000000..0d4647a3f8 Binary files /dev/null and b/account-project-management/project-management/files/PINAME_ALLOCATION_YEAR_EOY.docx differ diff --git a/account-project-management/project-management/files/PINAME_ALLOCATION_YEAR_QX.docx b/account-project-management/project-management/files/PINAME_ALLOCATION_YEAR_QX.docx new file mode 100644 index 0000000000..be58d6f7e7 Binary files /dev/null and b/account-project-management/project-management/files/PINAME_ALLOCATION_YEAR_QX.docx differ diff --git a/account-project-management/project-management/files/accounts-app.png b/account-project-management/project-management/files/accounts-app.png new file mode 100644 index 0000000000..b52e70efc0 Binary files /dev/null and b/account-project-management/project-management/files/accounts-app.png differ diff --git a/account-project-management/project-management/files/project-management-1.png b/account-project-management/project-management/files/project-management-1.png new file mode 100644 index 0000000000..6afe754647 Binary files /dev/null and b/account-project-management/project-management/files/project-management-1.png differ diff --git a/account-project-management/project-management/files/project-management-2.png b/account-project-management/project-management/files/project-management-2.png new file mode 100644 index 0000000000..b969601cf8 Binary files /dev/null and b/account-project-management/project-management/files/project-management-2.png differ diff --git a/account-project-management/project-management/files/project-management-3.png b/account-project-management/project-management/files/project-management-3.png new file mode 100644 index 0000000000..812e1c93b4 Binary files /dev/null and b/account-project-management/project-management/files/project-management-3.png differ diff --git a/account-project-management/project-management/files/project-management-4.png b/account-project-management/project-management/files/project-management-4.png new file mode 100644 index 0000000000..4c43550e4c Binary files /dev/null and b/account-project-management/project-management/files/project-management-4.png differ diff --git a/account-project-management/project-management/project-reports/index.html b/account-project-management/project-management/project-reports/index.html new file mode 100644 index 0000000000..0c117cbf18 --- /dev/null +++ b/account-project-management/project-management/project-reports/index.html @@ -0,0 +1,6945 @@ + + + + + + + + + + + + + + + + + + + + + + + + + Reporting - ALCF User Guides + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+ + + + Skip to content + + +
+
+ +
+ + + + + + + + + +
+ + +
+ + +
+ +
+

Argonne Leadership Computing Facility

+
+ + + + + + + + + +
+
+ + +
+ + + + + + + + + + + + + + + + + +
+ + + + +
+ + +
+ + + + + + +
+
+ + + +
+
+
+ + + + + +
+
+
+ + + + + + + +
+
+ + + + + + + +

Quarterly and Year-End Reporting

+

The Argonne Leadership Computing Facility (ALCF) is required to report the progress and scientific accomplishments of all peer-reviewed projects.

+

PIs of INCITE, ALCC, and ADSP projects are required to complete quarterly reports and a final end-of-year (EOY) or end-of-project (EOP) report.

+

Due dates

+

Due dates for the 2024 INCITE quarterly, EOY, and the EOP reports:

+
    +
  • April 1, 2024 (CY2024 - Q1)
  • +
  • July 1, 2024 (CY2024 - Q2)
  • +
  • October 1, 2024 (CY2024 - Q3)
  • +
  • January 1, 2025 (CY2025 - EOY) or February 15, 2025 (entire allocation period - EOP)
  • +
+

Due dates for the 2023-2024 ALCC quarterly and the EOP reports:

+
    +
  • October 1, 2023 (CY2023 - Q3)
  • +
  • January 1, 2024 (CY2024 - Q4)
  • +
  • April 1, 2024 (CY2024 - Q1)
  • +
  • August 15, 2024 (CY2024 - EOP)
  • +
+

Penalties

+

If a quarterly report is more than 30 days late: +- The ability to submit jobs for the PI and users of the late project will be disabled.

+

If a quarterly report is more than 90 days late: +- The PI and users of the late project will have their accounts disabled.

+

These penalties will be removed within three business days after the late quarterly or EOY report is submitted.

+

ALCC Specific Penalties:

+

A similar penalty will also be applied to new ALCC projects with the same PI or co-PIs that have failed to submit the EOP report for a previous ALCC project. If the EOP report is more than 15 days late:

+
    +
  • The new ALCC project will be blocked. For a currently active ALCC project, the ability to submit jobs will be disabled for the project and all sub-projects. For a project that has not been created yet, the process for new project creation will be halted.
  • +
+

Appeals

+

A PI or user may appeal a project or account suspension to the ALCF Director by sending a request to support@alcf.anl.gov.

+

Report Templates

+

Templates for the quarterly and the EOY reports can be found at the links on the bottom of this page.

+

Please modify the filename to replace PINAME with the last name of the PI of the INCITE/ALCC project, ALLOCATION to INCITE/ALCC, and YEAR to the corresponding calendar year. For quarterly reports, please replace the X in the filename with the quarter number.

+

For example, for a project with PI 'Joe Smith' that is submitting the quarterly report for the first quarter of the 2023-2024 ALCC cycle, the filename will be Smith_ALCC_Q1.docx.

+

For an EOY report, replace YEARS with the years associated with your allocation. For example, an ALCC 2023-2024 project with PI 'Joe Smith' would have a filename of Smith_ALCC_2023-2024_EOY.docx.
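For instance, starting from the downloaded templates, the renames might look like this (the PI name, program, and year below are illustrative):

mv PINAME_ALLOCATION_YEAR_QX.docx Smith_INCITE_2024_Q1.docx
mv PINAME_ALLOCATION_YEAR_EOY.docx Smith_INCITE_2024_EOY.docx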

+

Templates for INCITE and ALCC:

+ + + + + + + + + + + + + + +
+
+ + + +
+ +
+ + + + + + +
+
+
+
+ + + + + + + + + + + + \ No newline at end of file diff --git a/account-project-management/project-management/starting-alcf-award/index.html b/account-project-management/project-management/starting-alcf-award/index.html new file mode 100644 index 0000000000..d0ee043440 --- /dev/null +++ b/account-project-management/project-management/starting-alcf-award/index.html @@ -0,0 +1,7246 @@ + + + + + + + + + + + + + + + + + + + + + + + + + Starting Your ALCF Award - ALCF User Guides + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+ + + + Skip to content + + +
+
+ +
+ + + + + + + + + +
+ + +
+ + +
+ +
+

Argonne Leadership Computing Facility

+
+ + + + + + + + + +
+
+ + +
+ + + + + + + + + + + + + + + + + +
+ + + + +
+ + +
+ + + + + + +
+
+ + + +
+
+
+ + + + + +
+
+
+ + + + + + + +
+
+ + + + + + + +

Starting Your ALCF Award

+

The following guide is for PIs and proxies to get insight into managing projects and teams for ALCF awards. Please submit questions or trouble tickets to support@alcf.anl.gov.

+

Get Started with ALCF’s Systems

+

To get started using our resources, please visit: +Connect & Login

+

We also encourage you to take full advantage of ALCF's training programs and user services. Some useful introductory materials and videos are listed below:

+ +

Project Terminology

+

Before your project begins, you will receive an email with the following project information:

+
    +
  • Project Short Name: The assigned, shortened name for your project. This will be the name that you’ll use to access your project on the systems.
  • +
  • Project Proxies: Project members designated by PIs that are authorized to add or renew project members on your behalf.
  • +
  • Allocation System(s) and Allocation Amount: The approved system(s) and amount of your award in node hours.
  • +
  • Approved Quota: The approved amount of disk space for your project directory.
  • +
  • File System: The file system where your project directory will reside. For information on the Grand and Eagle file systems, see Storage and Networking.
  • +
  • Assigned Catalyst: INCITE projects are assigned ALCF staff members who are available to assist the team throughout the duration of the INCITE allocation.
  • +
  • Allocation Start Date: The start date of your award.
  • +
  • Allocation End Date: The end date of your award.
  • +
+

Account Setup

+

If you do not have an ALCF account: You will need to request one at https://accounts.alcf.anl.gov/accountRequest. When prompted for project name, please select the project short name you were given in your award email from support@alcf.anl.gov.

+

If you have an active ALCF account: Submit a request to join the newly awarded project at https://accounts.alcf.anl.gov/#!/joinProject.

+

Information for Foreign National Access

+

The U.S. Department of Energy has guidelines and requirements for foreign nationals who access its facilities and sites. This guidance is issued in DOE Order 142.3, which is part of Argonne's contract; therefore, all foreign nationals (non-U.S. Citizens) must obtain authorization prior to using ALCF resources.

+

If you are a foreign national and do not have current authorization credentials, you are required to submit an ANL-593 (Foreign National Access Request) form. It is critical that identity documentation requests sent by ALCF staff are completed as early as possible to facilitate timely processing of your account approval.

+

User Agreement for INCITE, ALCC, and ADSP

+

Note: This does not apply to Director's Discretionary awards.

+

Institution Master Agreement for INCITE, ALCC, and ADSP

+

If you are not an employee of Argonne National Laboratory, a user agreement must be signed by your home institution to perform research at Argonne’s user facilities. This policy applies to every member of the project team who will be conducting research on ALCF resources.

+

A list of home institutions that have master agreements in place is located on this webpage: https://www.aps.anl.gov/Users-Information/Legal-Financial/Argonne-User-Facility-Agreements

+

ALCF User Agreement for INCITE, ALCC, and ADSP

+

Note: This does not apply to Director's Discretionary awards.

+

Every project team member who requests an ALCF account must sign and return an acknowledgment form, stating that they agree to the terms in the user agreement.

+

The form is located at: https://www.alcf.anl.gov/files/Acknowledgement_Form.pdf. Please print, sign, scan and email it to accounts@alcf.anl.gov.

+

Managing Project Team Membership

+

As a PI, you can add members to your project. You can assign proxies who are project members authorized to add or renew project members on your behalf.

+

A project PI or proxy has the authority to:

+
    +
  • Approve and renew accounts
  • +
  • Add and delete users to/from the project
  • +
  • Approve Foreign Assignment/Visit Request form renewals for project members who are foreign nationals
  • +
+

During your project setup, the ALCF Support Team will request the following information to establish your project members:

+
    +
  • The names, email addresses, and/or ALCF usernames (if already existing) of up to two proxies and all project members.
  • +
+

About Project and UNIX Group Membership

+

All project members have the ability to run jobs against your allocation. There is no limit to the number of project members you may authorize. +Project members are automatically added to the project UNIX group, giving them the ability to write to the project directory and to access project data. When a project member is added to or removed from a project, this is automatically reflected in the project UNIX group membership.
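For example, once added, a project member can confirm the group assignment and directory access from a login node (the project short name below is illustrative):

groups                      # the project short name should appear in this list
ls -ld /projects/MyProject  # the project directory should be writable by that group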

+

Adding Project Members

+

The PI or a proxy must approve each team member to access ALCF resources and run jobs on their project. PI/proxies can respond to emails from ALCF for account access approval with a "yes" or "no".

+

PI/proxies with active ALCF accounts can also approve new account requests, project membership requests, and account reactivation requests, as well as add existing active ALCF users to the project, by logging into the ALCF Account and Project Management application.

+

Note: If PI/proxies need to request an ALCF account, see the section below for instructions on "how to apply" for an account.

+

Accounts and Access for your Project Members

+

All project members will need an ALCF user account to access project data and to run jobs on ALCF systems.

+

Members that do not have an ALCF account should request one at: https://accounts.alcf.anl.gov/accountRequest. When prompted for project name, they should select your project short name.

+

If your project members have ALCF accounts that are no longer active, please ask them to submit a reactivation request here: https://accounts.alcf.anl.gov/accountReactivate. When prompted for project name, they should select your project short name.

+

If your project members have active ALCF accounts but have not been added to your project, they should submit a request to join your project by going to this page: https://accounts.alcf.anl.gov/#!/joinProject.

+

Moving Your Data

+

We encourage you to use Globus to move your project data to your ALCF project directory before your allocation begins. For details, see Using Globus on Theta.

+

Project Status Reports for INCITE, ALCC, and ADSP

+

Note: PIs that are awarded a Director's Discretionary allocation will not receive weekly project status reports.

+

Shortly after your allocation begins, we will begin sending you a weekly project status report via support@alcf.anl.gov to keep you informed of your award's progress.

+

Look for an email from us with the subject line: ALCF [ALLOCATION PROGRAM] Project Status Report for [PROJECT SHORT NAME]

+

Reporting Requirements for INCITE, ALCC, and ADSP

+

Note: PIs that are awarded Director's Discretionary allocations are not required to submit project reports.

+

If you received an INCITE, ALCC, or ADSP allocation award, quarterly reporting is required to keep DOE informed of progress related to your allocation.

+

The ALCF will send you a report template at the end of each quarter. Please complete the report promptly and submit it via email to support@alcf.anl.gov. For more information see the Quarterly Report webpage.

+

Policies

+

Pullback Policy

+

Please be aware that we will periodically monitor, and could potentially adjust, your project allocation if a large portion of it goes unused. You may view: Pullback Policy

+

Allocation Overburn Policy

+

Please see this page for overburn/overuse eligibility for INCITE projects that have exhausted their allocation in the first 11 months of their allocation year: Allocation Overburn

+

Acknowledgment In Publications

+

Please follow the guidelines provided on the ALCF Acknowledgement Policy page to properly acknowledge the use of ALCF resources in all of your publications, both online and print.

+

Facility Policies

+

Facility policies have been established to provide consistent and reliable services. Please read about our [ALCF Facility Policies](../policies/facility-policies.md).

+

Useful Allocation and Quota Commands

+

We have an allocation management tool called sbank; below are a few helpful allocation and quota commands.

+
    +
  • myprojectquotas: log into Theta and type this command to view the project directory quotas for all your projects
  • +
  • myquota: log into Theta and type this command to view your home directory quota
  • +
+

You can use the following command to check your project balance on Theta: +- sbank-list-allocations -p <project_name> -r <resource>
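For example (the project name and resource below are placeholders; substitute your own):

sbank-list-allocations -p MyProject -r theta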

+

For more command examples and details, see sbank.

+

How Can We Help?

+

We can also help resolve any issues or needs that may be delaying the start of your scientific campaign. +- Are you in need of high-throughput software? +- Are you having difficulty compiling your application? +- Does your code have limited restart capabilities?

+

If your project's allocation usage is being held back due to issues with one of our systems, please contact us for assistance by emailing support@alcf.anl.gov.

+ + + + + + + + + + + + + +
+
+ + + +
+ +
+ + + + + + +
+
+
+
+ + + + + + + + + + + + \ No newline at end of file diff --git a/account-project-management/project-management/team-management/index.html b/account-project-management/project-management/team-management/index.html new file mode 100644 index 0000000000..1e6af92112 --- /dev/null +++ b/account-project-management/project-management/team-management/index.html @@ -0,0 +1,6806 @@ + + + + + + + + + + + + + + + + + + + + + + + + + Managing Your Team Members - ALCF User Guides + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+ + + + Skip to content + + +
+
+ +
+ + + + + + + + + +
+ + +
+ + +
+ +
+

Argonne Leadership Computing Facility

+
+ + + + + + + + + +
+
+ + +
+ + + + + + + + + + + + + + + + + +
+ + + + +
+ + +
+ + + + + + +
+
+ + + +
+
+
+ + + + + +
+
+
+ + + +
+ +
+ + + +
+
+ + + + + + + +

Managing Your Team Members

+

New project members will need a user account to access project data and to run jobs on ALCF systems.

+

Please instruct any members who do not have an ALCF account to request one as soon as possible by visiting: https://accounts.alcf.anl.gov/#!/accountRequest. When prompted for project name, they should select the "short name" for your project.

+

The PI or Proxy must approve each member of the team to gain access and to run project jobs on the ALCF's resources. If you have an active ALCF account, you can manage your project team by logging into the ALCF account and project management website and navigating to https://accounts.alcf.anl.gov/#!/manageProjects

+

Accessing your project(s)

+
    +
  1. Log in at https://accounts.alcf.anl.gov/#!/manageProjects using your credentials: your ALCF username and a physical/mobile token passcode as the password.
  2. +
  3. Click on Project Management, located in the right sidebar.
  4. +
  5. You will see a list of projects of which you are the Principal Investigator (PI).
  6. +
  7. Click on the desired project to view information and management options for the selected project.
  8. +
+

Modifying project information

+

Some project information cannot be modified, but as the PI, you can modify the following: project title, institutions, and associated funding.

+

Your project can be associated with multiple institutions, but you must specify a primary institution.

+

Managing project members with an Existing ALCF Account

+
    +
  1. You can manage the membership for your project by clicking on the desired project from the Project Management screen.
  2. +
  3. Add and/or remove proxies and team members by clicking on the red "Remove" button to the right of each member or clicking on "Add new user."
  4. +
  5. You can view account information for each user as it relates to the project:
  6. +
  7. Account Status
  8. +
  9. Project Role
  10. +
  11. Proxy Permissions
  12. +
  13. +

    Membership Status

    +
  14. +
  15. +

    Proxies are individuals authorized to add or renew user accounts for the project PI. You have the ability to upgrade a user from a member to a Proxy, by clicking on the "Proxy" radio button that corresponds with the desired member.

    +
  16. +
+ + + + + + + + + + + + + +
+
+ + + +
+ +
+ + + + + + +
+
+
+
+ + + + + + + + + + + + \ No newline at end of file diff --git a/ai-testbed/cerebras/customizing-environment/index.html b/ai-testbed/cerebras/customizing-environment/index.html new file mode 100644 index 0000000000..2a82d7d255 --- /dev/null +++ b/ai-testbed/cerebras/customizing-environment/index.html @@ -0,0 +1,6816 @@ + + + + + + + + + + + + + + + + + + + + + + + + + Customizing Environments - ALCF User Guides + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+ + + + Skip to content + + +
+
+ +
+ + + + + + + + + +
+ + +
+ + +
+ +
+

Argonne Leadership Computing Facility

+
+ + + + + + + + + +
+
+ + +
+ + + + + + + + + + + + + + + + + +
+ + + + +
+ + +
+ + + + + + +
+
+ + + +
+
+
+ + + + + +
+
+
+ + + +
+
+
+ + + +
+
+
+ + + +
+
+ + + + + + + +

Customizing Environments

+

Using virtual Python environments

+

To make a PyTorch virtual environment for Cerebras

+
#Make your home directory navigable
+chmod a+xr ~/
+mkdir ~/R_2.0.3
+chmod a+x ~/R_2.0.3/
+cd ~/R_2.0.3
+# Note: "deactivate" does not actually work in scripts.
+deactivate
+rm -r venv_cerebras_pt
+/software/cerebras/python3.8/bin/python3.8 -m venv venv_cerebras_pt
+source venv_cerebras_pt/bin/activate
+pip install --upgrade pip
+pip install cerebras_pytorch==2.0.2
+
+ + +

Activation and deactivation

+

To activate a virtual environment:

+
source ~/R_2.0.3/venv_cerebras_pt/bin/activate
+
+

To deactivate a virtual environment,

+
deactivate
+
+ + + + + + + + + + + + + +
+
+ + + +
+ +
+ + + + + + +
+
+
+
+ + + + + + + + + + + + \ No newline at end of file diff --git a/ai-testbed/cerebras/example-programs/index.html b/ai-testbed/cerebras/example-programs/index.html new file mode 100644 index 0000000000..1008f6de3b --- /dev/null +++ b/ai-testbed/cerebras/example-programs/index.html @@ -0,0 +1,6901 @@ + + + + + + + + + + + + + + + + + + + + + + + + + Example Programs - ALCF User Guides + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+ + + + Skip to content + + +
+
+ +
+ + + + + + + + + +
+ + +
+ + +
+ +
+

Argonne Leadership Computing Facility

+
+ + + + + + + + + +
+
+ + +
+ + + + + + + + + + + + + + + + + +
+ + + + +
+ + +
+ + + + + + +
+
+ + + +
+
+
+ + + + + +
+
+
+ + + +
+
+
+ + + +
+
+
+ + + +
+
+ + + + + + + +

Example Programs

+

Use a local copy of the model zoo

+

Make a working directory and a local copy of the Cerebras modelzoo and anl_shared repositories, if not previously done, as follows.

+
mkdir ~/R_2.0.3
+cd ~/R_2.0.3
+git clone https://github.com/Cerebras/modelzoo.git
+cd modelzoo
+git tag
+git checkout Release_2.0.3
+
+ + +

UNet

+

An implementation of U-Net: Convolutional Networks for Biomedical Image Segmentation, Ronneberger et al., 2015.
+To run Unet with the Severstal: Steel Defect Detection kaggle dataset, using a pre-downloaded copy of the dataset:
+First, source a Cerebras PyTorch virtual environment and make sure that requirements are installed.

+
source ~/R_2.0.3/venv_cerebras_pt/bin/activate
+pip install -r ~/R_2.0.3/modelzoo/requirements.txt
+
+

Then

+
cd ~/R_2.0.3/modelzoo/modelzoo/vision/pytorch/unet
+cp /software/cerebras/dataset/severstal-steel-defect-detection/params_severstal_binary_rawds.yaml configs/params_severstal_binary_rawds.yaml
+export MODEL_DIR=model_dir_unet
+if [ -d "$MODEL_DIR" ]; then rm -Rf $MODEL_DIR; fi
+python run.py CSX --job_labels name=unet_pt --params configs/params_severstal_binary_rawds.yaml --model_dir $MODEL_DIR --mode train --mount_dirs /home/ /software --python_paths /home/$(whoami)/R_2.0.3/modelzoo/ --compile_dir $(whoami) |& tee mytest.log 
+
+ + +

BERT - PyTorch

+

The modelzoo/modelzoo/transformers/pytorch/bert directory is a PyTorch implementation of BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
+This BERT-large msl128 example uses a single sample dataset for both training and evaluation. See the README.md in the source directory for details on how to build a dataset from text input. +First, source a Cerebras PyTorch virtual environment and make sure that the requirements are installed:

+ +
source ~/R_2.0.3/venv_cerebras_pt/bin/activate
+pip install -r ~/R_2.0.3/modelzoo/requirements.txt
+
+

Then

+
cd ~/R_2.0.3/modelzoo/modelzoo/transformers/pytorch/bert
+cp /software/cerebras/dataset/bert_large/bert_large_MSL128_sampleds.yaml configs/bert_large_MSL128_sampleds.yaml
+export MODEL_DIR=model_dir_bert_large_pytorch
+if [ -d "$MODEL_DIR" ]; then rm -Rf $MODEL_DIR; fi
+python run.py CSX --job_labels name=bert_pt --params configs/bert_large_MSL128_sampleds.yaml --num_workers_per_csx=1 --mode train --model_dir $MODEL_DIR --mount_dirs /home/ /software/ --python_paths /home/$(whoami)/R_2.0.3/modelzoo/ --compile_dir $(whoami) |& tee mytest.log
+
+

The last parts of the output should resemble the following, with messages about cuda that should be ignored and are not shown.

+
2023-11-29 20:07:49,284 INFO:   Beginning appliance run
+2023-11-29 20:08:14,365 INFO:   | Train Device=CSX, Step=100, Loss=9.50000, Rate=4088.28 samples/sec, GlobalRate=4088.26 samples/sec
+2023-11-29 20:08:39,820 INFO:   | Train Device=CSX, Step=200, Loss=8.37500, Rate=4048.91 samples/sec, GlobalRate=4055.21 samples/sec
+2023-11-29 20:09:05,356 INFO:   | Train Device=CSX, Step=300, Loss=7.96875, Rate=4025.61 samples/sec, GlobalRate=4040.05 samples/sec
+2023-11-29 20:09:30,626 INFO:   | Train Device=CSX, Step=400, Loss=7.56250, Rate=4041.61 samples/sec, GlobalRate=4043.10 samples/sec
+2023-11-29 20:09:56,022 INFO:   | Train Device=CSX, Step=500, Loss=7.50000, Rate=4035.92 samples/sec, GlobalRate=4040.90 samples/sec
+2023-11-29 20:10:21,410 INFO:   | Train Device=CSX, Step=600, Loss=7.37500, Rate=4034.41 samples/sec, GlobalRate=4039.65 samples/sec
+2023-11-29 20:10:46,690 INFO:   | Train Device=CSX, Step=700, Loss=7.37500, Rate=4044.10 samples/sec, GlobalRate=4041.20 samples/sec
+2023-11-29 20:11:12,004 INFO:   | Train Device=CSX, Step=800, Loss=7.25000, Rate=4044.75 samples/sec, GlobalRate=4041.70 samples/sec
+2023-11-29 20:11:37,196 INFO:   | Train Device=CSX, Step=900, Loss=7.21875, Rate=4056.77 samples/sec, GlobalRate=4044.25 samples/sec
+2023-11-29 20:12:02,285 INFO:   | Train Device=CSX, Step=1000, Loss=7.12500, Rate=4071.60 samples/sec, GlobalRate=4047.95 samples/sec
+2023-11-29 20:12:02,286 INFO:   Saving checkpoint at step 1000
+2023-11-29 20:12:37,079 INFO:   Saved checkpoint model_dir_bert_large_pytorch/checkpoint_1000.mdl
+2023-11-29 20:13:25,683 INFO:   Heartbeat thread stopped for wsjob-gfi2baioyfduozkmgsc6a7.
+2023-11-29 20:13:25,691 INFO:   Training completed successfully!
+2023-11-29 20:13:25,691 INFO:   Processed 1024000 sample(s) in 336.373620536 seconds.
+
+

GPT-J PyTorch

+

GPT-J [github] is an auto-regressive language model created by EleutherAI. +This PyTorch GPT-J 6B parameter pretraining sample uses 2 CS2s.

+

First, source a Cerebras PyTorch virtual environment and make sure that the requirements are installed:

+
source ~/R_2.0.3/venv_cerebras_pt/bin/activate
+pip install -r ~/R_2.0.3/modelzoo/requirements.txt
+
+

Then

+
cd ~/R_2.0.3/modelzoo/modelzoo/transformers/pytorch/gptj
+cp /software/cerebras/dataset/gptj/params_gptj_6B_sampleds.yaml configs/params_gptj_6B_sampleds.yaml
+export MODEL_DIR=model_dir_gptj
+if [ -d "$MODEL_DIR" ]; then rm -Rf $MODEL_DIR; fi
+python run.py CSX --job_labels name=gptj_pt --params configs/params_gptj_6B_sampleds.yaml --num_csx=2 --mode train --model_dir $MODEL_DIR --mount_dirs /home/ /software --python_paths /home/$(whoami)/R_2.0.3/modelzoo/ --compile_dir $(whoami) |& tee mytest.log
+
+

The last parts of the output should resemble the following:

+
2023-11-29 20:59:19,223 INFO:   Beginning appliance run
+2023-11-29 21:03:53,875 INFO:   | Train Device=CSX, Step=100, Loss=8.43750, Rate=43.70 samples/sec, GlobalRate=43.70 samples/sec
+2023-11-29 21:08:28,779 INFO:   | Train Device=CSX, Step=200, Loss=8.12500, Rate=43.67 samples/sec, GlobalRate=43.67 samples/sec
+2023-11-29 21:08:28,781 INFO:   Saving checkpoint at step 200
+2023-11-29 21:13:56,695 INFO:   Saved checkpoint model_dir_gptj/checkpoint_200.mdl
+2023-11-29 21:14:30,135 INFO:   Heartbeat thread stopped for wsjob-kd4olqkhu6ya8qqzt88utd.
+2023-11-29 21:14:30,142 INFO:   Training completed successfully!
+2023-11-29 21:14:30,142 INFO:   Processed 24000 sample(s) in 910.883781998 seconds.
+
+ + + + + + + + + + + + + +
+
+ + + +
+ +
+ + + + + + +
+
+
+
+ + + + + + + + + + + + \ No newline at end of file diff --git a/ai-testbed/cerebras/files/Cerebras-connectivity-diagram.png b/ai-testbed/cerebras/files/Cerebras-connectivity-diagram.png new file mode 100644 index 0000000000..11b605d9b1 Binary files /dev/null and b/ai-testbed/cerebras/files/Cerebras-connectivity-diagram.png differ diff --git a/ai-testbed/cerebras/files/Cerebras_Wafer-Scale_Cluster_login_diagram.png b/ai-testbed/cerebras/files/Cerebras_Wafer-Scale_Cluster_login_diagram.png new file mode 100644 index 0000000000..ea158d24a7 Binary files /dev/null and b/ai-testbed/cerebras/files/Cerebras_Wafer-Scale_Cluster_login_diagram.png differ diff --git a/ai-testbed/cerebras/files/Trust_ctl.png b/ai-testbed/cerebras/files/Trust_ctl.png new file mode 100644 index 0000000000..dd551d3c08 Binary files /dev/null and b/ai-testbed/cerebras/files/Trust_ctl.png differ diff --git a/ai-testbed/cerebras/files/compile-vs-run.png b/ai-testbed/cerebras/files/compile-vs-run.png new file mode 100644 index 0000000000..539702596f Binary files /dev/null and b/ai-testbed/cerebras/files/compile-vs-run.png differ diff --git a/ai-testbed/cerebras/files/cs-getting-started.png b/ai-testbed/cerebras/files/cs-getting-started.png new file mode 100644 index 0000000000..02c7603e7b Binary files /dev/null and b/ai-testbed/cerebras/files/cs-getting-started.png differ diff --git a/ai-testbed/cerebras/files/grafana_ctl.png b/ai-testbed/cerebras/files/grafana_ctl.png new file mode 100644 index 0000000000..bc045a8bd9 Binary files /dev/null and b/ai-testbed/cerebras/files/grafana_ctl.png differ diff --git a/ai-testbed/cerebras/getting-started/index.html b/ai-testbed/cerebras/getting-started/index.html new file mode 100644 index 0000000000..dd12e19179 --- /dev/null +++ b/ai-testbed/cerebras/getting-started/index.html @@ -0,0 +1,6765 @@ + + + + + + + + + + + + + + + + + + + + + + + + + Getting Started - ALCF User Guides + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+ + + + Skip to content + + +
+
+ +
+ + + + + + + + + +
+ + +
+ + +
+ +
+

Argonne Leadership Computing Facility

+
+ + + + + + + + + +
+
+ + +
+ + + + + + + + + + + + + + + + + +
+ + + + +
+ + +
+ + + + + + +
+
+ + + +
+
+
+ + + + + +
+
+
+ + + +
+
+
+ + + +
+
+
+ + + +
+
+ + + + + + + + +

Getting Started

+

Connection to a CS-2 node

+ + +

Cerebras Wafer-Scale Cluster connection diagram +Connection to one of the CS-2 cluster login nodes requires an MFA passcode for authentication - either an 8-digit passcode generated by an app on your mobile device (e.g. MobilePASS+) or a CRYPTOCard-generated passcode prefixed by a 4-digit pin. This is the same passcode used to authenticate into other ALCF systems, such as Theta and Cooley.
+In the examples below, replace ALCFUserID with your ALCF user id.
+To connect to a CS-2 login:

+
    +
  1. ssh to a desired login node: +
    ssh ALCFUserID@cer-login-01.ai.alcf.anl.gov
    +
    + or +
    ssh ALCFUserID@cer-login-02.ai.alcf.anl.gov
    +
    + or +
    ssh ALCFUserID@cer-login-03.ai.alcf.anl.gov
    +
  2. +
  3. Alternatively, ssh randomly to one of the above three login nodes: +
    ssh ALCFUserID@cerebras.ai.alcf.anl.gov
    +
  4. +
+ + + + + + + + + + + + + +
+
+ + + +
+ +
+ + + + + + +
+
+
+
+ + + + + + + + + + + + \ No newline at end of file diff --git a/ai-testbed/cerebras/job-queuing-and-submission/index.html b/ai-testbed/cerebras/job-queuing-and-submission/index.html new file mode 100644 index 0000000000..8db2dfb90e --- /dev/null +++ b/ai-testbed/cerebras/job-queuing-and-submission/index.html @@ -0,0 +1,6739 @@ + + + + + + + + + + + + + + + + + + + + + + + + + Job Queuing and Submission - ALCF User Guides + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+ + + + Skip to content + + +
+
+ +
+ + + + + + + + + +
+ + +
+ + +
+ +
+

Argonne Leadership Computing Facility

+
+ + + + + + + + + +
+
+ + +
+ + + + + + + + + + + + + + + + + +
+ + + + +
+ + +
+ + + + + + +
+
+ + + +
+
+
+ + + + + +
+
+
+ + + +
+
+
+ + + +
+
+
+ + + +
+
+ + + + + + + +

Job Queuing and Submission

+

The CS-2 cluster has its own Kubernetes-based system for job submission and queuing.

+

Jobs are started automatically through the Python framework in modelzoo.common.pytorch.run_utils. +Continuous job status for a job is output to stdout/stderr; redirect the output, or consider using a persistent session started with screen or tmux (or both).
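For example, assuming a launch command of the shape shown in the Running a Model/Program section, the status stream can be captured to a file (the label, config, and file names below are illustrative; the full set of options is shown in that section):

python run.py CSX --job_labels name=my_run --params configs/params.yaml --num_csx=1 --mode train --model_dir model_dir |& tee my_run.log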

+

Jobs that have not yet completed can be listed as shown. Note: this command can take over a minute to complete.

+

(venv_cerebras_pt) $ csctl get jobs
+NAME                          AGE  DURATION  PHASE    SYSTEMS     USER     LABELS        DASHBOARD
+wsjob-thjj8zticwsylhppkbmjqe  13s  1s        RUNNING  cer-cs2-01  username name=unet_pt  https://grafana.cerebras1.lab.alcf.anl.gov/d/WebHNShVz/wsjob-dashboard?orgId=1&var-wsjob=wsjob-thjj8zticwsylhppkbmjqe&from=1691705374000&to=now
+(venv_cerebras_pt) $
+
+To view the Grafana dashboard for a job, follow the instructions at Grafana WsJob Dashboard for Cerebras jobs

+

Jobs can be canceled as shown:

+
(venv_cerebras_pt) $ csctl cancel job wsjob-eyjapwgnycahq9tus4w7id
+Job canceled successfully
+(venv_cerebras_pt) $
+
+

Jobs can be labeled in the command line that launches them, if they are written with Cerebras's Python framework for running appliance jobs, by adding a command line option of this form: +

 --job_labels labelname=labelvalue
+

+

Jobs can also be labeled after they have been started as shown: +

(venv_cerebras_pt) $ csctl label job wsjob-ez6dyfronnsg2rz7f7fqw4 testlabel=test
+job/wsjob-ez6dyfronnsg2rz7f7fqw4 was patched
+(venv_cerebras_pt) $
+

+

Jobs with a particular label/label value can be listed as shown: +

(venv_cerebras_pt) $ csctl get jobs | grep "testlabel=test"
+wsjob-ez6dyfronnsg2rz7f7fqw4  19m SUCCEEDED  cer-cs2-02 username testlabel=test,user=username
+(venv_cerebras_pt) $
+

+

See csctl -h for more options.
+Add -h to a command for help for that command, e.g. csctl get -h or csctl cancel -h.

+
$ csctl -h
+Cerebras cluster command line tool.
+
+Usage:
+  csctl [command]
+
+Available Commands:
+  cancel             Cancel job
+  clear-worker-cache Clear the worker cache
+  config             View csctl config files
+  get                Get resources
+  label              Label resources
+  log-export         Gather and download logs.
+  types              Display resource types
+
+Flags:
+  -d, --debug int          higher debug values will display more fields in output objects
+  -h, --help               help for csctl
+      --namespace string   configure csctl to talk to different user namespaces
+  -v, --version            version for csctl
+
+Use "csctl [command] --help" for more information about a command.
+
+ + + + + + + + + + + + + +
+
+ + + +
+ +
+ + + + + + +
+
+
+
+ + + + + + + + + + + + \ No newline at end of file diff --git a/ai-testbed/cerebras/miscellaneous/index.html b/ai-testbed/cerebras/miscellaneous/index.html new file mode 100644 index 0000000000..7245ebedb7 --- /dev/null +++ b/ai-testbed/cerebras/miscellaneous/index.html @@ -0,0 +1,6858 @@ + + + + + + + + + + + + + + + + + + + + + + + + + Miscellaneous - ALCF User Guides + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+ + + + Skip to content + + +
+
+ +
+ + + + + + + + + +
+ + +
+ + +
+ +
+

Argonne Leadership Computing Facility

+
+ + + + + + + + + +
+
+ + +
+ + + + + + + + + + + + + + + + + +
+ + + + +
+ + +
+ + + + + + +
+
+ + + +
+
+
+ + + + + +
+
+
+ + + +
+
+
+ + + +
+
+
+ + + +
+
+ + + + + + + +

Miscellaneous

+

Porting applications to the CS-2

+

Cerebras documentation for porting code to run on a Cerebras CS-2 system:
+Ways to port your model

+

Grafana WsJob Dashboard for Cerebras jobs

+

A Grafana dashboard provides support for visualizing, querying, and exploring the CS-2 system's metrics, and enables access to system logs and traces. +See the Cerebras documentation for the Job Information Dashboard

+

Here is a summary (tested to work on Ubuntu and MacOS)

+

On your work machine with a web browser, e.g. your laptop,
+edit /etc/hosts, using your editor of choice +

sudo nano /etc/hosts
+
+Add this line +
127.0.0.1   grafana.cerebras1.lab.alcf.anl.gov
+
+Save, and exit the editor

+

Download the Grafana certificate present on the Cerebras node at /opt/cerebras/certs/grafana_tls.crt to your local machine. To add this certificate to your browser keychain,

+
    +
  1. On chrome, go to Settings->Privacy and security->Security->Manage device certificates
  2. +
  3. Select System under "System Keychains" on the left hand side of your screen. Also select the "Certificate" tab.
  4. +
  5. Drag and drop the downloaded certificate. Once it is added, it is visible as "lab.alcf.anl.gov"
  6. +
  7. Select the certificate, and ensure that the "Trust" section is set to "Always Trust"
  8. +
+

On your work machine with a web browser, e.g. your laptop,
+tunnel the grafana https port on the cerebras grafana host through to localhost +

ssh -L 8443:grafana.cerebras1.lab.alcf.anl.gov:443 ALCFUserID@cer-login-03.ai.alcf.anl.gov
+

+

Point a browser at grafana. (Tested with Firefox and Chrome/Brave)
+Open browser to a job grafana url shown in csctl get jobs, adding :8443 to hostname, e.g.
+

https://grafana.cerebras1.lab.alcf.anl.gov:8443/d/WebHNShVz/wsjob-dashboard?orgId=1&var-wsjob=wsjob-49b7uuojdelvtrcxu3cwbw&from=1684859330000&to=now
+

+

Login to the dashboard with user admin, and password prom-operator

+ + + + + + + + + + + + + + + + + +
+
+ + + +
+ +
+ + + + + + +
+
+
+
+ + + + + + + + + + + + \ No newline at end of file diff --git a/ai-testbed/cerebras/running-a-model-or-program/index.html b/ai-testbed/cerebras/running-a-model-or-program/index.html new file mode 100644 index 0000000000..710f81f816 --- /dev/null +++ b/ai-testbed/cerebras/running-a-model-or-program/index.html @@ -0,0 +1,6981 @@ + + + + + + + + + + + + + + + + + + + + + + + + + Running a Model/Program - ALCF User Guides + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+ + + + Skip to content + + +
+
+ +
+ + + + + + + + + +
+ + +
+ + +
+ +
+

Argonne Leadership Computing Facility

+
+ + + + + + + + + +
+
+ + +
+ + + + + + + + + + + + + + + + + +
+ + + + +
+ + +
+ + + + + + +
+
+ + + +
+
+
+ + + + + +
+
+
+ + + + + + + +
+
+ + + + + + + +

Running a Model/Program

+

Getting Started

+

Job submission and queuing

+

Cerebras jobs are initiated and tracked automatically within the Python framework in modelzoo.common.pytorch.run_utils. This framework interacts with the Cerebras cluster management node.

+

Login nodes

+

Jobs are launched from login nodes. +If you expect to lose your internet connection for any reason, we suggest that for long-running jobs you log into a specific login node and use either screen or tmux to create a persistent command line session. For details, use:

+
man screen
+# or
+man tmux
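+# A minimal tmux workflow (the session name is illustrative):
+#   tmux new -s cs2_train       # start a persistent session and launch your job inside it
+#   (detach with Ctrl-b d; the session keeps running after you disconnect)
+#   tmux attach -t cs2_train    # re-attach later to check on the job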
+
+

Running jobs on the wafer

+

Follow these instructions to compile and train the fc_mnist PyTorch sample. This model consists of a couple of fully connected layers plus dropout and ReLU.

+

Cerebras virtual environments

+

First, make a virtual environment for Cerebras for PyTorch. +See Customizing Environments for the procedures for making PyTorch virtual environments for Cerebras. +If an environment is made in ~/R_2.0.3/, it would be activated as follows: +

source ~/R_2.0.3/venv_cerebras_pt/bin/activate
+

+

Clone the Cerebras modelzoo

+
mkdir ~/R_2.0.3
+cd ~/R_2.0.3
+git clone https://github.com/Cerebras/modelzoo.git
+cd modelzoo
+git tag
+git checkout Release_2.0.3
+
+

Running a Pytorch sample

+

Activate your PyTorch virtual environment, install modelzoo requirements, and change to the working directory

+
source ~/R_2.0.3/venv_cerebras_pt/bin/activate
+pip install -r ~/R_2.0.3/modelzoo/requirements.txt
+cd ~/R_2.0.3/modelzoo/modelzoo/fc_mnist/pytorch
+
+

Next, edit configs/params.yaml, making the following changes:

+
 train_input:
+-    data_dir: "./mnist"
++    data_dir: "/software/cerebras/dataset/fc_mnist/data/mnist/train"
+
+

and

+
 eval_input:
+-    data_dir: "./mnist"
++    data_dir: "/software/cerebras/dataset/fc_mnist/data/mnist/train"
+
+

If you want to have the sample download the dataset, you will need to specify absolute paths for the "data_dir"s.
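If you prefer to make both edits non-interactively, a sed one-liner such as the following (using exactly the paths shown above) accomplishes the same change:

sed -i 's|data_dir: "./mnist"|data_dir: "/software/cerebras/dataset/fc_mnist/data/mnist/train"|g' configs/params.yaml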

+

Running a sample PyTorch training job

+

To run the sample:

+
export MODEL_DIR=model_dir
+# deletion of the model_dir is only needed if sample has been previously run
+if [ -d "$MODEL_DIR" ]; then rm -Rf $MODEL_DIR; fi
+python run.py CSX --job_labels name=pt_smoketest --params configs/params.yaml --num_csx=1 --mode train --model_dir $MODEL_DIR --mount_dirs /home/ /software --python_paths /home/$(whoami)/R_2.0.3/modelzoo --compile_dir /$(whoami) |& tee mytest.log
+
+

A successful fc_mnist PyTorch training run should finish with output resembling the following:

+
2023-11-29 18:13:13,048 INFO:   | Train Device=CSX, Step=1950, Loss=2.28834, Rate=397.31 samples/sec, GlobalRate=433.98 samples/sec
+2023-11-29 18:13:13,555 INFO:   | Train Device=CSX, Step=2000, Loss=2.34778, Rate=395.69 samples/sec, GlobalRate=431.83 samples/sec
+2023-11-29 18:13:13,555 INFO:   Saving checkpoint at step 2000
+2023-11-29 18:13:17,242 INFO:   Saved checkpoint model_dir/checkpoint_2000.mdl
+2023-11-29 18:13:55,517 INFO:   Heartbeat thread stopped for wsjob-fpwqt7maq8a5mxvblwwzbu.
+2023-11-29 18:13:55,523 INFO:   Training completed successfully!
+2023-11-29 18:13:55,523 INFO:   Processed 4000 sample(s) in 51.230697212 seconds.
+
+ + + + + + + + + + + + + +
+
+ + + +
+ +
+ + + + + + +
+
+
+
+ + + + + + + + + + + + \ No newline at end of file diff --git a/ai-testbed/cerebras/system-overview/index.html b/ai-testbed/cerebras/system-overview/index.html new file mode 100644 index 0000000000..171e39f531 --- /dev/null +++ b/ai-testbed/cerebras/system-overview/index.html @@ -0,0 +1,6709 @@ + + + + + + + + + + + + + + + + + + + + + + + + + System Overview - ALCF User Guides + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+ + + + Skip to content + + +
+
+ +
+ + + + + + + + + +
+ + +
+ + +
+ +
+

Argonne Leadership Computing Facility

+
+ + + + + + + + + +
+
+ + +
+ + + + + + + + + + + + + + + + + +
+ + + + +
+ + +
+ + + + + + +
+
+ + + +
+
+
+ + + + + +
+
+
+ + + +
+
+
+ + + +
+
+
+ + + +
+
+ + + + + + + +

System Overview

+

The Cerebras CS-2 is a wafer-scale deep learning accelerator comprising 850,000 processing cores, each providing 48KB of dedicated SRAM memory for an on-chip total of 40GB and interconnected to optimize bandwidth and latency. Its software platform integrates the popular machine learning framework PyTorch.

+

The ALCF CS-2 systems are configured as a Cerebras Wafer-Scale Cluster, designed to support large-scale models (up to and well beyond 1 billion parameters) and large-scale inputs. The cluster contains two CS-2 systems and can distribute jobs across one or both CS-2 systems in a data-parallel framework. The supporting CPU cluster consists of MemoryX, SwarmX, management, and input worker nodes. The Cerebras Wafer-Scale cluster is run as an appliance: a user submits a job to the appliance, and the appliance manages preprocessing and streaming of the data, IO, and device orchestration within the appliance. It provides programming via PyTorch, with data-parallel distribution when using more than one CS-2. This installation supports both Pipelined execution for models up to 1 billion parameters and Weight Streaming execution for models up to and above 1 billion parameters.

+ + + + +

The public Cerebras documentation is available here.

+

A typical Cerebras Wafer-Scale Cluster is shown in the figure.
+Users connect (ssh) to one of the three login nodes. Either ssh to cerebras.ai.alcf.anl.gov, which randomly resolves to one of cer-login-0[1-3].ai.alcf.anl.gov, or ssh to a specific node, cer-login-01.ai.alcf.anl.gov, cer-login-02.ai.alcf.anl.gov, cer-login-03.ai.alcf.anl.gov. +The rest of the nodes in the cluster infrastructure are not directly accessible, except by admins. +The trees /home, /projects, and /software are shared across all three login nodes, the relevant cluster infrastructure nodes, and all ALCF AI testbed platforms.

+
+ +

+

+
CS-2 cluster figure
+
+

(Figure from +https://docs.cerebras.net/en/latest/wsc/cerebras-basics/how-cerebras-works.html)

+

As indicated in the figures, the CS-2 nodes on the right are responsible only for running and accelerating the computations for training and predictions with the model. The other work, including compilation, is performed by input nodes, and by MemoryX nodes, which are used for weight storage and broadcast, and SwarmX nodes, which are used for gradient accumulation. Some model verification work can be done on login nodes.

+ + + + + + + + + + + + + +
+
+ + + +
+ +
+ + + + + + +
+
+
+
+ + + + + + + + + + + + \ No newline at end of file diff --git a/ai-testbed/cerebras/tunneling-and-forwarding-ports/index.html b/ai-testbed/cerebras/tunneling-and-forwarding-ports/index.html new file mode 100644 index 0000000000..9d68e48ed7 --- /dev/null +++ b/ai-testbed/cerebras/tunneling-and-forwarding-ports/index.html @@ -0,0 +1,6689 @@ + + + + + + + + + + + + + + + + + + + + + + + + + Tunneling and Forwarding Ports - ALCF User Guides + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+ + + + Skip to content + + +
+
+ +
+ + + + + + + + + +
+ + +
+ + +
+ +
+

Argonne Leadership Computing Facility

+
+ + + + + + + + + +
+
+ + +
+ + + + + + + + + + + + + + + + + +
+ + + + +
+ + +
+ + + + + + +
+
+ + + +
+
+
+ + + + + +
+
+
+ + + +
+
+
+ + + +
+
+
+ + + +
+ +
+ + + +
+ +
+ + + + + + +
+
+
+
+ + + + + + + + + + + + \ No newline at end of file diff --git a/ai-testbed/data-management/data-management-overview/index.html b/ai-testbed/data-management/data-management-overview/index.html new file mode 100644 index 0000000000..f5fb0562e7 --- /dev/null +++ b/ai-testbed/data-management/data-management-overview/index.html @@ -0,0 +1,6828 @@ + + + + + + + + + + + + + + + + + + + + + + + + + Data Management - ALCF User Guides + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+ + + + Skip to content + + +
+
+ +
+ + + + + + + + + +
+ + +
+ + +
+ +
+

Argonne Leadership Computing Facility

+
+ + + + + + + + + +
+
+ + +
+ + + + + + + + + + + + + + + + + +
+ + + + +
+ + +
+ + + + + + +
+
+ + + +
+
+
+ + + + + +
+
+
+ + + +
+
+
+ + + +
+
+
+ + + +
+
+ + + + + + + +

Data Management for the AI Testbed

+

Home File System Space

+

Users have a home file system, /home, shared across the ALCF AI testbed systems, including the login and compute nodes. The default user quota is 1 TB of storage and 1,000,000 files. This space is backed up.
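A quick, generic way to check your usage against this quota from any login node (dedicated quota-reporting tools may also be available) is:

du -sh ~                 # total size of your home directory
find ~ -type f | wc -l   # rough count of files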

+

Project File System Space

+

The team project/campaign file system /projects is intended to facilitate project collaboration and is accessible to the team members of your project that have an ALCF account. Default group storage quota is 2 TB and 2,000,000 files. Please note that this space isn't backed up. Our policy is that data will be purged from disk 6 months after project completion.

+

Data Transfer

+

Users can transfer data to and from the AI testbed using Globus or tools such as scp or rsync.
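For example, a copy from a local machine into a project directory might look like the following (the username, host, paths, and project name are illustrative; any AI testbed login node can be used):

scp -r ./my_dataset ALCFUserID@cer-login-01.ai.alcf.anl.gov:/projects/MyProject/
# or, incrementally and resumably:
rsync -avz ./my_dataset/ ALCFUserID@cer-login-01.ai.alcf.anl.gov:/projects/MyProject/my_dataset/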

+

Using Globus

+

We provide separate Globus endpoints for moving data to and from the /projects and /home file systems:

+
    +
  • Use alcf#ai_testbed_projects for the /projects file system
  • +
  • Use alcf#ai_testbed_home for the /home file system
  • +
+

Relevant information on using Globus can be found here

+

ALCF Storage Policies

+

ALCF data policies are available here

+

Please Note: The basic level of protection provided is UNIX file level permissions; it is the user's responsibility to ensure that file permissions and umasks are set to match their needs.
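For example, to keep a project subdirectory accessible to the project group but closed to everyone else, something like the following could be used (paths are illustrative):

chmod 750 /projects/MyProject/shared_data   # owner: full access; group: read/execute; others: none
umask 027                                   # newly created files default to no access for others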

+ + + + + + + + + + + + + +
+
+ + + +
+ +
+ + + + + + +
+
+
+
+ + + + + + + + + + + + \ No newline at end of file diff --git a/ai-testbed/files/dictionary.txt b/ai-testbed/files/dictionary.txt new file mode 100644 index 0000000000..265cd1ea35 --- /dev/null +++ b/ai-testbed/files/dictionary.txt @@ -0,0 +1,84 @@ +aitestbed +ALCFUserID +analyser +ANL +arnoldw +AUTOTUNE +Cerebras +conv +cosmictagger +cpus +cuda +cudart +DATADIR +dlerror +dlopen +elif +finetune +flos +gbps +GEMM +Graphcore +graphcore_login +gres +inet +inplace +jsons +kaggle +keras +keygen +keyscan +layernorm +lenet +libcudart +libnvinfer +LOGDIR +logreg +mgmt +mnist +MNIST +modelzoo +nodelist +ntasks +OUTDIR +passcode +petaFLOPS +POPART +POPLIBS +poptorch +popvision +pretrain +pretraining +PYTHONPATH +relu +RELU +resnet +run_unet_256_256_single_4 +SambaFlow +sambanova +sbatch +scancel +Slurm +snconfig +snpath +snthreads +sntilestat +snvenv +softmax +squeue +SRAM +srun +tensorrt +tf2tensorrt +TFLOPs +unet +UNet +unet +unet_compile_run_all +Venkat +venv +venvs +vipu +virtualenv +wilsonb +XRDU diff --git a/ai-testbed/files/example-multi-node-programs.sh b/ai-testbed/files/example-multi-node-programs.sh new file mode 100644 index 0000000000..76da2aa76e --- /dev/null +++ b/ai-testbed/files/example-multi-node-programs.sh @@ -0,0 +1,64 @@ +#! /bin/bash -x +set -e +# +# Usage: ./unet_all.sh 256 256 +# +SECONDS=0 + +# IMage size. +IM=${1} +# Batch Size +BS=${2} +NUM_WORKERS=1 +export OMP_NUM_THREADS=16 + +source /opt/sambaflow/venv/bin/activate +UNET=$(pwd)/unet + +echo "Model: UNET" +echo "Date: " $(date +%m/%d/%y) +echo "Time: " $(date +%H:%M) + +echo "COMPILE" + +# Compile for parallel RDUs +if [ ! -e out/unet_train_${BS}_${IM}_NN/unet_train_${BS}_${IM}_NN.pef ] ; then + python ${UNET}/unet.py compile -b ${BS} --in-channels=3 --in-width=${IM} --in-height=${IM} --enable-conv-tiling --mac-v2 --compiler-configs-file ${UNET}/jsons/compiler_configs/unet_compiler_configs_no_inst.json --pef-name="unet_train_${BS}_${IM}_NN" --data-parallel -ws 2 > compile_${BS}_${IM}_NN.log 2>&1 +fi + +# Run Multi-Node, Data Parallel +NN=2 +echo "RUN" +echo "NN=${NN}" +sbatch --gres=rdu:1 --tasks-per-node 8 --nodes 2 --nodelist sm-02,sm-01 --cpus-per-task=16 ./unet_batch.sh ${NN} ${NUM_WORKERS} +echo "Duration: " $SECONDS + +#! /bin/bash -x +set -e +# +# Usage: ./unet_batch.sh 2 1 +# +SECONDS=0 + +# Batch Size +BS=256 + +# IMage size +IM=256 +NN=${1} +NUM_WORKERS=${2} +export OMP_NUM_THREADS=16 +DATADIR=/software/sambanova/dataset/kaggle_3m +UNET=$(pwd)/unet +export SAMBA_CCL_USE_PCIE_TRANSPORT=0 + +# TODO: Update this. 
+source /opt/sambaflow/venv/bin/activate + +echo "Model: UNET_TRAIN" +echo "Date: " $(date +%m/%d/%y) +echo "Time: " $(date +%H:%M) + +srun --mpi=pmi2 python ${UNET}/unet_hook.py run --do-train --in-channels=3 --in-width=${IM} --in-height=${IM} --init-features 32 --batch-size=${BS} --epochs 2 --data-dir ${DATADIR} --log-dir log_dir_unet_${NN}_train_kaggle --pef=$(pwd)/out/unet_train_${BS}_${IM}_NN/unet_train_${BS}_${IM}_NN.pef --data-parallel --reduce-on-rdu --num-workers=${NUM_WORKERS} + +echo "Duration: " $SECONDS diff --git a/ai-testbed/files/home-cerebras-sambanova.png b/ai-testbed/files/home-cerebras-sambanova.png new file mode 100644 index 0000000000..d7a9dffb3f Binary files /dev/null and b/ai-testbed/files/home-cerebras-sambanova.png differ diff --git a/ai-testbed/files/notes/index.html b/ai-testbed/files/notes/index.html new file mode 100644 index 0000000000..55c4adad6f --- /dev/null +++ b/ai-testbed/files/notes/index.html @@ -0,0 +1,6670 @@ + + + + + + + + + + + + + + + + + + + + + Notes - ALCF User Guides + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+ + + + Skip to content + + +
+
+ +
+ + + + + + + + + +
+ + +
+ + +
+ +
+

Argonne Leadership Computing Facility

+
+ + + + + + + + + +
+
+ + +
+ + + + + + + + + + + + + + + + + +
+ + + + +
+ + +
+ + + + + + +
+
+ + + +
+
+
+ + + + + +
+
+
+ + + +
+
+
+ + + +
+
+
+ + + +
+
+ + + + + + + +

Notes

+
git submodule init; git submodule update
+
+ + + + + + + + + + + + + +
+
+ + + +
+ +
+ + + + + + +
+
+
+
+ + + + + + + + + + + + \ No newline at end of file diff --git a/ai-testbed/files/todo/index.html b/ai-testbed/files/todo/index.html new file mode 100644 index 0000000000..57d9e3a2e7 --- /dev/null +++ b/ai-testbed/files/todo/index.html @@ -0,0 +1,6815 @@ + + + + + + + + + + + + + + + + + + + + + TODO - ALCF User Guides + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+ + + + Skip to content + + +
+
+ +
+ + + + + + + + + +
+ + +
+ + +
+ +
+

Argonne Leadership Computing Facility

+
+ + + + + + + + + +
+
+ + +
+ + + + + + + + + + + + + + + + + +
+ + + + +
+ + +
+ + + + + + +
+
+ + + +
+
+
+ + + + + +
+
+
+ + + +
+
+
+ + + +
+
+
+ + + +
+
+ + + + + + + +

TODO

+

CosmicTagger v1.x

+
+

Note: Conversion of CT to the various machines is meant to be a tutorial as to how +to convert a model.

+
+

Cerebras CT

+

Cerebras cannot support CT and UNets in general as of 4/25/23.

+

Graphcore CT

+

Alex has been very busy with conferences, etc.

+

He ran CT, but it ran on the CPU. He has stated that it may need to be completely rewritten +using either Poplar or PopArt (I can't remember which). If that is necessary, Venkat should +make the call.

+

Groq CT

+

Habana CT

+

Repo: https://github.com/argonne-lcf/user-guides.git +Branch: feature/Habana002-DNP +File: docs/ai-testbed/habana/CosmicTagger-Conversion.md

+

SambaNova CT

+

SN has a highly-engineered version of CT.

+

They are working to support CT OOB, Out-Of-Box.

+

Cerebras

+

Repo: https://github.com/argonne-lcf/user-guides.git +Branch: Talk to Bill.

+

Graphcore

+

Repo: https://github.com/argonne-lcf/user-guides.git

+

When you change back to 3.2, use virtual-environments.md from the commit a4ce3b5598f4d6feee7ca58accde1a6a0ea84244 "virtual-environments.md with 3.2 edits."

+

Groq

+

Repo: https://github.com/argonne-lcf/user-guides.git +Branch: feature/Groq001-DNP

+

Habana

+

Repo: https://github.com/argonne-lcf/user-guides.git +Branch: feature/Habana002-DNP

+

SambaNova

+

Repo: https://github.com/argonne-lcf/user-guides.git

+ + + + + + + + + + + + + +
+
+ + + +
+ +
+ + + + + + +
+
+
+
+ + + + + + + + + + + + \ No newline at end of file diff --git a/ai-testbed/getting-started/index.html b/ai-testbed/getting-started/index.html new file mode 100644 index 0000000000..a401d727ea --- /dev/null +++ b/ai-testbed/getting-started/index.html @@ -0,0 +1,6807 @@ + + + + + + + + + + + + + + + + + + + + + + + + + Getting Started - ALCF User Guides + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+ + + + Skip to content + + +
+
+ +
+ + + + + + + + + +
+ + +
+ + +
+ +
+

Argonne Leadership Computing Facility

+
+ + + + + + + + + +
+
+ + +
+ + + + + + + + + + + + + + + + + +
+ + + + +
+ + +
+ + + + + + +
+
+ + + +
+
+
+ + + + + +
+
+
+ + + +
+
+
+ + + +
+
+
+ + + +
+
+ + + + + + + +

ALCF AI Testbed

+
+

Cerebras and SambaNova detail photos

+
+

The ALCF AI Testbed houses some of the most advanced AI accelerators for scientific research.

+

The goal of the testbed is to enable explorations into next-generation machine learning applications and workloads, enabling the ALCF and its user community to help define the role of AI accelerators in scientific computing and how to best integrate such technologies with supercomputing resources.

+

The AI accelerators complement the ALCF's current and next-generation supercomputers to provide a state-of-the-art computing environment that supports pioneering research at the intersection of AI, big data, and high performance computing (HPC).

+

The platforms are equipped with architectural features that support AI and data-centric workloads, making them well suited for research tasks involving the growing deluge of scientific data produced by powerful tools, such as supercomputers, light sources, telescopes, particle accelerators, and sensors. In addition, the testbed will allow researchers to explore novel workflows that combine AI methods with simulation and experimental science to accelerate the pace of discovery.

+

How to Get Access

+

Researchers interested in using the AI Testbed’s Cerebras CS-2, SambaNova DataScale SN30, Graphcore Bow Pod64 and GroqRack platforms can now submit project proposals via the ALCF’s Director’s Discretionary program. Access to additional testbed resources, including Habana accelerators, will be announced at a later date.

+

Submit your proposal requests at: Allocation Request Page

+

Getting Started

+
    +
  1. +

    Request a Director's Discretionary project on SambaNova/Cerebras/Graphcore/Groq.

    +
  2. +
  3. +

    Apply for an ALCF account after the project request is approved. Choose the SambaNova/Cerebras/Graphcore/Groq project that your PI has created at ALCF. If you have an active ALCF account, request to join the project after your project is approved.

    +
  4. +
  5. +

    Transfer data to ALCF using Globus after your account has been created.

    +

    a. The endpoint for your data in ALCF is alcf#ai_testbed_projects with the path to your project being /<project name>.

    +

    b. The endpoint for your home directory on the AI Testbeds in ALCF is alcf#ai_testbed_home.

    +
  6. +
  7. +

    Add/invite team members to your ALCF project on SambaNova/Cerebras/Graphcore/Groq.

    +
  8. +
+

How to Contribute to Documentation

+

The documentation is based on MkDocs and source files are +on GitHub. You can contribute to the documentation by creating a pull request.

+

Learn more on how to contribute to documentation.
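A minimal sketch of the usual workflow, assuming MkDocs is installed locally and the repository provides a requirements file:

git clone https://github.com/argonne-lcf/user-guides.git
cd user-guides
pip install -r requirements.txt   # assumed requirements file for the MkDocs theme and plugins
mkdocs serve                      # preview your edits locally before opening a pull request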

+ + + + + + + + + + + + + +
+
+ + + +
+ +
+ + + + + + +
+
+
+
+ + + + + + + + + + + + \ No newline at end of file diff --git a/ai-testbed/graphcore/documentation/index.html b/ai-testbed/graphcore/documentation/index.html new file mode 100644 index 0000000000..0ca4ff4ab7 --- /dev/null +++ b/ai-testbed/graphcore/documentation/index.html @@ -0,0 +1,6694 @@ + + + + + + + + + + + + + + + + + + + + + + + + + Documentation - ALCF User Guides + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+ + + + Skip to content + + +
+
+ +
+ + + + + + + + + +
+ + +
+ + +
+ +
+

Argonne Leadership Computing Facility

+
+ + + + + + + + + +
+
+ + +
+ + + + + + + + + + + + + + + + + +
+ + + + +
+ + +
+ + + + + + +
+
+ + + +
+
+
+ + + + + +
+
+
+ + + +
+
+
+ + + +
+
+
+ + + + + + + +
+ +
+ + + + + + +
+
+
+
+ + + + + + + + + + + + \ No newline at end of file diff --git a/ai-testbed/graphcore/example-programs/index.html b/ai-testbed/graphcore/example-programs/index.html new file mode 100644 index 0000000000..132ee1b76b --- /dev/null +++ b/ai-testbed/graphcore/example-programs/index.html @@ -0,0 +1,7475 @@ + + + + + + + + + + + + + + + + + + + + + + + + + Example Programs - ALCF User Guides + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+ + + + Skip to content + + +
+
+ +
+ + + + + + + + + +
+ + +
+ + +
+ +
+

Argonne Leadership Computing Facility

+
+ + + + + + + + + +
+
+ + +
+ + + + + + + + + + + + + + + + + +
+ + + + +
+ + +
+ + + + + + +
+
+ + + +
+
+
+ + + + + +
+
+
+ + + + + + + +
+
+ + + + + + + +

Example Programs

+

Graphcore provides examples of some well-known AI applications in their repository at https://github.com/graphcore/examples.git. +Clone the examples repository to your personal directory structure: +

mkdir ~/graphcore
+cd ~/graphcore
+git clone https://github.com/graphcore/examples.git
+

+

MNIST - PopTorch

+

Activate PopTorch Environment

+
source ~/venvs/graphcore/poptorch33_env/bin/activate
+
+

Install Requirements

+

Change directory: +

cd ~/graphcore/examples/tutorials/simple_applications/pytorch/mnist
+

+

Run MNIST

+

Execute the command: +

/opt/slurm/bin/srun --ipus=1 python mnist_poptorch.py
+

+

Output

+

The expected output will resemble the following:

+
srun: job 10671 queued and waiting for resources
+srun: job 10671 has been allocated resources
+TrainingModelWithLoss(
+  (model): Network(
+    (layer1): Block(
+      (conv): Conv2d(1, 32, kernel_size=(3, 3), stride=(1, 1))
+      (pool): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
+      (relu): ReLU()
+    )
+    (layer2): Block(
+      (conv): Conv2d(32, 64, kernel_size=(3, 3), stride=(1, 1))
+      (pool): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
+      (relu): ReLU()
+    )
+    (layer3): Linear(in_features=1600, out_features=128, bias=True)
+    (layer3_act): ReLU()
+    (layer3_dropout): Dropout(p=0.5, inplace=False)
+    (layer4): Linear(in_features=128, out_features=10, bias=True)
+    (softmax): Softmax(dim=1)
+  )
+  (loss): CrossEntropyLoss()
+)
+Epochs:   0%|          | 0/10 [00:00<?,[23:27:06.753] [poptorch:cpp] [warning] [DISPATCHER] Type coerced from Long to Int for tensor id 10
+Graph compilation: 100%|██████████| 100/100 [00:00<00:00]
+Epochs: 100%|██████████| 10/10 [01:17<00:00,  7.71s/it]
+Graph compilation: 100%|██████████| 100/100 [00:00<00:00]                          
+Accuracy on test set: 96.85%██████| 100/100 [00:00<00:00]
+
+

MNIST - Tensorflow2

+

Activate Tensorflow2 Environment

+

Create a TensorFlow2 environment as explained in the tensorflow-2-environment-setup and activate the same. +

source ~/venvs/graphcore/tensorflow2_33_env/bin/activate
+

+

Install Requirements

+

Change directory: +

cd ~/graphcore/examples/tutorials/simple_applications/tensorflow2/mnist/
+

+

Run MNIST - TensorFlow

+

Execute the command:

+
/opt/slurm/bin/srun --ipus=1 python mnist.py
+
+

Output

+

The expected output will resemble the following:

+
srun: job 10672 queued and waiting for resources
+srun: job 10672 has been allocated resources
+2023-08-22 23:35:02.925033: I tensorflow/compiler/plugin/poplar/driver/poplar_platform.cc:43] Poplar version: 3.3.0 (de1f8de2a7) Poplar package: b67b751185
+2023-08-22 23:35:06.119772: I tensorflow/compiler/plugin/poplar/driver/poplar_executor.cc:1619] TensorFlow device /device:IPU:0 attached to 1 IPU with Poplar device ID: 0
+2023-08-22 23:35:07.087287: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:185] None of the MLIR Optimization Passes are enabled (registered 2)
+2023-08-22 23:35:07.351132: I tensorflow/compiler/mlir/tensorflow/utils/dump_mlir_util.cc:210] disabling MLIR crash reproducer, set env var `MLIR_CRASH_REPRODUCER_DIRECTORY` to enable.
+2023-08-22T23:35:09.469066Z PL:POPOPS    3545299.3545299 W: createOutputForElementWiseOp 'while/sparse_categorical_crossentropy/SparseSoftmaxCrossEntropyWithLogits/SparseSoftmaxCrossEntropyWithLogits/fusion.3/Op/Equal/Out' ({32,10}): No suitable input found, creating new variable with linear tile mapping
+2023-08-22 23:35:18.532415: I tensorflow/compiler/jit/xla_compilation_cache.cc:376] Compiled cluster using XLA!  This line is logged at most once for the lifetime of the process.
+Epoch 1/4
+2000/2000 [==============================] - 13s 6ms/step - loss: 0.6220
+Epoch 2/4
+2000/2000 [==============================] - 1s 262us/step - loss: 0.3265
+Epoch 3/4
+2000/2000 [==============================] - 1s 273us/step - loss: 0.2781
+Epoch 4/4
+2000/2000 [==============================] - 1s 289us/step - loss: 0.2482
+
+ + +

ResNet50

+

Activate PopTorch Environment

+

Create a fresh PopTorch environment poptorch33_resnet50_env as outlined in the virtual environment section, then activate it. +

source ~/venvs/graphcore/poptorch33_resnet50_env/bin/activate
+

+

Install Requirements

+

Change directory +

cd ~/graphcore/examples/vision/cnns/pytorch
+make install 
+make install-turbojpeg
+

+

Update configs.yml

+

Change directory: +

cd ~/graphcore/examples/vision/cnns/pytorch/train
+
+Open configs.yml with your favorite editor. +Find in the resnet50 section +
use_bbox_info: true
+
+and change it to: +
use_bbox_info: false
+

+ + +

Run ResNet50

+

The scripts to train a ResNet50 PyTorch model on a POD4 are located at https://github.com/graphcore/examples/tree/master/vision/cnns/pytorch/train

+

Set the following environment variables. +

mkdir -p ~/graphcore/tmp/pt_cache/
+export PYTORCH_CACHE_DIR=~/graphcore/tmp/pt_cache/
+
+To run 4 replicas (a total of 4 IPUs) of the ResNet50 model, make a script called poprun_unet.sh with the following contents.
+This script tells poprun to use the partition ID of the partition created for the Slurm job used to run the script. +
#!/bin/bash
+poprun -vv --vipu-partition=slurm_${SLURM_JOBID} --num-instances=1 --num-replicas=4 --executable-cache-path=$PYTORCH_CACHE_DIR python3 /home/$USER/graphcore/examples/vision/cnns/pytorch/train/train.py --config resnet50-pod4 --imagenet-data-path /mnt/localdata/datasets/imagenet-raw-dataset --epoch 2 --validation-mode none --dataloader-worker 14 --dataloader-rebatch-size 256
+
+Then +
chmod +x poprun_unet.sh
+/opt/slurm/bin/srun --ipus=4 poprun_unet.sh
+

+

This model is run with the imagenet dataset.

+

Output

+

The expected output starts with this: +

srun: job 10675 queued and waiting for resources
+srun: job 10675 has been allocated resources
+23:48:29.160 3555537 POPRUN [I] V-IPU server address picked up from 'vipu': 10.1.3.101:8090
+23:48:29.160 3555537 POPRUN [D] Connecting to 10.1.3.101:8090
+23:48:29.162 3555537 POPRUN [D] Status for partition slurm_10673: OK (error 0)
+23:48:29.162 3555537 POPRUN [I] Partition slurm_10673 already exists and is in state: PS_ACTIVE
+23:48:29.163 3555537 POPRUN [D] The reconfigurable partition slurm_10673 is OK
+ ===========================
+|      poprun topology      |
+|===========================|
+| hosts     | gc-poplar-02  |
+|-----------|---------------|
+| ILDs      |       0       |
+|-----------|---------------|
+| instances |       0       |
+|-----------|---------------|
+| replicas  | 0 | 1 | 2 | 3 |
+ ---------------------------
+23:48:29.163 3555537 POPRUN [D] Target options from environment: {}
+23:48:29.163 3555537 POPRUN [D] Target options from V-IPU partition: {"ipuLinkDomainSize":"4","ipuLinkConfiguration":"default","ipuLinkTopology":"mesh","gatewayMode":"true","instanceSize":"4"}
+23:48:29.207 3555537 POPRUN [D] Found 1 devices with 4 IPUs
+23:48:29.777 3555537 POPRUN [D] Attached to device 6
+23:48:29.777 3555537 POPRUN [I] Preparing parent device 6
+23:48:29.777 3555537 POPRUN [D] Device 6 ipuLinkDomainSize=64, ipuLinkConfiguration=Default, ipuLinkTopology=Mesh, gatewayMode=true, instanceSize=4
+23:48:33.631 3555537 POPRUN [D] Target options from Poplar device: {"ipuLinkDomainSize":"64","ipuLinkConfiguration":"default","ipuLinkTopology":"mesh","gatewayMode":"true","instanceSize":"4"}
+23:48:33.631 3555537 POPRUN [D] Using target options: {"ipuLinkDomainSize":"4","ipuLinkConfiguration":"default","ipuLinkTopology":"mesh","gatewayMode":"true","instanceSize":"4"}
+
+Expected output ends with this: +
Graph compilation: 100%|██████████| 100/100 [00:04<00:00][1,0]<stderr>:2023-08-22T23:49:40.103248Z PO:ENGINE   3556102.3556102 W: WARNING: The compile time engine option debug.branchRecordTile is set to "5887" when creating the Engine. (At compile time it was set to 1471)
+[1,0]<stderr>:
+Loss:6.7539 [1,0]<stdout>:[INFO] Epoch 1████▌| 75/78 [02:42<00:06,  2.05s/it][1,0]<stderr>:
+[1,0]<stdout>:[INFO] loss: 6.7462,
+[1,0]<stdout>:[INFO] accuracy: 0.62 %
+[1,0]<stdout>:[INFO] throughput: 7599.7 samples/sec
+[1,0]<stdout>:[INFO] Epoch 2/2
+Loss:6.7462 | Accuracy:0.62%: 100%|██████████| 78/78 [02:48<00:00,  2.16s/it][1,0]<stderr>:
+Loss:6.2821 | Accuracy:2.42%:  96%|█████████▌| 75/7[1,0]<stdout>:[INFO] Epoch 2,0]<stderr>:
+[1,0]<stdout>:[INFO] loss: 6.2720,
+[1,0]<stdout>:[INFO] accuracy: 2.48 %
+[1,0]<stdout>:[INFO] throughput: 8125.8 samples/sec
+[1,0]<stdout>:[INFO] Finished training. Time: 2023-08-22 23:54:57.853508. It took: 0:05:26.090631
+Loss:6.2720 | Accuracy:2.48%: 100%|██████████| 78/78 [02:37<00:00,  2.02s/it][1,0]<stderr>:
+[1,0]<stderr>:/usr/lib/python3.8/multiprocessing/resource_tracker.py:216: UserWarning: resource_tracker: There appear to be 14 leaked semaphore objects to clean up at shutdown
+[1,0]<stderr>:  warnings.warn('resource_tracker: There appear to be %d '
+23:55:02.722 3555537 POPRUN [I] mpirun (PID 3556098) terminated with exit code 0
+

+

GPT-2 PyTorch - POD16 run

+

The scripts to train a GPT-2 PyTorch model on the POD16 are located at https://github.com/graphcore/examples/tree/master/nlp/gpt2/pytorch

+

To run the GPT-2 PyTorch model, create a new PopTorch virtual environment poptorch33_gpt2 as described in the virtual environment section and activate it.
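A minimal sketch of creating that environment, following the pattern used for the other PopTorch environments in these guides. The SDK path and wheel filename pattern below are assumptions — check the installed 3.3.0 SDK directory for the exact wheel name.

export POPLAR_SDK_ROOT=/software/graphcore/poplar_sdk/3.3.0   # assumed SDK location; verify on the system
virtualenv ~/venvs/graphcore/poptorch33_gpt2
# Wheel filename varies by SDK build; the glob below is an assumption.
~/venvs/graphcore/poptorch33_gpt2/bin/pip install $POPLAR_SDK_ROOT/poptorch-*.whl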

+
source ~/venvs/graphcore/poptorch33_gpt2/bin/activate
+
+

Install Requirements

+

Change directory: +

cd ~/graphcore/examples/nlp/gpt2/pytorch
+pip3 install -r requirements.txt
+

+

Run GPT2 on 16 IPUs

+

The command to run the GPT-2 model on 16 IPUs is as follows. +

/opt/slurm/bin/srun --ipus=16 python /home/$USER/graphcore/examples/nlp/gpt2/pytorch/train_gpt2.py --model gpt2 --ipus-per-replica 4 --replication-factor 4 --gradient-accumulation 2048 --device-iterations 8 --batch-size 1 --layers-per-ipu 0 4 4 4 --matmul-proportion 0.15 0.15 0.15 0.15 --max-len 1024 --optimizer AdamW --learning-rate 0.00015 --lr-schedule cosine --lr-warmup 0.01 --remap-logit True --enable-sequence-serialized True --embedding-serialization-factor 4 --recompute-checkpoint-every-layer True --enable-half-partials True --replicated-tensor-sharding True --dataset 'generated' --epochs 1
+
+It runs a GPT-2 model that fits on 4 IPUs, as indicated by --ipus-per-replica. The --replication-factor indicates how many times the model is replicated in a data-parallel manner (4 in the above example). Hence the total number of IPUs used in this example is 16.

+

The effective global batch size in this example is (micro) batch size * gradient accumulation * replication factor = 1 x 2048 x 4 = 8192. The device-iterations setting determines the total number of samples loaded in one training step: global batch size * device iterations = 8192 * 8 = 65536. To learn more about these parameters and about batching on IPUs in general, refer to IPU batching.
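The arithmetic can be written out as a small Python sketch; the values simply mirror the command-line flags above.

# Batch-size bookkeeping for the GPT-2 command above (values taken from its flags).
micro_batch_size      = 1     # --batch-size
gradient_accumulation = 2048  # --gradient-accumulation
replication_factor    = 4     # --replication-factor
device_iterations     = 8     # --device-iterations
ipus_per_replica      = 4     # --ipus-per-replica

global_batch_size = micro_batch_size * gradient_accumulation * replication_factor  # 8192
samples_per_step  = global_batch_size * device_iterations                          # 65536
total_ipus        = ipus_per_replica * replication_factor                          # 16
print(global_batch_size, samples_per_step, total_ipus)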

+

The above example runs with generated (synthetic) data. To use the same example with a real-world dataset, refer to data setup.

+

Output

+

Expected output starts with the following: +

srun: job 10697 queued and waiting for resources
+srun: job 10697 has been allocated resources
+Building (if necessary) and loading remap_tensor_ce.
+Failed to find compiled extension; rebuilding.
+Building (if necessary) and loading residual_add_inplace_pattern.
+Model initializing
+-------------------- Device Allocation --------------------
+Embedding  --> IPU 0
+Layer 0  --> IPU 1
+Layer 1  --> IPU 1
+Layer 2  --> IPU 1
+Layer 3  --> IPU 1
+Layer 4  --> IPU 2
+Layer 5  --> IPU 2
+Layer 6  --> IPU 2
+Layer 7  --> IPU 2
+Layer 8  --> IPU 3
+Layer 9  --> IPU 3
+Layer 10 --> IPU 3
+Layer 11 --> IPU 3
+LM_head --> IPU 0
+
+Expected output ends with the following: +
step 0 of epoch 0, loss: 10.913220405578613, acc: 2.0071864128112793e-05, lr: 0.00012803300858899104, throughput: 646.8439205981404 samples/sec
+step 1 of epoch 0, loss: 10.836345672607422, acc: 1.9788742065429688e-05, lr: 7.5e-05, throughput: 1058.0979097185766 samples/sec
+step 2 of epoch 0, loss: 10.831247329711914, acc: 2.0518898963928223e-05, lr: 2.1966991411008938e-05, throughput: 1058.7595523807183 samples/sec
+step 3 of epoch 0, loss: 10.829034805297852, acc: 1.990795135498047e-05, lr: 0.0, throughput: 1059.6762623043378 samples/sec
+

+
+

Note: The graph compilation for a large model like GPT-2 takes about half an hour.

+
+ + + + + + + + + + + + + +
+
+ + + +
+ +
+ + + + + + +
+
+
+
+ + + + + + + + + + + + \ No newline at end of file diff --git a/ai-testbed/graphcore/files/Bow.jpg b/ai-testbed/graphcore/files/Bow.jpg new file mode 100644 index 0000000000..e1aecc446a Binary files /dev/null and b/ai-testbed/graphcore/files/Bow.jpg differ diff --git a/ai-testbed/graphcore/files/Poplar_sdk.png b/ai-testbed/graphcore/files/Poplar_sdk.png new file mode 100644 index 0000000000..602e77851f Binary files /dev/null and b/ai-testbed/graphcore/files/Poplar_sdk.png differ diff --git a/ai-testbed/graphcore/files/ResNet50_throughput.ods b/ai-testbed/graphcore/files/ResNet50_throughput.ods new file mode 100644 index 0000000000..35927fdcf6 Binary files /dev/null and b/ai-testbed/graphcore/files/ResNet50_throughput.ods differ diff --git a/ai-testbed/graphcore/files/graphcore_login.png b/ai-testbed/graphcore/files/graphcore_login.png new file mode 100644 index 0000000000..d29c10df77 Binary files /dev/null and b/ai-testbed/graphcore/files/graphcore_login.png differ diff --git a/ai-testbed/graphcore/files/poptorch.sh b/ai-testbed/graphcore/files/poptorch.sh new file mode 100644 index 0000000000..556ba6a9bf --- /dev/null +++ b/ai-testbed/graphcore/files/poptorch.sh @@ -0,0 +1,27 @@ +rm -rf ~/graphcore +mkdir ~/graphcore +cd ~/graphcore +git clone https://github.com/graphcore/examples.git +#####This enables poplar and popart. It is automatically ran at log in. +#####source /software/graphcore/poplar_sdk/3.1.0/enable +#mkdir -p ~/venvs/graphcore +rm -rf ~/venvs/graphcore/poptorch31_rn50_env +virtualenv ~/venvs/graphcore/poptorch31_rn50_env +source ~/venvs/graphcore/poptorch31_rn50_env/bin/activate +POPLAR_SDK_ROOT=/software/graphcore/poplar_sdk/3.1.0 +export POPLAR_SDK_ROOT=$POPLAR_SDK_ROOT +pip install $POPLAR_SDK_ROOT/poptorch-3.1.0+98660_0a383de63f_ubuntu_20_04-cp38-cp38-linux_x86_64.whl +#mkdir ${HOME}/tmp +export TF_POPLAR_FLAGS=--executable_cache_path=${HOME}/tmp +export POPTORCH_CACHE_DIR=${HOME}/tmp +export POPART_LOG_LEVEL=WARN +export POPLAR_LOG_LEVEL=WARN +export POPLIBS_LOG_LEVEL=WARN +export PYTHONPATH=/software/graphcore/poplar_sdk/3.1.0/poplar-ubuntu_20_04-3.1.0+6824-9c103dc348/python:$PYTHONPATH + +# if desired: +cd ${HOME}/graphcore/examples/vision/cnns/pytorch/ +make install +make install-turbojpeg +cd train +./rn50_pod16.sh diff --git a/ai-testbed/graphcore/getting-started/index.html b/ai-testbed/graphcore/getting-started/index.html new file mode 100644 index 0000000000..71d01fefa2 --- /dev/null +++ b/ai-testbed/graphcore/getting-started/index.html @@ -0,0 +1,6799 @@ + + + + + + + + + + + + + + + + + + + + + + + + + Getting Started - ALCF User Guides + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+
+ +
+ + + + + + + + + +
+ + +
+ + +
+ +
+

Argonne Leadership Computing Facility

+
+ + + + + + + + + +
+
+ + +
+ + + + + + + + + + + + + + + + + +
+ + + + +
+ + +
+ + + + + + +
+
+ + + +
+
+
+ + + + + +
+
+
+ + + +
+
+
+ + + +
+
+
+ + + +
+
+ + + + + + + +

Getting Started

+

Connection to a Graphcore node is a two-step process.

+

The first step is to ssh from a local machine to the login node.

+

The second step is to log in to a Graphcore node from the login node.

+

Graphcore System View

+

Log in to Login Node

+

Log in to the Graphcore login node from your local machine using the command below. Use your ALCF account ID with the password generated by MobilePASS+.

+
+

Note: In the examples below, replace ALCFUserID with your ALCF user ID.

+
+
ssh ALCFUserID@gc-login-01.ai.alcf.anl.gov
+# or
+ssh ALCFUserID@gc-login-02.ai.alcf.anl.gov
+
+
+

Note: Use the ssh "-v" option in order to debug any ssh problems.

+
+

Log in to a Graphcore Node

+

Once you are on the login node, ssh to one of the Graphcore nodes.

+
ssh gc-poplar-02.ai.alcf.anl.gov
+# or
+ssh gc-poplar-03.ai.alcf.anl.gov
+# or
+ssh gc-poplar-04.ai.alcf.anl.gov
+
+
+

Note: ssh access to gc-poplar-01.ai.alcf.anl.gov is not available to users; however, its IPU resources are assigned to Slurm tasks.
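If preferred, the two hops can be combined into a single command using ssh's jump-host option (a sketch — substitute your user ID and the target node):

ssh -J ALCFUserID@gc-login-01.ai.alcf.anl.gov ALCFUserID@gc-poplar-02.ai.alcf.anl.gov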

+
+ + + + + + + + + + + + + +
+
+ + + +
+ +
+ + + + + + +
+
+
+
+ + + + + + + + + + + + \ No newline at end of file diff --git a/ai-testbed/graphcore/job-queuing-and-submission/index.html b/ai-testbed/graphcore/job-queuing-and-submission/index.html new file mode 100644 index 0000000000..5e868da8cf --- /dev/null +++ b/ai-testbed/graphcore/job-queuing-and-submission/index.html @@ -0,0 +1,6871 @@ + + + + + + + + + + + + + + + + + + + + + + + + + Job Queuing and Submission - ALCF User Guides + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+
+ +
+ + + + + + + + + +
+ + +
+ + +
+ +
+

Argonne Leadership Computing Facility

+
+ + + + + + + + + +
+
+ + +
+ + + + + + + + + + + + + + + + + +
+ + + + +
+ + +
+ + + + + + +
+
+ + + +
+
+
+ + + + + +
+
+
+ + + +
+
+
+ + + +
+
+
+ + + +
+
+ + + + + + + +

Job Queueing and Submission

+

Introduction

+

ALCF's Graphcore POD64 system uses Slurm for job submission and queueing. Below are some of the important commands for using Slurm. For more information refer to Slurm Documentation.

+
+

NOTE: Jobs that require IPUs will fail unless launched with srun or sbatch. +NOTE: There is a single Slurm scheduler for the Graphcore POD64.

+
+

SRun

+

The Slurm command srun can be used to run individual Python scripts (or other programs) in parallel with other scripts on a cluster managed by Slurm. An example of srun usage is shown below. Use the --ipus= option to specify the number of IPUs required for the run.

+

Example:

+
srun --ipus=1 python mnist_poptorch.py
+
+

SBatch

+

Alternatively, these jobs can be submitted to the Slurm workload manager through a batch script by using the sbatch command. To do this, create a bash script (submit-mnist-poptorch-job.sh here as an example) with the commands that you want to execute.

+
#!/bin/sh
+
+python mnist_poptorch.py
+
+

Then pass the bash script as an input to the sbatch command as shown below, requesting the number of IPUs required:

+
sbatch --ipus=1 --output=mnist-poptorch-output.log submit-mnist-poptorch-job.sh
+
+ + +

SQueue

+

The squeue command provides information about jobs located in the Slurm scheduling queue.

+
$ squeue
+             JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
+              2572       p64 Graphcor username  R       1:12      1 gc-poplar-02
+
+

SInfo

+

SInfo is used to view partition and node information for a system running Slurm.

+
$ sinfo
+PARTITION AVAIL  TIMELIMIT  NODES  STATE NODELIST
+p64*         up   infinite      3   idle gc-poplar-[02-04]
+
+

For more information, see SInfo.

+

SCancel

+

SCancel is used to signal or cancel jobs, job arrays, or job steps.

+
scancel job_id
+
+ + + + + + + + + + + + + +
+
+ + + +
+ +
+ + + + + + +
+
+
+
+ + + + + + + + + + + + \ No newline at end of file diff --git a/ai-testbed/graphcore/miscellaneous/index.html b/ai-testbed/graphcore/miscellaneous/index.html new file mode 100644 index 0000000000..3e36fa9d98 --- /dev/null +++ b/ai-testbed/graphcore/miscellaneous/index.html @@ -0,0 +1,6850 @@ + + + + + + + + + + + + + + + + + + + + + + + + + Miscellaneous - ALCF User Guides + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+
+ +
+ + + + + + + + + +
+ + +
+ + +
+ +
+

Argonne Leadership Computing Facility

+
+ + + + + + + + + +
+
+ + +
+ + + + + + + + + + + + + + + + + +
+ + + + +
+ + +
+ + + + + + +
+
+ + + +
+
+
+ + + + + +
+
+
+ + + +
+
+
+ + + +
+
+
+ + + +
+
+ + + + + + + +

Miscellaneous

+

Status

+

GC-Monitor

+

The command gc-monitor is Graphcore's device usage monitor. Run it as follows for ordinary monitoring. See gc-monitor --help for other options.

+

export IPUOF_VIPU_API_HOST=10.1.3.101
+gc-monitor --no-card-info --all-partitions
+# or watch gc-monitor --no-card-info --all-partitions
+
+The IPUOF_VIPU_API_HOST environment variable can conflict with the running of poptorch programs. +The graphcore nodes have a convenience script that temporarily sets this environment variable. +
wrapped_gc_monitor.sh --no-card-info --all-partitions
+

+
+

Note: If no partitions are active, gc-monitor will crash with a core dump: Segmentation fault (core dumped)

+
+

The output will look something like:

+
+--------------------------------------------------------------+-----------------------+
+|      IPUs in slurm_2616 attached from other namespaces       |         Board         |
++----+------------------------------+--------------+-----------+-----------+-----------+
+| ID |       Application host       |    Clock     |   Temp    |   Temp    |   Power   |
++----+------------------------------+--------------+-----------+-----------+-----------+
+| 0  |         gc-poplar-02         |   1850MHz    |  24.2 C   |  21.1 C   |  92.3 W   |
++----+------------------------------+--------------+-----------+-----------+-----------+
+
+

GC-Info

+

The command gc-info is used to display device information. See gc-info --help for more options.

+

To list devices, +

gc-info -l
+

+

The gc-info command lists the partition and the individual IPU IDs, along with the multi-IPU configuration IDs.

+
-+- Id:  [0], target: [Fabric], IPU-M host:  [10.1.5.1], IPU#: [3]
+-+- Id:  [1], target: [Fabric], IPU-M host:  [10.1.5.1], IPU#: [2]
+-+- Id:  [2], target: [Fabric], IPU-M host:  [10.1.5.1], IPU#: [1]
+
+

One may also display detailed information for a specific device. The devices are numbered 0-63. For example,

+
gc-info --device-id 0 --device-info
+
+

See gc-info --help for more information.

+

How busy is the system?

+

Use one of

+
top
+htop
+
+ + + + + + + + + + + + + +
+
+ + + +
+ +
+ + + + + + +
+
+
+
+ + + + + + + + + + + + \ No newline at end of file diff --git a/ai-testbed/graphcore/running-a-model-or-program/index.html b/ai-testbed/graphcore/running-a-model-or-program/index.html new file mode 100644 index 0000000000..ab930feb5a --- /dev/null +++ b/ai-testbed/graphcore/running-a-model-or-program/index.html @@ -0,0 +1,6922 @@ + + + + + + + + + + + + + + + + + + + + + + + + + Running a Model/Program - ALCF User Guides + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+
+ +
+ + + + + + + + + +
+ + +
+ + +
+ +
+

Argonne Leadership Computing Facility

+
+ + + + + + + + + +
+
+ + +
+ + + + + + + + + + + + + + + + + +
+ + + + +
+ + +
+ + + + + + +
+
+ + + +
+
+
+ + + + + +
+
+
+ + + +
+
+
+ + + +
+
+
+ + + +
+
+ + + + + + + +

Steps to Run a Model/Program

+
+

Note: Please be mindful of how you are using the system. +For example, consider running larger jobs in the evening or on weekends.

+
+

Running any model or application involves compiling the model's graph, which is then deployed on the IPUs. Below is a description of training a neural network for classification on the MNIST dataset using PopTorch (a PyTorch framework optimized for the IPU).

+

Examples Repo

+

Graphcore provides examples of some well-known AI applications in their repository at https://github.com/graphcore/examples.git.

+

Clone the examples repository to your personal directory structure, and check out the v3.3.0 release:

+
mkdir ~/graphcore
+cd ~/graphcore
+git clone https://github.com/graphcore/examples.git
+cd examples
+git checkout v3.3.0   # check out the v3.3.0 release noted above (tag name assumed)
+
+

MNIST

+

Activate PopTorch Environment

+

Follow the steps in the PopTorch environment setup to enable the Poplar SDK.

+
source ~/venvs/graphcore/poptorch33_env/bin/activate
+
+

Install Requirements

+

Change directory and install packages specific to the MNIST model:

+
cd ~/graphcore/examples/tutorials/simple_applications/pytorch/mnist
+
+

Run MNIST

+

Execute the command:

+
/opt/slurm/bin/srun --ipus=1 python mnist_poptorch.py
+
+

All models are run using Slurm, with --ipus indicating how many IPUs need to be allocated for the model being run. This example uses a batch size of 8 and runs for 10 epochs. It also sets the device iterations to 50, which is the number of iterations the device runs over the data before returning control to the user. The dataset used in the example comes from TorchVision, and the PopTorch dataloader is used to load the data required for the 50 device iterations from the host to the device in a single step.

+

The model used here is a simple CNN-based model with a classifier (softmax) output layer. A plain PyTorch model is turned into a PopTorch model using poptorch.Options() and poptorch.trainingModel, which wraps the PyTorch model. The first call to trainingModel compiles the model for the IPU; you can observe the compilation progress as part of the output of the above command.

+
Graph compilation:   3%|▎         | 3/100 [00:00<00:03]2023-04-26T16:53:21.225944Z PL:POPLIN    3680893.3680893 W: poplin::preplanMatMuls() is deprecated! Use poplin::preplan() instead
+Graph compilation: 100%|██████████| 100/100 [00:20<00:00]2023-04-26T16:53:38.241395Z popart:session 3680893.3680893
+
+
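For reference, below is a minimal, self-contained sketch of the pattern just described. It is illustrative only: the layer sizes, optimizer, and hyperparameters are assumptions, not a copy of mnist_poptorch.py.

import torch
import torchvision
import poptorch

# 50 device iterations: the IPU processes 50 batches per call from the host.
opts = poptorch.Options()
opts.deviceIterations(50)

# TorchVision MNIST dataset, loaded through the PopTorch DataLoader so that
# enough data for all device iterations is transferred in a single step.
dataset = torchvision.datasets.MNIST(
    "mnist_data", train=True, download=True,
    transform=torchvision.transforms.ToTensor())
loader = poptorch.DataLoader(opts, dataset, batch_size=8, shuffle=True)

# A plain PyTorch module that also returns its loss, so the loss is computed on the IPU.
class Classifier(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.net = torch.nn.Sequential(
            torch.nn.Flatten(), torch.nn.Linear(784, 128),
            torch.nn.ReLU(), torch.nn.Linear(128, 10))
        self.loss = torch.nn.CrossEntropyLoss()

    def forward(self, x, labels=None):
        logits = self.net(x)
        if labels is None:
            return logits
        return logits, self.loss(logits, labels)

model = Classifier()
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)

# The first call to the wrapped model triggers graph compilation for the IPU.
training_model = poptorch.trainingModel(model, opts, optimizer=optimizer)

for data, labels in loader:
    logits, loss = training_model(data, labels)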

The artifacts from graph compilation are cached in the location set by the POPTORCH_CACHE_DIR environment variable; the .popef file corresponding to the model under consideration is stored there.
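For example (any writable directory works; the path below is only an illustration):

mkdir -p ~/tmp
export POPTORCH_CACHE_DIR=~/tmp
# Later runs of the same model reuse the cached .popef executable and skip recompilation.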

+

Output

+

The expected output starts with the dataset download, followed by a summary of the model, the graph-compilation progress bar, and the training progress bar.

+
srun: job 10671 queued and waiting for resources
+srun: job 10671 has been allocated resources
+TrainingModelWithLoss(
+  (model): Network(
+    (layer1): Block(
+      (conv): Conv2d(1, 32, kernel_size=(3, 3), stride=(1, 1))
+      (pool): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
+      (relu): ReLU()
+    )
+    (layer2): Block(
+      (conv): Conv2d(32, 64, kernel_size=(3, 3), stride=(1, 1))
+      (pool): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
+      (relu): ReLU()
+    )
+    (layer3): Linear(in_features=1600, out_features=128, bias=True)
+    (layer3_act): ReLU()
+    (layer3_dropout): Dropout(p=0.5, inplace=False)
+    (layer4): Linear(in_features=128, out_features=10, bias=True)
+    (softmax): Softmax(dim=1)
+  )
+  (loss): CrossEntropyLoss()
+)
+Epochs:   0%|          | 0/10 [00:00<?,[23:27:06.753] [poptorch:cpp] [warning] [DISPATCHER] Type coerced from Long to Int for tensor id 10
+Graph compilation: 100%|██████████| 100/100 [00:00<00:00]
+Epochs: 100%|██████████| 10/10 [01:17<00:00,  7.71s/it]
+Graph compilation: 100%|██████████| 100/100 [00:00<00:00]                          
+Accuracy on test set: 96.85%██████| 100/100 [00:00<00:00]
+
+

Refer to the script to learn more about this example.

+

Example Programs lists the different example applications with corresponding commands for each of the above steps.

+ + + + + + + + + + + + + +
+
+ + + +
+ +
+ + + + + + +
+
+
+
+ + + + + + + + + + + + \ No newline at end of file diff --git a/ai-testbed/graphcore/system-overview/index.html b/ai-testbed/graphcore/system-overview/index.html new file mode 100644 index 0000000000..aed970d91e --- /dev/null +++ b/ai-testbed/graphcore/system-overview/index.html @@ -0,0 +1,6720 @@ + + + + + + + + + + + + + + + + + + + + + + + + + System Overview - ALCF User Guides + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+
+ +
+ + + + + + + + + +
+ + +
+ + +
+ +
+

Argonne Leadership Computing Facility

+
+ + + + + + + + + +
+
+ + +
+ + + + + + + + + + + + + + + + + +
+ + + + +
+ + +
+ + + + + + +
+
+ + + +
+
+
+ + + + + +
+
+
+ + + +
+
+
+ + + +
+
+
+ + + +
+
+ + + + + + + +

System Overview

+ + +

The Graphcore Bow-Pod64 system is the latest-generation AI accelerator from Graphcore. It is a one-rack system consisting of 64 Bow-class Intelligence Processing Units (IPUs) with a custom interconnect, providing an aggregate 22 petaflops of half-precision performance. It has a total of 57.6 GB of In-Processor-Memory and 94,208 IPU cores, and includes four servers for data processing.

+

For more details refer to the POD64 spec

+

Poplar SDK +(Figure from +https://www.graphcore.ai/products/poplar)

+

The Graphcore software stack includes support for TensorFlow and PyTorch through the Poplar SDK. The Poplar® SDK is the toolchain specifically designed for creating graph software for ML applications. It integrates with traditional ML frameworks like PyTorch and TensorFlow, allowing users to port their existing code to IPU-specific code. The various components of the Poplar SDK stack are shown in the figure. It includes PopTorch, a wrapper over PyTorch optimized for the IPU hardware, as well as the supported PopLibs libraries, which make it possible to construct graphs, define tensor data, and control how code and data are mapped onto the IPUs for execution.

+ + + + + + + + + + + + + +
+
+ + + +
+ +
+ + + + + + +
+
+
+
+ + + + + + + + + + + + \ No newline at end of file diff --git a/ai-testbed/graphcore/unused/Scaling-ResNet50/index.html b/ai-testbed/graphcore/unused/Scaling-ResNet50/index.html new file mode 100644 index 0000000000..9f19edb598 --- /dev/null +++ b/ai-testbed/graphcore/unused/Scaling-ResNet50/index.html @@ -0,0 +1,7077 @@ + + + + + + + + + + + + + + + + + + + + + Scaling ResNet50 - ALCF User Guides + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+
+ +
+ + + + + + + + + +
+ + +
+ + +
+ +
+

Argonne Leadership Computing Facility

+
+ + + + + + + + + +
+
+ + +
+ + + + + + + + + + + + + + + + + +
+ + + + +
+ + +
+ + + + + + +
+
+ + + +
+
+
+ + + + + +
+
+
+ + + + + + + +
+
+ + + + + + + +

Scaling ResNet50

+

Follow all the instructions in Getting Started to log into a Graphcore node.

+

Examples Repo

+

Graphcore provides examples of some well-known AI applications in their repository at https://github.com/graphcore/examples.git.

+

Clone the examples repository to your personal directory structure:

+
mkdir ~/graphcore
+cd ~/graphcore
+git clone https://github.com/graphcore/examples.git
+
+

Environment Setup

+

Establish a virtual environment.

+
mkdir -p ~/venvs/graphcore
+rm -rf ~/venvs/graphcore/poptorch31_rn50_env
+virtualenv ~/venvs/graphcore/poptorch31_rn50_env
+source ~/venvs/graphcore/poptorch31_rn50_env/bin/activate
+
+

Install PopTorch

+

Install PopTorch.

+
POPLAR_SDK_ROOT=/software/graphcore/poplar_sdk/3.1.0
+export POPLAR_SDK_ROOT=$POPLAR_SDK_ROOT
+pip install $POPLAR_SDK_ROOT/poptorch-3.1.0+98660_0a383de63f_ubuntu_20_04-cp38-cp38-linux_x86_64.whl
+
+

Environment Variables

+

Establish the following environment variables.

+
mkdir ${HOME}/tmp
+export TF_POPLAR_FLAGS=--executable_cache_path=${HOME}/tmp
+export POPTORCH_CACHE_DIR=${HOME}/tmp
+export POPART_LOG_LEVEL=WARN
+export POPLAR_LOG_LEVEL=WARN
+export POPLIBS_LOG_LEVEL=WARN
+export PYTHONPATH=/software/graphcore/poplar_sdk/3.1.0/poplar-ubuntu_20_04-3.1.0+6824-9c103dc348/python:$PYTHONPATH
+
+

Install Requirements

+
cd ${HOME}/graphcore/examples/vision/cnns/pytorch/
+make install
+make install-turbojpeg
+
+

One-time per user ssh key set up

+

Set up the ssh key on gc-poplar-01.

+

Gc-poplar-01

+

On gc-poplar-01:

+
mkdir ~/.ssh
+cd ~/.ssh
+ssh-keygen -t rsa -b 4096
+# Accept the default filename of id_rsa
+#Enter passphrase (empty for no passphrase):
+#Enter same passphrase again:
+cat id_rsa.pub >> authorized_keys
+
+
ssh-keyscan -H gc-poplar-01 >> ~/.ssh/known_hosts
+
+

You should see:

+
# gc-poplar-01:22 SSH-2.0-OpenSSH_8.2p1 Ubuntu-4ubuntu0.5
+# gc-poplar-01:22 SSH-2.0-OpenSSH_8.2p1 Ubuntu-4ubuntu0.5
+# gc-poplar-01:22 SSH-2.0-OpenSSH_8.2p1 Ubuntu-4ubuntu0.5
+# gc-poplar-01:22 SSH-2.0-OpenSSH_8.2p1 Ubuntu-4ubuntu0.5
+# gc-poplar-01:22 SSH-2.0-OpenSSH_8.2p1 Ubuntu-4ubuntu0.5
+
+
ssh-keyscan -H gc-poplar-02 >> ~/.ssh/known_hosts
+
+

You should see:

+
# gc-poplar-02:22 SSH-2.0-OpenSSH_8.2p1 Ubuntu-4ubuntu0.5
+# gc-poplar-02:22 SSH-2.0-OpenSSH_8.2p1 Ubuntu-4ubuntu0.5
+# gc-poplar-02:22 SSH-2.0-OpenSSH_8.2p1 Ubuntu-4ubuntu0.5
+# gc-poplar-02:22 SSH-2.0-OpenSSH_8.2p1 Ubuntu-4ubuntu0.5
+# gc-poplar-02:22 SSH-2.0-OpenSSH_8.2p1 Ubuntu-4ubuntu0.5
+
+
ssh-keyscan -H gc-poplar-03 >> ~/.ssh/known_hosts
+
+

You should see:

+
# gc-poplar-03:22 SSH-2.0-OpenSSH_8.2p1 Ubuntu-4ubuntu0.5
+# gc-poplar-03:22 SSH-2.0-OpenSSH_8.2p1 Ubuntu-4ubuntu0.5
+# gc-poplar-03:22 SSH-2.0-OpenSSH_8.2p1 Ubuntu-4ubuntu0.5
+# gc-poplar-03:22 SSH-2.0-OpenSSH_8.2p1 Ubuntu-4ubuntu0.5
+# gc-poplar-03:22 SSH-2.0-OpenSSH_8.2p1 Ubuntu-4ubuntu0.5
+
+
ssh-keyscan -H gc-poplar-04 >> ~/.ssh/known_hosts
+
+

You should see:

+
# gc-poplar-04:22 SSH-2.0-OpenSSH_8.2p1 Ubuntu-4ubuntu0.5
+# gc-poplar-04:22 SSH-2.0-OpenSSH_8.2p1 Ubuntu-4ubuntu0.5
+# gc-poplar-04:22 SSH-2.0-OpenSSH_8.2p1 Ubuntu-4ubuntu0.5
+# gc-poplar-04:22 SSH-2.0-OpenSSH_8.2p1 Ubuntu-4ubuntu0.5
+# gc-poplar-04:22 SSH-2.0-OpenSSH_8.2p1 Ubuntu-4ubuntu0.5
+
+

benchmarks.yml

+

Update ${HOME}/graphcore/examples/vision/cnns/pytorch/train/benchmarks.yml +with your favorite editor to match benchmarks.yml.

+

configs.yml

+

Update ${HOME}/graphcore/examples/vision/cnns/pytorch/train/configs.yml +with your favorite editor. At about line 30, change use_bbox_info: true to +use_bbox_info: false.
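If you prefer a non-interactive edit, a one-line sketch is shown below. Note that it replaces every occurrence of the setting in the file, which is only safe if use_bbox_info appears solely in the resnet50 section.

sed -i 's/use_bbox_info: true/use_bbox_info: false/' ${HOME}/graphcore/examples/vision/cnns/pytorch/train/configs.yml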

+

Scale ResNet50

+

Scale and benchmark ResNet50.

+
+

Note: The number at the end of each line indicates the number of IPUs.

+

Note: Use screen because every run is long.

+
+

"PopRun exposes this control with the --process-placement flag and provides multiple pre-defined strategies. By default (and with --process-placement spreadnuma), PopRun is designed to be NUMA-aware. On each host, all the available NUMA nodes are divided among the instances. This means that each instance is bound to execute on and allocate memory from its assigned NUMA nodes, ensuring memory access locality. This strategy maximises memory bandwidth and is likely to yield optimal performance for most of the data loading workloads in machine learning." [Multi-Instance Multi-Host](https://docs.graphcore.ai/projects/poprun-user-guide/en/latest/launching.html#multi-instance-multi-host)

+
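As an illustration, the placement strategy can also be passed explicitly. The sketch below simply mirrors the pytorch_resnet50_train_real_pod16 entry in benchmarks.yml with the default placement spelled out; consult the PopRun user guide for the full set of strategies.

poprun -vv \
  --process-placement spreadnuma \
  --num-instances=16 --num-replicas=16 \
  --executable-cache-path=$PYTORCH_CACHE_DIR \
  python3 train.py --config resnet50 \
  --imagenet-data-path $DATASETS_DIR/imagenet-raw-dataset \
  --epoch 20 --validation-mode none \
  --dataloader-worker 14 --dataloader-rebatch-size 256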

Setup

+

Move to the correct directory and establish the datasets directory.

+
cd ${HOME}/graphcore/examples/vision/cnns/pytorch/train
+export DATASETS_DIR=/mnt/localdata/datasets/
+
+

Scaling to 16 IPUs

+

One may use any of the following commands to run ResNet50 on one to sixteen IPUs.

+
python3 -m examples_utils benchmark --spec benchmarks.yml --benchmark pytorch_resnet50_train_real_1
+python3 -m examples_utils benchmark --spec benchmarks.yml --benchmark pytorch_resnet50_train_real_2
+python3 -m examples_utils benchmark --spec benchmarks.yml --benchmark pytorch_resnet50_train_real_4
+python3 -m examples_utils benchmark --spec benchmarks.yml --benchmark pytorch_resnet50_train_real_8
+python3 -m examples_utils benchmark --spec benchmarks.yml --benchmark pytorch_resnet50_train_real_pod16
+
+

Scaling to 64 IPUs

+
+

Note: One must complete the instructions on Multi-node Setup before running this example.

+
+

Establish Environment Variables

+
HOST1=`ifconfig eno1 | grep "inet " | grep -o '[0-9]\{1,3\}\.[0-9]\{1,3\}\.[0-9]\{1,3\}\.[0-9]\{1,3\}' | head -1`
+OCT123=`echo "$HOST1" | cut -d "." -f 1,2,3`
+OCT4=`echo "$HOST1" | cut -d "." -f 4`
+HOST2=$OCT123.`expr $OCT4 + 1`
+HOST3=$OCT123.`expr $OCT4 + 2`
+HOST4=$OCT123.`expr $OCT4 + 3`
+export HOSTS=$HOST1,$HOST2,$HOST3,$HOST4
+export CLUSTER=c16
+export IPUOF_VIPU_API_PARTITION_ID=p64
+export TCP_IF_INCLUDE=$OCT123.0/8
+export IPUOF_VIPU_API_HOST=$HOST1
+
+

64 IPU Run

+

This runs to convergence. It uses all 64 IPUs for more than 12 hours.

+
+

Note: This should only be used if absolutely required.

+
+

Execute:

+
python3 -m examples_utils benchmark --spec benchmarks.yml --benchmark pytorch_resnet50_train_real_pod64
+python3 -m examples_utils benchmark --spec benchmarks.yml --benchmark pytorch_resnet50_train_real_pod64_conv
+
+

Benchmark Results

+

One IPU

+
[INFO] 2022-12-16 17:07:32: Total runtime: 3956.836479 seconds
+[INFO] 2022-12-16 17:07:32:    throughput = '7527.626315789474'
+[INFO] 2022-12-16 17:07:32:    accuracy = '57.41'
+[INFO] 2022-12-16 17:07:32:    loss = '2.8153'
+[INFO] 2022-12-16 17:07:33:    Total compile time: 429.59 seconds
+
+

Two IPUs

+
[INFO] 2022-12-16 15:56:23: Total runtime: 5866.494071 seconds
+[INFO] 2022-12-16 15:56:23:    throughput = '4798.778947368421'
+[INFO] 2022-12-16 15:56:23:    accuracy = '68.23'
+[INFO] 2022-12-16 15:56:23:    loss = '2.3148'
+[INFO] 2022-12-16 15:56:24:    Total compile time: 418.75 seconds
+
+

Four IPUs

+
[INFO] 2022-12-16 04:05:28: Total runtime: 3070.994553 seconds
+[INFO] 2022-12-16 04:05:28:    throughput = '9959.821052631578'
+[INFO] 2022-12-16 04:05:28:    accuracy = '67.76'
+[INFO] 2022-12-16 04:05:28:    loss = '2.338'
+[INFO] 2022-12-16 04:05:29:    Total compile time: 377.4 seconds
+
+

Eight IPUs

+
[INFO] 2022-12-16 02:46:45: Total runtime: 1831.437598 seconds
+[INFO] 2022-12-16 02:46:45:    throughput = '19865.263157894733'
+[INFO] 2022-12-16 02:46:45:    accuracy = '64.94'
+[INFO] 2022-12-16 02:46:45:    loss = '2.4649'
+[INFO] 2022-12-16 02:46:46:    Total compile time: 386.27 seconds
+
+

Sixteen IPUs

+

Epochs: 20

+
[INFO] 2022-12-15 22:01:14: Total runtime: 1297.274336 seconds
+[INFO] 2022-12-15 22:01:14:    throughput = '39057.447368421046'
+[INFO] 2022-12-15 22:01:14:    accuracy = '57.43'
+[INFO] 2022-12-15 22:01:14:    loss = '2.8162'
+[INFO] 2022-12-15 22:01:16:    Total compile time: 397.08 seconds
+
+

Sixty-Four IPUs

+
[1,0]<stdout>:[INFO] loss: 4.8367,
+[1,0]<stdout>:[INFO] accuracy: 18.83 %
+[1,0]<stdout>:[INFO] throughput: 51368.5 samples/sec
+
+ + + + + + + + + + + + + +
+
+ + + +
+ +
+ + + + + + +
+
+
+
+ + + + + + + + + + + + \ No newline at end of file diff --git a/ai-testbed/graphcore/unused/cosmictagger-conversion/index.html b/ai-testbed/graphcore/unused/cosmictagger-conversion/index.html new file mode 100644 index 0000000000..4b608956ef --- /dev/null +++ b/ai-testbed/graphcore/unused/cosmictagger-conversion/index.html @@ -0,0 +1,7093 @@ + + + + + + + + + + + + + + + + + + + + + CosmicTagger Conversion - ALCF User Guides + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+
+ +
+ + + + + + + + + +
+ + +
+ + +
+ +
+

Argonne Leadership Computing Facility

+
+ + + + + + + + + +
+
+ + +
+ + + + + + + + + + + + + + + + + +
+ + + + +
+ + +
+ + + + + + +
+
+ + + +
+
+
+ + + + + +
+
+
+ + + + + + + +
+
+ + + + + + + +

CosmicTagger Conversion

+

The intent of this page is to show conceptually how to convert a model to run on the Graphcore system. +It is not necessary to convert CosmicTagger because it has already been converted and is +located at CosmicTagger on the Graphcore branch. +The original is located at CosmicTagger.

+

Run Model on CPU

+

The first step to converting a model is to verify that it runs on the CPU. This step has been verified for CosmicTagger.

+

Config.py

+

CosmicTagger can run on multiple kinds of hardware, so it is necessary to specify the architecture being used, for example CPU or GPU. The architecture is stored in the ComputeMode class.

+

Edit src/config/config.py. Add IPU to the ComputeMode class.

+
class ComputeMode(Enum):
+    CPU   = 0
+    #...
+    IPU   = 5
+
+

Trainer.py

+

Edit src/utils/torch/trainer.py.

+

Import PopTorch

+

PopTorch is Graphcore's extension of PyTorch.

+

Import poptorch at the top of the file.

+
import poptorch
+
+

Wrap Model

+

Wrap the model using poptorch.trainingModel() so that it may be run on IPUs for training.

+

Wrap the model using poptorch.inferenceModel() when not training.

+

Find the following code around line 90 in the init_network method.

+
        # Foregoing any fusions as to not disturb the existing ingestion pipeline
+        if self.is_training() and self.args.mode.quantization_aware:
+            self._raw_net.qconfig = torch.quantization.get_default_qat_qconfig('fbgemm')
+            self._net = torch.quantization.prepare_qat(self._raw_net)
+        else:
+            self._net = self._raw_net
+
+

After the above code, add:

+
        if self.args.run.compute_mode == ComputeMode.IPU:
+            if self.is_training():
+                opts = poptorch.Options()
+                self._net = poptorch.trainingModel(self._net, opts, optimizer=torch.optim.SGD(self._net.parameters(), lr=1e-3))
+            else:
+                self._net = poptorch.inferenceModel(self._net)
+
+

See poptorch.trainingModel() and poptorch.inferenceModel() for more information.

+

There is also a Build the Model tutorial.

+

Update Optimizer

+

Update init_optimizer() to use the poptorch class instead of the torch class as needed.

+

Change:

+
        if self.args.mode.optimizer.name == OptimizerKind.rmsprop:
+            self._opt = torch.optim.RMSprop(self._net.parameters(), 1.0, eps=1e-4)
+        else:
+            self._opt = torch.optim.Adam(self._net.parameters(), 1.0)
+
+

to:

+
        if self.args.mode.optimizer.name == OptimizerKind.rmsprop:
+            if self.args.run.compute_mode == ComputeMode.IPU:
+                self._opt = poptorch.optim.RMSprop(self._net.parameters(), 1.0, eps=1e-4)
+            else:
+                self._opt = torch.optim.RMSprop(self._net.parameters(), 1.0, eps=1e-4)
+        else:
+            if self.args.run.compute_mode == ComputeMode.IPU:
+                self._opt = poptorch.optim.Adam(self._net.parameters(), 1.0)
+            else:
+                self._opt = torch.optim.Adam(self._net.parameters(), 1.0)
+
+

Update the Forward Pass

+

Putting the loss calculation in forward_pass() allows the loss computation to be performed on the IPUs. This will be faster because the data will not need to be transferred round-trip to the CPU.

+

Change forward_pass():

+

Original

+
            if net is None:
+                logits_image = self._net(minibatch_data['image'])
+            else:
+                logits_image = net(minibatch_data['image'])
+
+

Updated

+

The following code changes are to account for the loss function, i.e., self.loss_calculator, and the +image labels, i.e., labels_image, to be passed to the model's forward_pass method. Additionally, the calculated +loss is returned from the forward_pass method.

+
            if net is None:
+                if self.args.run.compute_mode == ComputeMode.IPU:
+                    logits_image, labels_image, loss = self._net(minibatch_data['image'], self.loss_calculator, labels_image)
+                    return logits_image, labels_image, loss
+                else:
+                    logits_image = self._net(minibatch_data['image'])
+            else:
+                if self.args.run.compute_mode == ComputeMode.IPU and self.args.mode.name != ModeKind.inference:
+                    logits_image, labels_image, loss = net(minibatch_data['image'], self.loss_calculator, labels_image)
+                    return logits_image, labels_image, loss
+                else:
+                    logits_image = net(minibatch_data['image'])
+
+

Update the Training Step

+

Receive the extra loss variable from the forward_pass method.

+

Update the train_step method.

+

Original Training Step

+
                    with self.timing_context("forward"):
+                        if self.args.run.precision == Precision.mixed and self.args.run.compute_mode == ComputeMode.GPU:
+                            with torch.cuda.amp.autocast():
+                                logits_image, labels_image = self.forward_pass(minibatch_data)
+                        else:
+                            logits_image, labels_image = self.forward_pass(minibatch_data)
+
+                    verbose = False
+
+                    # Compute the loss based on the logits
+                    with self.timing_context("loss"):
+                        loss = self.loss_calculator(labels_image, logits_image)
+
+

Updated Training Step

+

The forward_pass() method was changed to return the extra variable loss in the previous section. It is now +received conditionally when using an IPU(s).

+

In the with self.timing_context("loss"): section, only calculate loss if not using an IPU(s).

+
                    with self.timing_context("forward"):
+                        if self.args.run.precision == Precision.mixed and self.args.run.compute_mode == ComputeMode.GPU:
+                            with torch.cuda.amp.autocast():
+                                logits_image, labels_image = self.forward_pass(minibatch_data)
+                        else:
+                            if self.args.run.compute_mode == ComputeMode.IPU:
+                                logits_image, labels_image, loss = self.forward_pass(minibatch_data)
+                            else:
+                                logits_image, labels_image = self.forward_pass(minibatch_data)
+
+                    verbose = False
+
+
+                    # Compute the loss based on the logits
+                    with self.timing_context("loss"):
+                        if self.args.run.compute_mode == ComputeMode.IPU:
+                            loss = loss
+                        else:
+                            loss = self.loss_calculator(labels_image, logits_image)
+
+

Update Validation Step

+

Update the val_step method.

+

Original Validation Step Code

+

Find this code.

+
            if self.args.run.precision == Precision.mixed and self.args.run.compute_mode == ComputeMode.GPU:
+                with torch.cuda.amp.autocast():
+                    logits_image, labels_image = self.forward_pass(minibatch_data, net=val_net)
+            else:
+                logits_image, labels_image = self.forward_pass(minibatch_data, net=val_net)
+
+            # Compute the loss based on the logits
+            loss = self.loss_calculator(labels_image, logits_image)
+
+

Updated Validation Step Code

+

Change the code to the following.

+
            if self.args.run.precision == Precision.mixed and self.args.run.compute_mode == ComputeMode.GPU:
+                with torch.cuda.amp.autocast():
+                    logits_image, labels_image = self.forward_pass(minibatch_data, net=val_net)
+
+                    # Compute the loss based on the logits
+                    loss = self.loss_calculator(labels_image, logits_image)
+            else:
+                if self.args.run.compute_mode == ComputeMode.IPU:
+                    logits_image, labels_image, loss = self.forward_pass(minibatch_data, net=val_net)
+                else:
+                    logits_image, labels_image = self.forward_pass(minibatch_data, net=val_net)
+
+                    # Compute the loss based on the logits
+                    loss = self.loss_calculator(labels_image, logits_image)
+
+

UResNet2D Model

+

Update Model

+

The Graphcore system is more computationally efficient if the loss function is on the +IPU. This is accomplished by using the loss function within the model's forward method.

+

Edit src/networks/torch/uresnet2D.py.

+

Update the Forward Declaration

+

Find the forward method.

+
def forward(self, input_tensor):
+
+

Update the argument list to include the loss function, i.e., loss_calculator +and the image labels, i.e., labels_image.

+
def forward(self, input_tensor, loss_calculator=None, labels_image=None):
+
+

Add Loss Calculation

+

Add the loss calculation just before the forward method returns.

+
        if loss_calculator is not None:
+
+            labels_image = labels_image.long()
+            labels_image = torch.chunk(labels_image, chunks=3, dim=1)
+            shape =  labels_image[0].shape
+            labels_image = [ _label.view([shape[0], shape[-2], shape[-1]]) for _label in labels_image ]
+
+            loss = loss_calculator(labels_image, x)
+            import poptorch
+            loss = poptorch.identity_loss(loss , reduction="mean")
+            return x, labels_image, loss
+
+        # This return already exists.
+        return x
+
+

The poptorch.identity_loss method takes a single PyTorch tensor and will backpropagate a gradient of ones through it. You may find an example here.

+

bin/exec.py

+

The following is included for completeness. One is unlikely to find this in other code.

+

Open bin/exec.py in your favorite editor. Change:

+
@hydra.main(version_base=None, config_path="../src/config", config_name="config")
+
+

to

+
@hydra.main(config_path="../src/config", config_name="config")
+
+ + + + + + + + + + + + + +
+
+ + + +
+ +
+ + + + + + +
+
+
+
+ + + + + + + + + + + + \ No newline at end of file diff --git a/ai-testbed/graphcore/unused/cosmictagger-ddp/index.html b/ai-testbed/graphcore/unused/cosmictagger-ddp/index.html new file mode 100644 index 0000000000..67b63b6c06 --- /dev/null +++ b/ai-testbed/graphcore/unused/cosmictagger-ddp/index.html @@ -0,0 +1,6847 @@ + + + + + + + + + + + + + + + + + + + + + CosmicTagger Conversion - ALCF User Guides + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+
+ +
+ + + + + + + + + +
+ + +
+ + +
+ +
+

Argonne Leadership Computing Facility

+
+ + + + + + + + + +
+
+ + +
+ + + + + + + + + + + + + + + + + +
+ + + + +
+ + +
+ + + + + + +
+
+ + + +
+
+
+ + + + + +
+
+
+ + + + + + + +
+
+ + + + + + + +

CosmicTagger Conversion

+

The intent of this page is to show conceptually how to convert a Graphcore model to run on Distributed Data Parallel +using PopDist. +It is not necessary to convert CosmicTagger because it has already been converted and is +located at CosmicTagger on the GraphcoreDDP branch. +The original is located at CosmicTagger.

+

Run Model on CPU

+

The first step to converting a model is to verify that it runs on the CPU. This step has been verified for CosmicTagger.

+

Starter Code

+

You may use the code at CosmicTagger on the Graphcore branch.

+

Trainer.py

+

Edit src/utils/torch/trainer.py.

+

Import Poplar Packages

+

PopTorch is Graphcore's extension of PyTorch.

+

PopDist is Graphcore's distributed processing package.

+

Import poptorch and popdist at the top of the file.

+
try:
+    import poptorch
+    import popdist
+    import popdist.poptorch
+except:
+    pass
+
+

Initialization

+

Initialize popdist for distributed computing.

+

Establish a class variable, _instance, used to differentiate between the different model instances that will be saved.

+

Add the following lines at the bottom of init().

+
        if self.args.run.compute_mode == ComputeMode.IPU and popdist.isPopdistEnvSet():
+            popdist.init()
+            self._instance = popdist.getInstanceIndex()
+        else:
+            self._instance = 0
+
+

Use Instance Variable

+

Use the instance variable for the model file name.

+

Find def get_model_filepath.

+

Change:

+
        name = file_path + 'model-{}.ckpt'.format(self._global_step)
+
+

To:

+
        name = file_path + f'model-{self._global_step}-{self._instance}.ckpt'
+
+

Establish Logging Method

+

Add a helper function to log data at the bottom of the file.

+
    def log_in_single_instance(self, string):
+        if self.args.run.compute_mode == ComputeMode.IPU:
+            if not popdist.isPopdistEnvSet() or popdist.getInstanceIndex() == 0:
+                logging.info(string)
+        else:
+            logging.info(string)
+
+

Update Init_network()

+

PopTorch has an Options() method which returns values that get passed to poptorch.trainingModel. The returned values are stored in opts in this example.

+

Find:

+
        if self.args.run.compute_mode == ComputeMode.IPU:
+            if self.is_training():
+                opts = poptorch.Options()
+                self._net = poptorch.trainingModel(self._net, opts, optimizer=torch.optim.SGD(self._net.parameters(), lr=1e-3))
+            else:
+                self._net = poptorch.inferenceModel(self._net)
+
+

Replace it with:

+
        if self.args.run.compute_mode == ComputeMode.IPU:
+            if popdist.isPopdistEnvSet():
+                opts = popdist.poptorch.Options()
+                # When using the dataloader with 'auto_distributed_partitioning=True'
+                # and 'shuffle=True' we must set the random seed to ensure that tensors
+                # are in the same order in all processes.
+                opts.randomSeed(42)
+                # Replication factor is already set via PopRun so
+                # we ignore 'args.num_replicas'.
+                logging.info(f"Num of local replicas: {popdist.getNumLocalReplicas()}")
+            else:
+                opts = poptorch.Options()
+                opts.replicationFactor(self.args.num_replicas)
+
+            if self.is_training():
+                self._net = poptorch.trainingModel(self._net, opts, optimizer=torch.optim.SGD(self._net.parameters(), lr=1e-3))
+            else:
+                self._net = poptorch.inferenceModel(self._net)
+
+

Run The Code

+

See instructions in README_GRAPHCORE.md.

+ + + + + + + + + + + + + +
+
+ + + +
+ +
+ + + + + + +
+
+
+
+ + + + + + + + + + + + \ No newline at end of file diff --git a/ai-testbed/graphcore/unused/files/Graph_Ananlyser_main.jpg b/ai-testbed/graphcore/unused/files/Graph_Ananlyser_main.jpg new file mode 100644 index 0000000000..ed73da75d2 Binary files /dev/null and b/ai-testbed/graphcore/unused/files/Graph_Ananlyser_main.jpg differ diff --git a/ai-testbed/graphcore/unused/files/benchmarks.yml b/ai-testbed/graphcore/unused/files/benchmarks.yml new file mode 100644 index 0000000000..e12ee55a69 --- /dev/null +++ b/ai-testbed/graphcore/unused/files/benchmarks.yml @@ -0,0 +1,221 @@ +--- +common_options: &common_options + data: + throughput: + regexp: 'throughput: *(.*?) samples\/sec' + skip: 1 + accuracy: + reduction_type: "final" + regexp: 'accuracy: *(.*?) \%' + loss: + reduction_type: "final" + regexp: 'loss: *(\d*\.\d*)' + output: + - [samples/sec, "throughput"] + - [accuracy, "accuracy"] + - [loss, "loss"] + env: + POPLAR_ENGINE_OPTIONS: '{"opt.enableMultiAccessCopies":"false"}' + PYTORCH_CACHE_DIR: "./pt_cache/" + +config_options: &config_options + requirements_path: requirements.txt + required_apt_packages_path: required_apt_packages.txt + pre_run_commands: [make install, make install-turbojpeg] + +pytorch_resnet50_train_real_1: + <<: [*common_options, *config_options] + description: ResNet training on 1 Mk2 IPU, real data. + cmd: >- + python3 train.py + --config resnet50 + --imagenet-data-path $DATASETS_DIR/imagenet-raw-dataset + --epoch 20 + --validation-mode none + --dataloader-worker 14 + --dataloader-rebatch-size 256 + +pytorch_resnet50_train_real_2: + <<: [*common_options, *config_options] + description: ResNet training on 2 Mk2 IPUs, real data. + cmd: >- + poprun + -vv + --num-instances=2 + --num-replicas=2 + --executable-cache-path=$PYTORCH_CACHE_DIR + python3 train.py + --config resnet50 + --imagenet-data-path $DATASETS_DIR/imagenet-raw-dataset + --epoch 20 + --validation-mode none + --dataloader-worker 14 + --dataloader-rebatch-size 256 + +pytorch_resnet50_train_real_4: + <<: [*common_options, *config_options] + description: ResNet training on 4 Mk2 IPUs, real data. + cmd: >- + poprun + -vv + --num-instances=4 + --num-replicas=4 + --executable-cache-path=$PYTORCH_CACHE_DIR + python3 train.py + --config resnet50 + --imagenet-data-path $DATASETS_DIR/imagenet-raw-dataset + --epoch 20 + --validation-mode none + --dataloader-worker 14 + --dataloader-rebatch-size 256 + +pytorch_resnet50_train_real_8: + <<: [*common_options, *config_options] + description: ResNet training on 8 Mk2 IPUs, real data. + cmd: >- + poprun + -vv + --num-instances=8 + --num-replicas=8 + --executable-cache-path=$PYTORCH_CACHE_DIR + python3 train.py + --config resnet50 + --imagenet-data-path $DATASETS_DIR/imagenet-raw-dataset + --epoch 20 + --validation-mode none + --dataloader-worker 14 + --dataloader-rebatch-size 256 + +pytorch_resnet50_train_real_pod16: + <<: [*common_options, *config_options] + description: ResNet training on 16 Mk2 IPUs, real data. + cmd: >- + poprun + -vv + --num-instances=16 + --num-replicas=16 + --executable-cache-path=$PYTORCH_CACHE_DIR + python3 train.py + --config resnet50 + --imagenet-data-path $DATASETS_DIR/imagenet-raw-dataset + --epoch 20 + --validation-mode none + --dataloader-worker 14 + --dataloader-rebatch-size 256 + +pytorch_resnet50_train_real_pod64: + <<: [*common_options, *config_options] + description: | + ResNet training on 64 Mk2 IPUs with real data + for convergence testing. 
+ cmd: >- + poprun + -vv + --num-instances=32 + --num-replicas=64 + --vipu-server-host=$IPUOF_VIPU_API_HOST + --host=$HOSTS + --vipu-server-port 8090 + --vipu-partition=$IPUOF_VIPU_API_PARTITION_ID + --vipu-cluster=$VIPU_CLUSTER_ID + --update-partition=yes + --remove-partition=yes + --reset-partition=no + --sync-type=ST_POD_NATIVE_DEFAULT + --executable-cache-path=$PYTORCH_CACHE_DIR + --mpi-global-args=" + --mca oob_tcp_if_include $TCP_IF_INCLUDE + --mca btl_tcp_if_include $TCP_IF_INCLUDE" + --mpi-local-args=" + -x LD_LIBRARY_PATH + -x OPAL_PREFIX + -x PATH + -x CPATH + -x PYTHONPATH + -x POPLAR_ENGINE_OPTIONS + -x IPUOF_VIPU_API_TIMEOUT=800" + python3 train.py + --config resnet50-pod64 + --dataloader-worker 14 + --dataloader-rebatch-size 256 + --imagenet-data-path $DATASETS_DIR/imagenet-raw-dataset + --epoch 20 + --validation-mode none + +pytorch_resnet50_train_real_pod64_conv: + <<: [*common_options, *config_options] + description: | + ResNet training on 64 Mk2 IPUs with real data + for convergence testing. + cmd: >- + poprun + -vv + --num-instances=32 + --num-replicas=64 + --vipu-server-host=$IPUOF_VIPU_API_HOST + --host=$HOSTS + --vipu-server-port 8090 + --vipu-partition=$IPUOF_VIPU_API_PARTITION_ID + --vipu-cluster=$VIPU_CLUSTER_ID + --update-partition=yes + --remove-partition=yes + --reset-partition=no + --sync-type=ST_POD_NATIVE_DEFAULT + --executable-cache-path=$PYTORCH_CACHE_DIR + --mpi-global-args=" + --mca oob_tcp_if_include $TCP_IF_INCLUDE + --mca btl_tcp_if_include $TCP_IF_INCLUDE" + --mpi-local-args=" + -x LD_LIBRARY_PATH + -x OPAL_PREFIX + -x PATH + -x CPATH + -x PYTHONPATH + -x POPLAR_ENGINE_OPTIONS + -x IPUOF_VIPU_API_TIMEOUT=800" + python3 train.py + --config resnet50-pod64 + --dataloader-worker 14 + --dataloader-rebatch-size 256 + --imagenet-data-path $DATASETS_DIR/imagenet-raw-dataset + --checkpoint-output-dir ./checkpoints + --wandb + --validation-mode none + +pytorch_efficientnet_b0_train_real_pod16: + <<: [*common_options, *config_options] + description: | + EfficientNet-B0-G16-GN training pipelined on 4 IPU-Ms (16 IPUs) + using real data + cmd: >- + poprun + -vv + --num-instances=4 + --num-replicas=8 + --ipus-per-replica=2 + python3 train.py + --config efficientnet-b0-g16-gn-pod16 + --imagenet-data-path $DATASETS_DIR/imagenet-raw-dataset + --epoch 2 + --warmup-epoch 0 + --validation-mode none + --weight-avg-strategy none + +pytorch_efficientnet_b4_g16_train_real_pod16: + <<: [*common_options, *config_options] + description: | + EfficientNet-B4-G16-GN training pipelined on 4 IPU-Ms (16 IPUs) + using real data + cmd: >- + poprun + -vv + --num-instances=2 + --num-replicas=8 + --ipus-per-replica=2 + python3 train.py + --config efficientnet-b4-g16-gn-pod16 + --imagenet-data-path $DATASETS_DIR/imagenet-raw-dataset + --epoch 2 + --warmup-epoch 0 + --validation-mode none + --weight-avg-strategy none diff --git a/ai-testbed/graphcore/unused/files/graphcore-sys-view.png b/ai-testbed/graphcore/unused/files/graphcore-sys-view.png new file mode 100644 index 0000000000..347c2a1797 Binary files /dev/null and b/ai-testbed/graphcore/unused/files/graphcore-sys-view.png differ diff --git a/ai-testbed/graphcore/unused/multi-node-setup/index.html b/ai-testbed/graphcore/unused/multi-node-setup/index.html new file mode 100644 index 0000000000..2317d69ec3 --- /dev/null +++ b/ai-testbed/graphcore/unused/multi-node-setup/index.html @@ -0,0 +1,6691 @@ + + + + + + + + + + + + + + + + + + + + + Multi-node Setup - ALCF User Guides + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + 
+ + + + + + + + + + + + + + + + + +
+ + + + Skip to content + + +
+
+ +
+ + + + + + + + + +
+ + +
+ + +
+ +
+

Argonne Leadership Computing Facility

+
+ + + + + + + + + +
+
+ + +
+ + + + + + + + + + + + + + + + + +
+ + + + +
+ + +
+ + + + + + +
+
+ + + +
+
+
+ + + + + +
+
+
+ + + +
+
+
+ + + +
+
+
+ + + +
+
+ + + + + + + +

Multi-node Setup

+

These steps only need to be executed once per user.

+

Running on multiple nodes is a three-step process.

+
    +
  1. +

    Create a Key

    +
    cd ~/.ssh
    +ssh-keygen -t rsa -b 4096
    +
    +
  2. +
  3. +

    Put Key into Authorized_keys File

    +
    cat id_rsa.pub >> authorized_keys
    +
    +
  4. +
  5. +

    Add Node IP Addresses to Known_hosts File

    +
    ssh-keyscan -H 10.1.3.101 >> ~/.ssh/known_hosts
    +ssh-keyscan -H 10.1.3.102 >> ~/.ssh/known_hosts
    +ssh-keyscan -H 10.1.3.103 >> ~/.ssh/known_hosts
    +ssh-keyscan -H 10.1.3.104 >> ~/.ssh/known_hosts
    +
    +
  6. +
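Once the key is in place and the node addresses are in known_hosts, a quick optional check (a minimal sketch; the address below is one of the example node IPs used above) is to confirm that you can reach another node without a password prompt:

ssh 10.1.3.102 hostname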
+ + + + + + + + + + + + + +
+
+ + + +
+ +
+ + + + + + +
+
+
+
+ + + + + + + + + + + + \ No newline at end of file diff --git a/ai-testbed/graphcore/unused/profiling-mnist/index.html b/ai-testbed/graphcore/unused/profiling-mnist/index.html new file mode 100644 index 0000000000..e16d2809e1 --- /dev/null +++ b/ai-testbed/graphcore/unused/profiling-mnist/index.html @@ -0,0 +1,6721 @@ + + + + + + + + + + + + + + + + + + + + + Profiling MNIST - ALCF User Guides + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+ + + + Skip to content + + +
+
+ +
+ + + + + + + + + +
+ + +
+ + +
+ +
+

Argonne Leadership Computing Facility

+
+ + + + + + + + + +
+
+ + +
+ + + + + + + + + + + + + + + + + +
+ + + + +
+ + +
+ + + + + + +
+
+ + + +
+
+
+ + + + + +
+
+
+ + + +
+
+
+ + + +
+
+
+ + + +
+
+ + + + + + + +

Profiling MNIST

+

Follow all the instructions in Getting Started to log into a Graphcore node.

+

Follow the instructions in Virtual Environments up to and including PopART Environment Setup.

+

Follow the instructions in Example Programs up to and including MNIST, Install Requirements.

+

Change Directory

+
cd ~/graphcore/tutorials/simple_applications/pytorch/mnist
+
+

Set Poplar Options

+

Set the option to generate all reports, i.e., "autoReport.all":"true".

+

Set the reports directory, i.e., "autoReport.directory":"./reports".

+

Do so by running the following commands:

+
export POPLAR_ENGINE_OPTIONS='{"autoReport.all":"true", "autoReport.directory":"./reports"}'
+
+

Run MNIST

+

Do so by running the following command:

+
python mnist_poptorch.py
+
+

When MNIST has finished running, see Profiling to use Graph Analyser.
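You can also confirm that the report files were written (a quick optional check, assuming the ./reports directory configured in POPLAR_ENGINE_OPTIONS above):

ls -R ./reports | head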

+ + + + + + + + + + + + + +
+
+ + + +
+ +
+ + + + + + +
+
+
+
+ + + + + + + + + + + + \ No newline at end of file diff --git a/ai-testbed/graphcore/unused/profiling-resnet50/index.html b/ai-testbed/graphcore/unused/profiling-resnet50/index.html new file mode 100644 index 0000000000..0b65cbb057 --- /dev/null +++ b/ai-testbed/graphcore/unused/profiling-resnet50/index.html @@ -0,0 +1,6761 @@ + + + + + + + + + + + + + + + + + + + + + Profiling ResNet50 - ALCF User Guides + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+ + + + Skip to content + + +
+
+ +
+ + + + + + + + + +
+ + +
+ + +
+ +
+

Argonne Leadership Computing Facility

+
+ + + + + + + + + +
+
+ + +
+ + + + + + + + + + + + + + + + + +
+ + + + +
+ + +
+ + + + + + +
+
+ + + +
+
+
+ + + + + +
+
+
+ + + +
+
+
+ + + +
+
+
+ + + +
+
+ + + + + + + +

Profiling ResNet50

+

Follow all the instructions in Getting Started to log into a Graphcore node.

+

Follow the instructions in Virtual Environments up to and including PopART Environment Setup.

+

Examples Repo

+

Graphcore provides examples of some well-known AI applications in their repository at https://github.com/graphcore/examples.git.

+

Clone the examples repository to your personal directory structure:

+
mkdir ~/graphcore
+cd ~/graphcore
+git clone https://github.com/graphcore/examples.git
+
+

Install Requirements

+

Change directory

+
cd ~/graphcore/examples/vision/cnns/pytorch
+python -m pip install -r requirements.txt
+
+

Export Variables

+

Export the Poplar engine options and the datasets directory, and derive the host and partition variables used by the benchmark.

+
export POPLAR_ENGINE_OPTIONS='{"autoReport.all":"true", "autoReport.directory":"./reports"}'
+export DATASETS_DIR=/software/datasets
+HOST1=`ifconfig eno1 | grep "inet " | grep -o '[0-9]\{1,3\}\.[0-9]\{1,3\}\.[0-9]\{1,3\}\.[0-9]\{1,3\}' | head -1`
+OCT123=`echo "$HOST1" | cut -d "." -f 1,2,3`
+OCT4=`echo "$HOST1" | cut -d "." -f 4`
+HOST2=$OCT123.`expr $OCT4 + 1`
+HOST3=$OCT123.`expr $OCT4 + 2`
+HOST4=$OCT123.`expr $OCT4 + 3`
+export HOSTS=$HOST1,$HOST2,$HOST3,$HOST4
+export CLUSTER=c16
+VIPU_SERVER=${VIPU_SERVER:=$HOST1}
+FIRST_PARTITION=`vipu-admin list partitions --api-host $VIPU_SERVER| grep ACTIVE | cut -d '|' -f 3 | cut -d ' ' -f 2 | head -1`
+PARTITION=${PARTITION:=$FIRST_PARTITION}
+
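Optionally, echo the derived values to confirm they look sensible before launching (a small sanity check, not required by the benchmark):

echo "HOSTS=$HOSTS"
echo "PARTITION=$PARTITION VIPU_SERVER=$VIPU_SERVER"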
+

Profile ResNet50

+

Profile ResNet50.

+
+

Note: Use screen because every run takes a long time.
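For example, a minimal screen workflow (the session name is arbitrary):

screen -S resnet50-profile    # start a named session and launch the run inside it
# detach with Ctrl-a d, then reattach later with:
screen -r resnet50-profile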

+
+
cd train
+python3 -m examples_utils benchmark --spec benchmarks.yml --benchmark pytorch_resnet50_train_real_pod16
+
+

Profile Results

+

When ResNet50 has finished running, see Profiling to use Graph Analyser.

+ + + + + + + + + + + + + +
+
+ + + +
+ +
+ + + + + + +
+
+
+
+ + + + + + + + + + + + \ No newline at end of file diff --git a/ai-testbed/graphcore/unused/profiling/index.html b/ai-testbed/graphcore/unused/profiling/index.html new file mode 100644 index 0000000000..6bd9205b84 --- /dev/null +++ b/ai-testbed/graphcore/unused/profiling/index.html @@ -0,0 +1,6879 @@ + + + + + + + + + + + + + + + + + + + + + Profiling - ALCF User Guides + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+ + + + Skip to content + + +
+
+ +
+ + + + + + + + + +
+ + +
+ + +
+ +
+

Argonne Leadership Computing Facility

+
+ + + + + + + + + +
+
+ + +
+ + + + + + + + + + + + + + + + + +
+ + + + +
+ + +
+ + + + + + +
+
+ + + +
+
+
+ + + + + +
+
+
+ + + +
+
+
+ + + +
+
+
+ + + +
+
+ + + + + + + +

Profiling

+

This is an adaptation of Capturing IPU Reports.

+

Reports

+

Capturing IPU Reports

+

See Capturing IPU Reports for more information.

+

This section describes how to generate the files that the Graph Analyser can analyze. The Graph Analyser uses report files generated during compilation and execution by the Poplar SDK.

+

IPU Memory Overhead

+

Because of these extra memory requirements, a model with high memory consumption may run out of memory when profiling is enabled. Depending on the model, you can adjust its parameters to leave space for the instrumentation; for example, try decreasing the batch size. In TensorFlow BERT you can adjust the micro batch size.

+

Host Computing Overhead

+

It is essential that you also try to reduce the iterations on each run. For instance, by reducing the number of steps or the number of batches per step you can get a lighter execution profile. This will not only reduce the host computation overhead but will also speed up visualization in the Graph Analyser.

+

Download PopVision

+
    +
  1. +

    Download PopVision Tools.

    +
  2. +
  3. +

Click the Download Now button.

    +
  4. +
  5. +

In the Graph Analyser section, select your operating system.

    +
  6. +
  7. +

Install it following the instructions for your operating system.

    +
  8. +
+

Create SSH Session

+

Use ssh from your development system.

+

The ssh command will use a jumphost and port forwarding. The format is as follows:

+
ssh -J ALCFUserID@gc-login-dd.ai.alcf.anl.gov ALCFUserID@gc-poplar-DD.ai.alcf.anl.gov -L 8090:127.0.0.1:22
+ssh -J wilsonb@gc-login-01.ai.alcf.anl.gov wilsonb@gc-poplar-02.ai.alcf.anl.gov -L 8090:127.0.0.1:22
+
+

Where:

+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
ArgumentHelp
ALCFUserIDIs your ALCF user identification.
ddIs the Graphcore login node to use, i.e., 01 or 02
DDIs the Graphcore node to use, i.e., 01, 02, 03, or 04.
8090Is the port on your local machine.
127.0.0.1:22Is the local IP address and port on the remote machine.
+

You will receive a prompt.

+

Launch Graph Analyser

+

Continue on your development machine.

+

Operating System

+

Ubuntu

+
cd /path/to/graph/analyser/directory
+./popvision-graph-analyser-3.11.6.AppImage
+
+

User Interface

+

Graph Analyser

+

Graphcore System View

+
    +
  1. Click Open a report...;
  2. +
  3. Click the remote tab;
  4. +
  5. Enter your ALCFUserID for remote machine;
  6. +
  7. Enter the Hostname of your local machine, i.e., 127.0.0.1;
  8. +
  9. Enter your Port address used in the ssh command, e.g., 8090;
  10. +
  11. Click Connect;
  12. +
  13. Navigate to your reports directory;
  14. +
  15. Select the training directory;
  16. +
  17. Select archive.a file; and
  18. +
  19. Click Open button.
  20. +
+

The Summary Report will be displayed.

+ + + + + + + + + + + + + +
+
+ + + +
+ +
+ + + + + + +
+
+
+
+ + + + + + + + + + + + \ No newline at end of file diff --git a/ai-testbed/graphcore/virtual-environments/index.html b/ai-testbed/graphcore/virtual-environments/index.html new file mode 100644 index 0000000000..66f638b569 --- /dev/null +++ b/ai-testbed/graphcore/virtual-environments/index.html @@ -0,0 +1,6943 @@ + + + + + + + + + + + + + + + + + + + + + + + + + Virtual Environment - ALCF User Guides + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+ + + + Skip to content + + +
+
+ +
+ + + + + + + + + +
+ + +
+ + +
+ +
+

Argonne Leadership Computing Facility

+
+ + + + + + + + + +
+
+ + +
+ + + + + + + + + + + + + + + + + +
+ + + + +
+ + +
+ + + + + + +
+
+ + + +
+
+
+ + + + + +
+
+
+ + + + + + + +
+
+ + + + + + + +

Virtual Environments

+

Poplar SDK Setup

+

The Poplar SDK is downloaded onto the Graphcore systems at /software/graphcore/poplar_sdk/. The default Poplar version (3.3.0) is enabled automatically upon logging into a Graphcore node.

+

Check that Poplar is set up correctly:

+
popc --version
+
+

One should see:

+
POPLAR version 3.3.0 (de1f8de2a7)
+clang version 16.0.0 (2fce0648f3c328b23a6cbc664fc0dd0630122212)
+
+

If the Poplar SDK is not enabled, it can be enabled with +

source /software/graphcore/poplar_sdk/3.3.0/enable
+

+

To disable the current Poplar SDK, e.g., if one wants to use a different Poplar SDK, follow the steps below. (Otherwise, skip to the section Miscellaneous Environment Variables.) This example assumes that the currently installed SDK is 3.1.0 and you want to move to 3.3.0.

+
    +
  1. Check the current version +
     $ popc --version
    + POPLAR version 3.1.0 (e12d5f9f01)
    + clang version 15.0.0 (bab932b4fc4cdb58bb009370384b2c41579bd9d9)
    +
  2. +
  3. Unset the current version +
    unset POPLAR_SDK_ENABLED
    +
  4. +
  5. Enable poplar and popart +
    source /software/graphcore/poplar_sdk/3.3.0/poplar-ubuntu_20_04-3.3.0+7857-b67b751185/enable.sh 
    +source /software/graphcore/poplar_sdk/3.3.0/popart-ubuntu_20_04-3.3.0+7857-b67b751185/enable.sh 
    +
  6. +
  7. Recheck for the new version. +
    $ popc --version
    +POPLAR version 3.3.0 (de1f8de2a7)
    +clang version 16.0.0 (2fce0648f3c328b23a6cbc664fc0dd0630122212)
    +
  8. +
  9. +

    Set SDK env variable +

    POPLAR_SDK_ROOT=/software/graphcore/poplar_sdk/3.3.0/
    +export POPLAR_SDK_ROOT=$POPLAR_SDK_ROOT
    +

    +
  10. +
  11. +

Create a new virtual environment with this SDK and install PopTorch and/or other frameworks as needed. +

    virtualenv ~/Graphcore/workspace/poptorch33_env
    +source ~/Graphcore/workspace/poptorch33_env/bin/activate
    +pip install $POPLAR_SDK_ROOT/poptorch-3.3.0+113432_960e9c294b_ubuntu_20_04-cp38-cp38-linux_x86_64.whl
    +export PYTHONPATH=$POPLAR_SDK_ROOT/python:$PYTHONPATH
    +

    +
  12. +
+

Miscellaneous Environment Variables

+
mkdir ~/tmp
+export TF_POPLAR_FLAGS=--executable_cache_path=~/tmp
+export POPTORCH_CACHE_DIR=~/tmp
+
+export POPART_LOG_LEVEL=WARN
+export POPLAR_LOG_LEVEL=WARN
+export POPLIBS_LOG_LEVEL=WARN
+
+export PYTHONPATH=/software/graphcore/poplar_sdk/3.3.0/poplar-ubuntu_20_04-3.3.0+7857-b67b751185/python:$PYTHONPATH
+
+

PopTorch Environment Setup

+

PopTorch is an extension of the PyTorch framework that is optimized for IPU-specific functionality. To use PopTorch, first create a virtual environment and activate it.

+
mkdir -p ~/venvs/graphcore
+virtualenv ~/venvs/graphcore/poptorch33_env
+source ~/venvs/graphcore/poptorch33_env/bin/activate
+
+

Use the following commands to install the PopTorch environment.

+
POPLAR_SDK_ROOT=/software/graphcore/poplar_sdk/3.3.0
+export POPLAR_SDK_ROOT=$POPLAR_SDK_ROOT
+pip install $POPLAR_SDK_ROOT/poptorch-3.3.0+113432_960e9c294b_ubuntu_20_04-cp38-cp38-linux_x86_64.whl
+
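A quick optional check that the wheel installed into the active environment is to import PopTorch and print its version (a minimal sketch):

python3 -c "import poptorch; print(poptorch.__version__)"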
+

TensorFlow 2 Environment Setup

+

The Poplar SDK provides TensorFlow and Keras wheels built on TensorFlow 2.6 that include IPU-specific functionality and are optimized for AMD processors. They can be installed as follows.

+

Create virtual environment.

+
virtualenv ~/venvs/graphcore/tensorflow2_33_env
+source ~/venvs/graphcore/tensorflow2_33_env/bin/activate
+
+

Install the TensorFlow and Keras wheels.

+
POPLAR_SDK_ROOT=/software/graphcore/poplar_sdk/3.3.0
+export POPLAR_SDK_ROOT=$POPLAR_SDK_ROOT
+pip install $POPLAR_SDK_ROOT/tensorflow-2.6.3+gc3.3.0+251580+08d96978c7f+amd_znver1-cp38-cp38-linux_x86_64.whl
+pip install $POPLAR_SDK_ROOT/keras-2.6.0+gc3.3.0+251582+a3785372-py2.py3-none-any.whl
+
+

Verify Installation

+
python -c "from tensorflow.python import ipu"
+
+

You should see:

+
2023-08-22 21:53:26.109934: I tensorflow/compiler/plugin/poplar/driver/poplar_platform.cc:43] Poplar version: 3.3.0 (de1f8de2a7) Poplar package: b67b751185
+
+

Installing Packages

+

Install packages in the normal manner such as:

+
python3 -m pip install "some_package"
+
+

For more details see Use pip for installing.

+

To install a different version of a package that is already installed in one's environment, one can use:

+
pip install --ignore-installed  ... # or -I
+
+
+

Note: Conda is not supported on the Graphcore system.

+
+ + + + + + + + + + + + + +
+
+ + + +
+ +
+ + + + + + +
+
+
+
+ + + + + + + + + + + + \ No newline at end of file diff --git a/ai-testbed/groq/files/groqrack_system_diagram.png b/ai-testbed/groq/files/groqrack_system_diagram.png new file mode 100644 index 0000000000..368828a703 Binary files /dev/null and b/ai-testbed/groq/files/groqrack_system_diagram.png differ diff --git a/ai-testbed/groq/getting-started/index.html b/ai-testbed/groq/getting-started/index.html new file mode 100644 index 0000000000..c345cd0bbc --- /dev/null +++ b/ai-testbed/groq/getting-started/index.html @@ -0,0 +1,6848 @@ + + + + + + + + + + + + + + + + + + + + + + + + + Getting Started - ALCF User Guides + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+ + + + Skip to content + + +
+
+ +
+ + + + + + + + + +
+ + +
+ + +
+ +
+

Argonne Leadership Computing Facility

+
+ + + + + + + + + +
+
+ + +
+ + + + + + + + + + + + + + + + + +
+ + + + +
+ + +
+ + + + + + +
+
+ + + +
+
+
+ + + + + +
+
+
+ + + +
+
+
+ + + +
+
+
+ + + +
+
+ + + + + + + +

Getting Started

+

Allocations

+

If you do not already have an allocation, you will need to request one here: +Discretionary Allocation Request (New & Renewal)

+

Accounts

+

If you do not have an ALCF account (but have an allocation), request one here: ALCF Account and Project Management

+

Setup

+

Connection to a GroqRack node is a two-step process.

+

The first step is to ssh from a local machine to a login node. +The second, optional step is to ssh from a login node to a GroqRack node. Jobs may also be started and tracked from login nodes.

+

GroqRack System View

+

Log in to a login node

+

Connect to a Groq login node, editing this command line to use your ALCF user ID. You will be prompted for a password; use the 8-digit code provided by MobilePASS+. +

ssh ALCFUserID@groq.ai.alcf.anl.gov
+
+This randomly selects one of the login nodes, namely groq-login-01.ai.alcf.anl.gov or groq-login-02.ai.alcf.anl.gov. You can alternatively ssh to the specific login nodes directly.

+

Log in to a GroqRack node

+

Once you are on a login node, optionally ssh to one of the GroqRack nodes, which are numbered 1-9.

+
ssh groq-r01-gn-01.ai.alcf.anl.gov
+# or
+ssh groq-r01-gn-09.ai.alcf.anl.gov
+# or any node with hostname of form groq-r01-gn-0[1-9].ai.alcf.anl.gov
+
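If you log in frequently, an entry in your local ~/.ssh/config can combine the two steps (a sketch; the alias groq-node is arbitrary, and ALCFUserID and the node name should be replaced to suit):

cat >> ~/.ssh/config <<'EOF'
Host groq-node
    HostName groq-r01-gn-01.ai.alcf.anl.gov
    User ALCFUserID
    ProxyJump ALCFUserID@groq.ai.alcf.anl.gov
EOF
ssh groq-node   # prompts for your MobilePASS+ passcode on the jump host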
+ + + + + + + + + + + + + +
+
+ + + +
+ +
+ + + + + + +
+
+
+
+ + + + + + + + + + + + \ No newline at end of file diff --git a/ai-testbed/groq/job-queuing-and-submission/index.html b/ai-testbed/groq/job-queuing-and-submission/index.html new file mode 100644 index 0000000000..078632f52f --- /dev/null +++ b/ai-testbed/groq/job-queuing-and-submission/index.html @@ -0,0 +1,6700 @@ + + + + + + + + + + + + + + + + + + + + + + + + + Job Queueing and Submission - ALCF User Guides + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+ + + + Skip to content + + +
+
+ +
+ + + + + + + + + +
+ + +
+ + +
+ +
+

Argonne Leadership Computing Facility

+
+ + + + + + + + + +
+
+ + +
+ + + + + + + + + + + + + + + + + +
+ + + + +
+ + +
+ + + + + + +
+
+ + + +
+
+
+ + + + + +
+
+
+ + + +
+
+
+ + + +
+
+
+ + + +
+
+ + + + + + + +

Job Queueing and Submission

+

Groq jobs in the AI Testbed's GroqRack are managed by the PBS job scheduler.
+Overview: PBS
+For additional information, see +https://docs.alcf.anl.gov/running-jobs/job-and-queue-scheduling/
+Man pages are available. These are the key commands: +

# qsub - to submit a batch job using a script
+man qsub
+# qstat - to display queue information
+man qstat
+# qdel - to delete (cancel) a job:
+man qdel
+# qhold - to hold a job
+man qhold
+
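Typical usage looks like the following (a sketch; the script name and job id are placeholders):

qsub run_job.sh     # submit a batch script; prints the job id
qstat               # show queued and running jobs
qdel <jobid>        # cancel a job by id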

+ + + + + + + + + + + + + +
+
+ + + +
+ +
+ + + + + + +
+
+
+
+ + + + + + + + + + + + \ No newline at end of file diff --git a/ai-testbed/groq/running-a-model-or-program/index.html b/ai-testbed/groq/running-a-model-or-program/index.html new file mode 100644 index 0000000000..8e446b590f --- /dev/null +++ b/ai-testbed/groq/running-a-model-or-program/index.html @@ -0,0 +1,6991 @@ + + + + + + + + + + + + + + + + + + + + + + + + + Running a Model/Program - ALCF User Guides + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+ + + + Skip to content + + +
+
+ +
+ + + + + + + + + +
+ + +
+ + +
+ +
+

Argonne Leadership Computing Facility

+
+ + + + + + + + + +
+
+ + +
+ + + + + + + + + + + + + + + + + +
+ + + + +
+ + +
+ + + + + + +
+
+ + + +
+
+
+ + + + + +
+
+
+ + + + + + + +
+
+ + + + + + + +

Running a Model/Program

+

Jobs are launched from any GroqRack node, or from login nodes.
+If you expect to lose your internet connection for any reason, for long-running jobs we suggest logging into a specific node and using either screen or tmux to create a persistent command-line session. For details use:

+

man screen
+# or
+man tmux
+
+or online man pages: screen, tmux

+

Running jobs on Groq nodes

+

GroqFlow

+

GroqFlow is the simplest way to port inference applications to Groq. The groqflow GitHub repo includes many sample applications.
+See GroqFlow.

+

Clone the GroqFlow github repo

+

Clone the groqflow GitHub repo and change the current directory to the clone: +

cd ~/
+git clone https://github.com/groq/groqflow.git
+cd groqflow
+

+

GroqFlow conda environments

+

Create a groqflow conda environment, and activate it. +Follow the instructions in the Virtual Environments
section. +Note: Similar install instructions are in ~/groqflow/docs/install.md or GroqFlow™ Installation Guide
+The conda environment should be reinstalled whenever new groqflow code is pulled from the groqflow GitHub repo; with a groqflow conda environment activated, redo just the pip install steps.

+

Running a groqflow sample

+

Each groqflow sample directory in the ~/groqflow/proof_points tree has a README.md describing the sample and how to run it.

+

Optionally activate your GroqFlow conda environment

+
conda activate groqflow
+
+

Run a sample using PBS in batch mode

+

See Job Queueing and Submission for more information about the PBS job scheduler.

+

Create a script run_minilmv2.sh with the following contents. It assumes that conda was installed in the default location. The conda initialize section can also be copied from your .bashrc if the conda installer was allowed to add it. +

#!/bin/bash
+# >>> conda initialize >>>
+# !! Contents within this block are managed by 'conda init' !!
+__conda_setup="$(${HOME}'/miniconda3/bin/conda' 'shell.bash' 'hook' 2> /dev/null)"
+if [ $? -eq 0 ]; then
+    eval "$__conda_setup"
+else
+    if [ -f "${HOME}/miniconda3/etc/profile.d/conda.sh" ]; then
+        . "${HOME}/miniconda3/etc/profile.d/conda.sh"
+    else
+        export PATH="${HOME}/miniconda3/bin:$PATH"
+    fi
+fi
+unset __conda_setup
+# <<< conda initialize <<<
+conda activate groqflow
+cd ~/groqflow/proof_points/natural_language_processing/minilm
+pip install -r requirements.txt
+python minilmv2.py
+

+

Then run the script as a batch job with PBS: +

qsub -l groq_accelerator=1 run_minilmv2.sh
+

+

Note: the number of chips used by a model can be found in the compile cache dir for the model after it is compiled. E.g. +

$ grep num_chips_used ~/.cache/groqflow/minilmv2/minilmv2_state.yaml
+num_chips_used: 1
+
+The groqflow proof_points models use 1, 2, or 4 chips.
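If a model needs more than one chip, request the matching number of accelerators at submission time (a sketch based on the groq_accelerator resource shown above; the script name is a placeholder):

qsub -l groq_accelerator=4 run_model.sh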

+

If your ~/.bashrc initializes conda, an alternative to copying the conda initialization script into your execution scripts is to comment out this section in your "~/.bashrc": +

# If not running interactively, don't do anything
+case $- in
+    *i*) ;;
+      *) return;;
+esac
+
+to +
## If not running interactively, don't do anything
+#case $- in
+#    *i*) ;;
+#      *) return;;
+#esac
+
+Then the execution script becomes: +
#!/bin/bash
+conda activate groqflow
+cd ~/groqflow/proof_points/natural_language_processing/minilm
+pip install -r requirements.txt
+python minilmv2.py
+
+Job status can be tracked with qstat: +
$ qstat
+Job id            Name             User              Time Use S Queue
+----------------  ---------------- ----------------  -------- - -----
+3084.groq-r01-co* run_minilmv2     user              0 R workq           
+$ 
+

+

Output will by default go to two files with names like the following, where the suffix is the job id. One is the standard output for the job; the other is the standard error. +

$ ls -la run_minilmv2.sh.*
+-rw------- 1 user users   448 Oct 16 18:40 run_minilmv2.sh.e3082
+-rw------- 1 user users 50473 Oct 16 18:42 run_minilmv2.sh.o3082
+

+

Run a sample using PBS in interactive mode

+

An alternative is to use an interactive PBS job. This may be useful when debugging new or changed code. Here is an example that starts a 24-hour interactive job. +

qsub -IV -l walltime=24:00:00 -l groq_accelerator=2
+
+Then activate your groqflow environment, and run python scripts with +
conda activate groqflow
+python scriptname.py
+

+ + + + + + + + + + + + + +
+
+ + + +
+ +
+ + + + + + +
+
+
+
+ + + + + + + + + + + + \ No newline at end of file diff --git a/ai-testbed/groq/system-overview/index.html b/ai-testbed/groq/system-overview/index.html new file mode 100644 index 0000000000..9f5c82e7f9 --- /dev/null +++ b/ai-testbed/groq/system-overview/index.html @@ -0,0 +1,6692 @@ + + + + + + + + + + + + + + + + + + + + + + + + + System Overview - ALCF User Guides + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+ +
+
+ +
+ + + + + + + + + +
+ + +
+ + +
+ +
+

Argonne Leadership Computing Facility

+
+ + + + + + + + + +
+
+ + +
+ + + + + + + + + + + + + + + + + +
+ + + + +
+ + +
+ + + + + + +
+
+ + + +
+
+
+ + + + + +
+
+
+ + + +
+
+
+ + + +
+
+
+ + + +
+
+ + + + + + + +

System Overview

+ +

ALCF's Groq system consists of a single GroqRackTM compute cluster that provides an extensible accelerator network of 9 GroqNodeTM nodes [ groq-r01-gn-01 through groq-r01-gn-09 ] with a rotational multi-node network topology. Each of these GroqNodes contains 8 GroqCardTM accelerators with integrated chip-to-chip connections in a dragonfly multi-chip topology.

+

GroqCardTM accelerator is a dual-width, full-height, three-quarter length PCI-Express Gen4 x16 adapter that includes a single GroqChipTM processor with 230 MB of on-chip memory. Based on the proprietary Tensor Streaming Processor (TSP) architecture, the GroqChip processor is a low latency and high throughput single core SIMD compute engine capable of 750 TOPS (INT8) and 188 TFLOPS (FP16) @ 900 MHz that includes advanced vector and matrix mathematical acceleration units. The GroqChip processor is deterministic, providing predictable and repeatable performance.

+

The GroqWare suite SDK uses an API-based programming model and enables users to develop, compile, and run models on the GroqCard accelerator in a host server system. The SDK uses an ONNX/MLIR-enabled DAG compiler and consists of the Groq Compiler, the Groq API, and utility tools such as the GroqView™ profiler and groq-runtime.

+ +

+
+ +

For more information refer to the following links:

+

GroqRack spec sheet
+GroqNode spec sheet
+GroqCard spec sheet
+GroqChip spec sheet
+(via)

+ + + + + + + + + + + + + + +
+
+ + + +
+ +
+ + + + + + +
+
+
+
+ + + + + + + + + + + + \ No newline at end of file diff --git a/ai-testbed/groq/virtual-environments/index.html b/ai-testbed/groq/virtual-environments/index.html new file mode 100644 index 0000000000..f9cb052208 --- /dev/null +++ b/ai-testbed/groq/virtual-environments/index.html @@ -0,0 +1,6841 @@ + + + + + + + + + + + + + + + + + + + + + + + + + Virtual Environments - ALCF User Guides + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+ + + + Skip to content + + +
+
+ +
+ + + + + + + + + +
+ + +
+ + +
+ +
+

Argonne Leadership Computing Facility

+
+ + + + + + + + + +
+
+ + +
+ + + + + + + + + + + + + + + + + +
+ + + + +
+ + +
+ + + + + + +
+
+ + + +
+
+
+ + + + + +
+
+
+ + + + + + + +
+
+ + + + + + + +

Virtual Environments

+

Install conda

+

If conda is not already installed: +

rm Miniconda3-latest-Linux-x86_64.sh*
+wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
+bash Miniconda3-latest-Linux-x86_64.sh
+# answer y/yes to all prompts
+# exit ssh session, then start a new ssh session
+exit
+

+

GroqFlow conda environment setup

+

Create and activate a groqflow conda environment

+

Create a groqflow conda environment and activate it +

export PYTHON_VERSION=3.10.12
+conda create -n groqflow python=$PYTHON_VERSION
+conda activate groqflow
+

+

Install groqflow into the groqflow conda environment

+

Execute the following commands to install groqflow into the activated groqflow conda environment +

# Alter this if you have cloned groqflow to some other location.
+cd ~/groqflow
+pip install --upgrade pip
+pip install -e .
+pushd . 
+cd demo_helpers
+pip install -e .
+popd
+pip install soundfile
+
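To confirm the editable installs landed in the active environment, an optional quick check is:

pip list | grep -i groq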

+

To use groqflow, +

conda activate groqflow
+
+Note: Always use a personal conda environment when installing packages on groq nodes; otherwise they can get installed into ~/.local and can cause problems when your shared home directory is used on other systems. If you encounter mysterious package dependency/version issues, check your ~/.local/lib and ~/.local/bin for mistakenly installed packages.

+

Note: The conda environment should be reinstalled whenever new groqflow code is pulled from the groqflow GitHub repo; with a groqflow conda environment activated, redo just the pip install steps.

+ + + + + + + + + + + + + +
+
+ + + +
+ +
+ + + + + + +
+
+
+
+ + + + + + + + + + + + \ No newline at end of file diff --git a/ai-testbed/javascripts/alcf-extra.js b/ai-testbed/javascripts/alcf-extra.js new file mode 100644 index 0000000000..7510cf11b3 --- /dev/null +++ b/ai-testbed/javascripts/alcf-extra.js @@ -0,0 +1,144 @@ +/** + * Dropdown + * + * @description + * + * @param config An object of configuration settings: + * + * + * @return new instance of Dropdown + */ + + + + + +// Config defaults and init +// ---------------------------------------------------------------------------- + +var Dropdown = function (config) { + this.hook = config.hook || 'js-drop'; + this.menu = config.menu; + this.event = config.event || 'click'; + this.pane = document.getElementById(this.menu); +} + + +Dropdown.prototype.init = function() { + this.modifyHooks(this.hook, this.addListener.bind(this, this.event)); +} + + + + + +// Shared methods +// ---------------------------------------------------------------------------- + +// grab element +Dropdown.prototype.modifyHooks = function(hook, func) { + var elem = document.getElementById(hook); + // this.addBgListener(elem); + func(elem); +} + +// attach listeners to the document and menu items +Dropdown.prototype.addListener = function(event, elem) { + document.addEventListener("mouseover", function(e) { + if (e.target.closest("#"+this.hook)) { + this.toggleMenu(elem); + } + else if (e.target.closest("#"+this.menu)) {return;} + else if (this.pane.classList.contains('js-dropdown-visible')) { + this.toggleMenu(elem); + } + }.bind(this), false); + document.addEventListener("mouseout", function(e) { + if (e.target.closest("#"+this.hook)) { + this.toggleMenu(elem); + } + else if (e.target.closest("#"+this.menu)) {return;} + else if (this.pane.classList.contains('js-dropdown-visible')) { + this.toggleMenu(elem); + } + }.bind(this), false); +} + +// toggle menu pane visibility +Dropdown.prototype.toggleMenu = function(elem) { + if (this.pane.classList.contains('js-dropdown-hidden')) { + this.pane.classList.replace('js-dropdown-hidden', 'js-dropdown-visible'); + } else if (this.pane.classList.contains('js-dropdown-visible')) { + this.pane.classList.replace('js-dropdown-visible', 'js-dropdown-hidden'); + } +} + + + + +// Include the dropdowns +// ---------------------------------------------------------------------------- + +var dropdowns = document.getElementsByClassName('js-drop'); + +if (dropdowns.length > 0) { + var menus = []; + + Array.prototype.forEach.call(dropdowns, function(el) { + menus.push(new Dropdown({'hook': el.id, 'menu': el.dataset.menu})); + }); + + Array.prototype.forEach.call(menus, function(m) { + m.init(); + }); +} + + + + +// Include the mobile dropdowns (unlike above, just writing it all here) +// ---------------------------------------------------------------------------- + + +// open/close the big pane +var mobileOpen = document.getElementById('js-mobileOpen'); +var mobileClose = document.getElementById('js-mobileClose'); +var mobileMenu = document.getElementById('js-mobileMenu'); + +mobileOpen.addEventListener("click", function(e) { + mobileMenu .classList.replace("menu--closed", "menu--open"); +}); + +mobileClose.addEventListener("click", function(e) { + mobileMenu .classList.replace("menu--open", "menu--closed"); +}); + + +// open/close individual menus + +var drawerHeads = document.getElementsByClassName('drawer-head'); + +Array.prototype.forEach.call(drawerHeads, function(head){ + + head.addEventListener("click", function(e){ + var mobmenu = head.dataset.mobmenu; + mobmenu = 
document.getElementById(mobmenu); + var arrow = this.querySelector(".drawer-arrow"); + + if (mobmenu.classList.contains('menu--closed')) { + mobmenu.classList.remove('menu--closed'); + arrow.innerHTML = "▲" + } + else { + mobmenu.classList.add('menu--closed'); + arrow.innerHTML = "▼" + } + + }); +}); + +// add listener to each of the links that toggles menus + + + + diff --git a/ai-testbed/sambanova/TODO/index.html b/ai-testbed/sambanova/TODO/index.html new file mode 100644 index 0000000000..9c6f333350 --- /dev/null +++ b/ai-testbed/sambanova/TODO/index.html @@ -0,0 +1,6675 @@ + + + + + + + + + + + + + + + + + + + + + TODO - ALCF User Guides + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+ + + + Skip to content + + +
+
+ +
+ + + + + + + + + +
+ + +
+ + +
+ +
+

Argonne Leadership Computing Facility

+
+ + + + + + + + + +
+
+ + +
+ + + + + + + + + + + + + + + + + +
+ + + + +
+ + +
+ + + + + + +
+
+ + + +
+
+
+ + + + + +
+
+
+ + + +
+
+
+ + + +
+
+
+ + + +
+
+ + + + + + + +

TODO

+
    +
  • docs/ai-testbed/sambanova_gen2/example-multi-node-programs.md
  • +
  • docs/ai-testbed/sambanova_gen2/ GPT2 example
  • +
+

Using /data/ANL/results/sn30-r1-h1/wilsonb/032223.18/GPT1.5B.out for output +Using /data/ANL/results/sn30-r2-h1/wilsonb/032223.19/GPT1.5B.out for output

+

Using /data/ANL/results/sn30-r2-h1/wilsonb/032223.19/BertLarge.out for output

+ + + + + + + + + + + + + +
+
+ + + +
+ +
+ + + + + + +
+
+
+
+ + + + + + + + + + + + \ No newline at end of file diff --git a/ai-testbed/sambanova/documentation/index.html b/ai-testbed/sambanova/documentation/index.html new file mode 100644 index 0000000000..5d0c3a6d1d --- /dev/null +++ b/ai-testbed/sambanova/documentation/index.html @@ -0,0 +1,6688 @@ + + + + + + + + + + + + + + + + + + + + + + + + + SambaNova Documentation - ALCF User Guides + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+ + + + Skip to content + + +
+
+ +
+ + + + + + + + + +
+ + +
+ + +
+ +
+

Argonne Leadership Computing Facility

+
+ + + + + + + + + +
+
+ + +
+ + + + + + + + + + + + + + + + + +
+ + + + +
+ + +
+ + + + + + +
+
+ + + +
+
+
+ + + + + +
+
+
+ + + +
+
+
+ + + +
+
+
+ + + +
+
+ + + + + + + +

Documentation

+

The SambaNova documentation is now available online SambaNova Documentation.

+

The documentation for SambaTune (a profiling and performance tuning tool for SambaNova systems) is now available at SambaTune Documentation.

+ + + + + + + + + + + + + +
+
+ + + +
+ +
+ + + + + + +
+
+
+
+ + + + + + + + + + + + \ No newline at end of file diff --git a/ai-testbed/sambanova/example-multi-node-programs/index.html b/ai-testbed/sambanova/example-multi-node-programs/index.html new file mode 100644 index 0000000000..51f4a04838 --- /dev/null +++ b/ai-testbed/sambanova/example-multi-node-programs/index.html @@ -0,0 +1,7060 @@ + + + + + + + + + + + + + + + + + + + + + + + + + Example Multi-Node Programs - ALCF User Guides + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+ + + + Skip to content + + +
+
+ +
+ + + + + + + + + +
+ + +
+ + +
+ +
+

Argonne Leadership Computing Facility

+
+ + + + + + + + + +
+
+ + +
+ + + + + + + + + + + + + + + + + +
+ + + + +
+ + +
+ + + + + + +
+
+ + + +
+
+
+ + + + + +
+
+
+ + + +
+
+
+ + + +
+
+
+ + + +
+
+ + + + + + + +

Example Multi-Node Programs

+

In this section we will learn how to extend the UNet2d and Gpt1.5B application scripts introduced in Example Programs to compile and run multiple instances of the models in a data-parallel fashion across multiple tiles or across multiple nodes.

+

UNet2d

+

Set Up

+

Create the following directory and change to it if you have not already done so.

+
mkdir -p ~/apps/image/unet
+cd ~/apps/image/unet
+
+

Create Unet2d.sh and unet_batch.sh

+

Create the files Unet2d.sh and unet_batch.sh in the current directory using your favorite editor, copying and pasting the contents of Unet2d.sh and unet_batch.sh into files with the same names.

+
chmod +x Unet2d.sh
+chmod +x unet_batch.sh
+
+

Compile and run

+

To train the model (compile + train), run the commands shown below this list. The compile and run scripts take the following input arguments.

+
    +
  1. +

    image size: The images are square. Valid sizes include 256, 512, and 1024.

    +
  2. +
  3. +

    Batch size: local batch size. The global batch size is local batch size * Num of instances.

    +
  4. +
  5. +

num of instances: Total number of Unet2d instances run in the data-parallel framework.

    +
  6. +
  7. +

    RunID: A unique Id for the compile or run process.

    +
  8. +
+

The script uses the arguments pcompile and prun for the data parallel compile and run.

+
./Unet2d.sh pcompile <image size> <batch_size> <num of instances> <RunID>
+./Unet2d.sh prun <image size> <batch_size> <num of instances> <RunID>
+
+

For an image size of 256x256 and a local batch size of 256 when running 8 instances, the commands are as follows.

+
./Unet2d.sh pcompile 256 256 8 unet2d_8inst_pcompile
+./Unet2d.sh prun 256 256 8 unet2d_8inst_prun
+
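In this example the effective global batch size is 8 instances × 256 = 2048 images per step.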
+

The above commands display the path of the file that contains the output of the scripts, usually /data/ANL/results/<hostname>/<userId>/<RunID>/Unet2d.out.

+

You can inspect the compile command, which contains the --data-parallel -ws 2 arguments, to ensure that the pef file is compatible with data-parallel runs. The pef generated by the above compile command is placed under out/Unet2d/unet_train_256_256_NP_4 inside the current working directory.

+
python /opt/sambaflow/apps/image/segmentation/compile.py compile --mac-v2 --in-channels=3 --in-width=${2} --in-height=${2} --batch-size=${BS} --enable-conv-tiling --num-tiles=4 --pef-name=unet_train_${BS}_${2}_NP_${NUM_TILES}  --data-parallel -ws 2 --output-folder=${OUTDIR}
+
+

Once the model is compiled, sbatch is used to launch the multiple instances. The example below launches a total of 8 tasks or instances on the host from which the script is run.

+
sbatch --gres=rdu:1 --tasks-per-node ${NP} --nodes 1 --nodelist $(hostname) --cpus-per-task=${cpus} $(pwd)/unet_batch.sh ${NP} ${NUM_WORKERS} ${BS} ${2} ${5}
+
+

The run command has the --data-parallel --reduce-on-rdu arguments required for a data-parallel run.

+
srun --mpi=pmi2 python /opt/sambaflow/apps/image/segmentation//hook.py run --data-cache=${CACHE_DIR}  --data-in-memory --num-workers=${NUM_WORKERS} --enable-tiling  --min-throughput 395 --in-channels=3 --in-width=${IM} --in-height=${IM} --init-features 32 --batch-size=${BS} --epochs 10 --data-dir ${DS} --log-dir log_dir_unet_${IM}_${BS}_${NP} --data-parallel --reduce-on-rdu --pef=${OUTDIR}/unet_train_${BS}_${IM}_NP_4/unet_train_${BS}_${IM}_NP_4.pef
+
+

The throughput is calculated by averaging the e2e samples_per_sec over the different instances.

+
inner train loop time : 36.314290046691895 for 10 epochs, number of global steps: 10, e2e samples_per_sec: 563.9653143065
+inner train loop time : 33.36756229400635 for 10 epochs, number of global steps: 10, e2e samples_per_sec: 613.7697389922524
+inner train loop time : 33.94625234603882 for 10 epochs, number of global steps: 10, e2e samples_per_sec: 603.3066563941279
+inner train loop time : 32.309499979019165 for 10 epochs, number of global steps: 10, e2e samples_per_sec: 633.8692958200872
+inner train loop time : 31.418426036834717 for 10 epochs, number of global steps: 10, e2e samples_per_sec: 651.8467849404489
+inner train loop time : 28.164129495620728 for 10 epochs, number of global steps: 10, e2e samples_per_sec: 727.1660927132315
+inner train loop time : 30.29698896408081 for 10 epochs, number of global steps: 10, e2e samples_per_sec: 675.9747651583616
+inner train loop time : 25.332663536071777 for 10 epochs, number of global steps: 10, e2e samples_per_sec: 808.442427336472
+
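The average can be computed directly from the output file (a convenience sketch; the path is the output file reported by the run script, and the throughput value is the last field of each matching line):

grep "e2e samples_per_sec" /data/ANL/results/<hostname>/<userId>/<RunID>/Unet2d.out | awk '{sum += $NF; n++} END {print sum/n, "samples/s averaged over", n, "instances"}'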
+

Gpt 1.5B

+

Set up

+
mkdir ~/nlp-multiNodetest
+cd ~/nlp-multiNodetest
+
+

Create and run Gpt1.5B_compile.sh and Gpt1.5B_run.sh

+

Create the files Gpt1.5B_compile.sh and Gpt1.5B_run.sh in the current directory, copying in the contents of Gpt1.5B_compile.sh and Gpt1.5B_run.sh. Alternatively, the files can be accessed at /data/ANL/scripts/Gpt1.5B_compile.sh and /data/ANL/scripts/Gpt1.5B_run.sh on any of the compute nodes and copied over to the working directory.

+

Compile and Run

+

These scripts contain the commands to compile and run multiple instances of the Gpt1.5B model across multiple nodes. Run Gpt1.5B_compile.sh first to compile and generate the pef file for the model; it in turn launches the Gpt1.5B_run.sh script to run multiple instances of the model across the different nodes.

+
chmod +x Gpt1.5B_compile.sh
+chmod +x Gpt1.5B_run.sh
+./Gpt1.5B_compile.sh
+
+

You can see the log file path displayed on the screen as seen in the example below. You can use the tail command to check the progress of the run.

+
vsastry@sn30-r1-h1:~/nlp-multiNodetest$ ./Gpt1.5B_compile.sh
+Using /data/ANL/results/sn30-r1-h1/vsastry/041823.19/GPT1.5B.out for output
+
+

The artifacts of the compile process are produced under the path /data/scratch/<userId>.

+

Inspect the compile command in the script to see that it includes the additional arguments --data-parallel and -ws 2 to generate a pef that is compatible with data-parallel runs.

+
python /opt/sambaflow/apps/nlp/transformers_on_rdu/transformers_hook.py compile --module_name gpt2_pretrain --task_name clm --max_seq_length 1024 -b 16 --output_dir=${OUTDIR}/hf_output --overwrite_output_dir --do_train  --per_device_train_batch_size 16 --cache ${OUTDIR}/cache/ --tokenizer_name gpt2 --model_name gpt2 --mac-v2 --non_split_head --mac-human-decision /opt/sambaflow/apps/nlp/transformers_on_rdu/human_decisions_gm/mac_v2_overrides/gpt2_48_enc_full_recompute_training_spatialmapping_tiling16_clmerge_gm_nonpardp_lnsd.json --compiler-configs-file /opt/sambaflow/apps/nlp/transformers_on_rdu/human_decisions_gm/compiler_configs/compiler_configs_gpt2_sc_recompute_spatialmapping_tiling16_clsmerge_withcls_nonpardp_norc_e2e.json --skip_broadcast_patch --config_name /opt/sambaflow/apps/nlp/transformers_on_rdu/customer_specific/mv/configs/gpt2_config_xl_50260.json --no_index_select_patch --data-parallel -ws 2 --weight_decay 0.1  --max_grad_norm_clip 1.0 --num-tiles 4 --pef-name=gpt15 --output-folder=${OUTDIR}
+
+

Once the model is compiled, sbatch is used to launch the multiple instances across the nodes. The example below launches a total of 32 tasks or instances over 2 nodes, with each node running a maximum of 16 tasks. Slurm allocates any 2 of the available nodes in this example.

+
/usr/local/bin/sbatch --output=${HOME}/slurm-%A.out --ntasks 32 --gres=rdu:1 --ntasks-per-node 16  --nodes 2 --cpus-per-task=8  Gpt1.5B_run.sh ${1} >> ${OUTPUT_PATH} 2>&1
+
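With 32 instances and a per-device batch size of 16, the effective global batch size in this configuration is 32 × 16 = 512 sequences per step.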
+

The run command for each of this instance is present in the Gpt1.5B_run.sh script. You can inspect the command in the script to see that --data-parallel --reduce-on-rdu arguments are present to ensure that the model is run in a data parallel fashion and that the gradient accumulation takes place on the RDU.

+
/usr/local/bin/srun --mpi=pmi2 python /opt/sambaflow/apps/nlp/transformers_on_rdu/transformers_hook.py run  -b 16  --module_name gpt2_pretrain --task_name clm --max_seq_length 1024  --overwrite_output_dir --do_train  --per_device_train_batch_size 16 --cache ${OUTDIR}/cache/  --tokenizer_name gpt2 --model_name gpt2 --non_split_head --skip_broadcast_patch --no_index_select_patch --output_dir=${OUTDIR}/hf_output --config_name /opt/sambaflow/apps/nlp/transformers_on_rdu/customer_specific/mv/configs/gpt2_config_xl_50260.json --max_grad_norm_clip 1.0 --skip_checkpoint --data-parallel --reduce-on-rdu --data_dir /data/ANL/ss1024 --data_dir /data/ANL/ss1024  --logging_steps 1 --max_steps 900000 --learning_rate 0.00025 --steps_this_run 800 --min_throughput 299000 --max_throughput 600000 --pef=${OUTDIR}/gpt15/gpt15.pef >> ${OUTPUT_PATH} 2>&1
+
+

squeue shows that the model is running on 2 nodes, sn30-r1-h1 and sn30-r2-h2.

+
JOBID PARTITION                      NAME     USER ST       TIME  NODES NODELIST(REASON)
+10191 sambanova            Gpt1.5B_run.sh  vsastry  R      23:18      2 sn30-r1-h1,sn30-r2-h2
+
+

sntilestat can also be used to check the total number of tiles used by the runs.

+
TILE                 %idle %exec %pload %aload %chkpt %quiesce    PID     USER COMMAND
+/XRDU_0/RDU_0/TILE_0   8.0  91.6    0.3    0.1    0.0      0.0 2750333  vsastry /opt/sambaflow/apps/nlp/transformers_on_rdu/venv/b
+/XRDU_0/RDU_0/TILE_1   8.0  91.6    0.3    0.1    0.0      0.0 2750333  vsastry /opt/sambaflow/apps/nlp/transformers_on_rdu/venv/b
+/XRDU_0/RDU_0/TILE_2   7.9  91.6    0.3    0.3    0.0      0.0 2750333  vsastry /opt/sambaflow/apps/nlp/transformers_on_rdu/venv/b
+/XRDU_0/RDU_0/TILE_3   7.7  91.8    0.3    0.3    0.0      0.0 2750333  vsastry /opt/sambaflow/apps/nlp/transformers_on_rdu/venv/b
+/XRDU_0/RDU_0/TILE_4   7.6  91.9    0.4    0.1    0.0      0.0 2750339  vsastry /opt/sambaflow/apps/nlp/transformers_on_rdu/venv/b
+/XRDU_0/RDU_0/TILE_5   7.5  91.9    0.5    0.1    0.0      0.0 2750339  vsastry /opt/sambaflow/apps/nlp/transformers_on_rdu/venv/b
+/XRDU_0/RDU_0/TILE_6   7.5  91.8    0.5    0.3    0.0      0.0 2750339  vsastry /opt/sambaflow/apps/nlp/transformers_on_rdu/venv/b
+/XRDU_0/RDU_0/TILE_7   7.3  92.0    0.6    0.0    0.0      0.0 2750339  vsastry /opt/sambaflow/apps/nlp/transformers_on_rdu/venv/b
+/XRDU_0/RDU_1/TILE_0   8.9  89.9    1.0    0.1    0.0      0.0 2750338  vsastry /opt/sambaflow/apps/nlp/transformers_on_rdu/venv/b
+/XRDU_0/RDU_1/TILE_1   9.0  89.9    0.9    0.1    0.0      0.0 2750338  vsastry /opt/sambaflow/apps/nlp/transformers_on_rdu/venv/b
+/XRDU_0/RDU_1/TILE_2   8.6  89.8    1.4    0.1    0.0      0.0 2750338  vsastry /opt/sambaflow/apps/nlp/transformers_on_rdu/venv/b
+/XRDU_0/RDU_1/TILE_3   8.5  89.9    1.4    0.1    0.0      0.0 2750338  vsastry /opt/sambaflow/apps/nlp/transformers_on_rdu/venv/b
+/XRDU_0/RDU_1/TILE_4   7.9  90.9    0.9    0.4    0.0      0.0 2750343  vsastry /opt/sambaflow/apps/nlp/transformers_on_rdu/venv/b
+/XRDU_0/RDU_1/TILE_5   7.7  90.9    0.9    0.5    0.0      0.0 2750343  vsastry /opt/sambaflow/apps/nlp/transformers_on_rdu/venv/b
+/XRDU_0/RDU_1/TILE_6   7.7  91.0    0.9    0.4    0.0      0.0 2750343  vsastry /opt/sambaflow/apps/nlp/transformers_on_rdu/venv/b
+/XRDU_0/RDU_1/TILE_7   8.0  91.0    0.6    0.4    0.0      0.0 2750343  vsastry /opt/sambaflow/apps/nlp/transformers_on_rdu/venv/b
+/XRDU_1/RDU_0/TILE_0   7.6  92.0    0.3    0.1    0.0      0.0 2750345  vsastry /opt/sambaflow/apps/nlp/transformers_on_rdu/venv/b
+/XRDU_1/RDU_0/TILE_1   7.6  92.0    0.3    0.1    0.0      0.0 2750345  vsastry /opt/sambaflow/apps/nlp/transformers_on_rdu/venv/b
+/XRDU_1/RDU_0/TILE_2   7.5  92.1    0.3    0.1    0.0      0.0 2750345  vsastry /opt/sambaflow/apps/nlp/transformers_on_rdu/venv/b
+/XRDU_1/RDU_0/TILE_3   7.5  92.1    0.3    0.1    0.0      0.0 2750345  vsastry /opt/sambaflow/apps/nlp/transformers_on_rdu/venv/b
+/XRDU_1/RDU_0/TILE_4   7.5  92.1    0.3    0.1    0.0      0.0 2750335  vsastry /opt/sambaflow/apps/nlp/transformers_on_rdu/venv/b
+/XRDU_1/RDU_0/TILE_5   7.5  92.1    0.3    0.1    0.0      0.0 2750335  vsastry /opt/sambaflow/apps/nlp/transformers_on_rdu/venv/b
+/XRDU_1/RDU_0/TILE_6   7.5  92.1    0.3    0.1    0.0      0.0 2750335  vsastry /opt/sambaflow/apps/nlp/transformers_on_rdu/venv/b
+/XRDU_1/RDU_0/TILE_7   7.5  92.1    0.3    0.1    0.0      0.0 2750335  vsastry /opt/sambaflow/apps/nlp/transformers_on_rdu/venv/b
+/XRDU_1/RDU_1/TILE_0   7.7  91.5    0.4    0.4    0.0      0.0 2750330  vsastry /opt/sambaflow/apps/nlp/transformers_on_rdu/venv/b
+/XRDU_1/RDU_1/TILE_1   7.9  91.5    0.3    0.4    0.0      0.0 2750330  vsastry /opt/sambaflow/apps/nlp/transformers_on_rdu/venv/b
+/XRDU_1/RDU_1/TILE_2   7.9  91.5    0.3    0.4    0.0      0.0 2750330  vsastry /opt/sambaflow/apps/nlp/transformers_on_rdu/venv/b
+/XRDU_1/RDU_1/TILE_3   7.6  91.8    0.4    0.3    0.0      0.0 2750330  vsastry /opt/sambaflow/apps/nlp/transformers_on_rdu/venv/b
+/XRDU_1/RDU_1/TILE_4   7.7  91.9    0.4    0.0    0.0      0.0 2750334  vsastry /opt/sambaflow/apps/nlp/transformers_on_rdu/venv/b
+/XRDU_1/RDU_1/TILE_5   7.7  91.9    0.4    0.0    0.0      0.0 2750334  vsastry /opt/sambaflow/apps/nlp/transformers_on_rdu/venv/b
+/XRDU_1/RDU_1/TILE_6   7.9  91.9    0.3    0.0    0.0      0.0 2750334  vsastry /opt/sambaflow/apps/nlp/transformers_on_rdu/venv/b
+/XRDU_1/RDU_1/TILE_7   7.9  91.9    0.3    0.0    0.0      0.0 2750334  vsastry /opt/sambaflow/apps/nlp/transformers_on_rdu/venv/b
+/XRDU_2/RDU_0/TILE_0   8.0  91.8    0.1    0.1    0.0      0.0 2750346  vsastry /opt/sambaflow/apps/nlp/transformers_on_rdu/venv/b
+/XRDU_2/RDU_0/TILE_1   8.0  91.8    0.1    0.1    0.0      0.0 2750346  vsastry /opt/sambaflow/apps/nlp/transformers_on_rdu/venv/b
+/XRDU_2/RDU_0/TILE_2   8.0  91.8    0.1    0.1    0.0      0.0 2750346  vsastry /opt/sambaflow/apps/nlp/transformers_on_rdu/venv/b
+/XRDU_2/RDU_0/TILE_3   7.7  91.9    0.1    0.3    0.0      0.0 2750346  vsastry /opt/sambaflow/apps/nlp/transformers_on_rdu/venv/b
+/XRDU_2/RDU_0/TILE_4   7.5  92.0    0.5    0.0    0.0      0.0 2750336  vsastry /opt/sambaflow/apps/nlp/transformers_on_rdu/venv/b
+/XRDU_2/RDU_0/TILE_5   7.6  91.9    0.5    0.0    0.0      0.0 2750336  vsastry /opt/sambaflow/apps/nlp/transformers_on_rdu/venv/b
+/XRDU_2/RDU_0/TILE_6   7.6  91.9    0.4    0.1    0.0      0.0 2750336  vsastry /opt/sambaflow/apps/nlp/transformers_on_rdu/venv/b
+/XRDU_2/RDU_0/TILE_7   7.5  91.9    0.4    0.3    0.0      0.0 2750336  vsastry /opt/sambaflow/apps/nlp/transformers_on_rdu/venv/b
+/XRDU_2/RDU_1/TILE_0   7.5  91.8    0.6    0.1    0.0      0.0 2750331  vsastry /opt/sambaflow/apps/nlp/transformers_on_rdu/venv/b
+/XRDU_2/RDU_1/TILE_1   7.5  91.8    0.6    0.1    0.0      0.0 2750331  vsastry /opt/sambaflow/apps/nlp/transformers_on_rdu/venv/b
+/XRDU_2/RDU_1/TILE_2   7.7  91.6    0.5    0.1    0.0      0.0 2750331  vsastry /opt/sambaflow/apps/nlp/transformers_on_rdu/venv/b
+/XRDU_2/RDU_1/TILE_3   7.7  91.6    0.5    0.1    0.0      0.0 2750331  vsastry /opt/sambaflow/apps/nlp/transformers_on_rdu/venv/b
+/XRDU_2/RDU_1/TILE_4   7.9  91.4    0.8    0.0    0.0      0.0 2750329  vsastry /opt/sambaflow/apps/nlp/transformers_on_rdu/venv/b
+/XRDU_2/RDU_1/TILE_5   7.9  91.4    0.8    0.0    0.0      0.0 2750329  vsastry /opt/sambaflow/apps/nlp/transformers_on_rdu/venv/b
+/XRDU_2/RDU_1/TILE_6   8.1  91.4    0.5    0.0    0.0      0.0 2750329  vsastry /opt/sambaflow/apps/nlp/transformers_on_rdu/venv/b
+/XRDU_2/RDU_1/TILE_7   8.2  91.4    0.4    0.0    0.0      0.0 2750329  vsastry /opt/sambaflow/apps/nlp/transformers_on_rdu/venv/b
+/XRDU_3/RDU_0/TILE_0   7.5  91.8    0.4    0.4    0.0      0.0 2750344  vsastry /opt/sambaflow/apps/nlp/transformers_on_rdu/venv/b
+/XRDU_3/RDU_0/TILE_1   7.5  91.8    0.4    0.4    0.0      0.0 2750344  vsastry /opt/sambaflow/apps/nlp/transformers_on_rdu/venv/b
+/XRDU_3/RDU_0/TILE_2   7.5  91.8    0.4    0.4    0.0      0.0 2750344  vsastry /opt/sambaflow/apps/nlp/transformers_on_rdu/venv/b
+/XRDU_3/RDU_0/TILE_3   7.5  91.8    0.4    0.4    0.0      0.0 2750344  vsastry /opt/sambaflow/apps/nlp/transformers_on_rdu/venv/b
+/XRDU_3/RDU_0/TILE_4   7.6  91.8    0.3    0.4    0.0      0.0 2750337  vsastry /opt/sambaflow/apps/nlp/transformers_on_rdu/venv/b
+/XRDU_3/RDU_0/TILE_5   7.7  91.8    0.1    0.4    0.0      0.0 2750337  vsastry /opt/sambaflow/apps/nlp/transformers_on_rdu/venv/b
+/XRDU_3/RDU_0/TILE_6   7.7  91.8    0.3    0.3    0.0      0.0 2750337  vsastry /opt/sambaflow/apps/nlp/transformers_on_rdu/venv/b
+/XRDU_3/RDU_0/TILE_7   7.7  91.9    0.3    0.1    0.0      0.0 2750337  vsastry /opt/sambaflow/apps/nlp/transformers_on_rdu/venv/b
+/XRDU_3/RDU_1/TILE_0   7.7  92.0    0.1    0.1    0.0      0.0 2750347  vsastry /opt/sambaflow/apps/nlp/transformers_on_rdu/venv/b
+/XRDU_3/RDU_1/TILE_1   7.7  92.0    0.1    0.1    0.0      0.0 2750347  vsastry /opt/sambaflow/apps/nlp/transformers_on_rdu/venv/b
+/XRDU_3/RDU_1/TILE_2   7.7  92.1    0.1    0.0    0.0      0.0 2750347  vsastry /opt/sambaflow/apps/nlp/transformers_on_rdu/venv/b
+/XRDU_3/RDU_1/TILE_3   7.7  92.1    0.1    0.0    0.0      0.0 2750347  vsastry /opt/sambaflow/apps/nlp/transformers_on_rdu/venv/b
+/XRDU_3/RDU_1/TILE_4   7.3  91.9    0.5    0.3    0.0      0.0 2750332  vsastry /opt/sambaflow/apps/nlp/transformers_on_rdu/venv/b
+/XRDU_3/RDU_1/TILE_5   7.3  91.9    0.5    0.3    0.0      0.0 2750332  vsastry /opt/sambaflow/apps/nlp/transformers_on_rdu/venv/b
+/XRDU_3/RDU_1/TILE_6   7.3  91.9    0.5    0.3    0.0      0.0 2750332  vsastry /opt/sambaflow/apps/nlp/transformers_on_rdu/venv/b
+/XRDU_3/RDU_1/TILE_7   7.3  92.0    0.5    0.1    0.0      0.0 2750332  vsastry /opt/sambaflow/apps/nlp/transformers_on_rdu/venv/b
+
+

The Slurm log associated with the JOBID (10191 in the above example) is located in the home directory. You can use the tail command to check the progress of the training.

+
vsastry@sn30-r1-h1:~$ tail -f ~/slurm-10191.out
+Using /data/ANL/results/sn30-r1-h1/vsastry/041823.03/Gpt1.5B.out for output
+
+
vsastry@sn30-r1-h1:~$ tail -f /data/ANL/results/sn30-r1-h1/vsastry/041823.03/Gpt1.5B.out
+
+

Once the run is completed, check the log file for the performance results.

+
{'e2e_train_time': 2179.2292835712433, 'training_sequences_per_second': 192467.31088004305, 'final_loss': 4.781678199768066}
+247/3247 [01:03<00:00, 50.76it/s]
+
+ + + + + + + + + + + + + +
+
+ + + +
+ +
+ + + + + + +
+
+
+
+ + + + + + + + + + + + \ No newline at end of file diff --git a/ai-testbed/sambanova/example-programs/index.html b/ai-testbed/sambanova/example-programs/index.html new file mode 100644 index 0000000000..47be2aa3e9 --- /dev/null +++ b/ai-testbed/sambanova/example-programs/index.html @@ -0,0 +1,7300 @@ + + + + + + + + + + + + + + + + + + + + + + + + + Example Programs - ALCF User Guides + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+ + + + Skip to content + + +
+
+ +
+ + + + + + + + + +
+ + +
+ + +
+ +
+

Argonne Leadership Computing Facility

+
+ + + + + + + + + +
+
+ + +
+ + + + + + + + + + + + + + + + + +
+ + + + +
+ + +
+ + + + + + +
+
+ + + +
+
+
+ + + + + +
+
+
+ + + +
+
+
+ + + +
+
+
+ + + +
+
+ + + + + + + +

Example Programs

+

SambaNova provides examples of some well-known simple AI applications under the path: /opt/sambaflow/apps/starters, on all SambaNova compute nodes. Make a copy of this to your home directory:

+
cd ~/
+mkdir apps
+cp -r /opt/sambaflow/apps/starters apps/starters
+
+

Deactivate any active conda environment. If you have conda installed and a conda environment is active, you will see something like (base) at the beginning of the command prompt. If so, you will need to deactivate it with conda deactivate. Conda is not used on the SambaNova SN30 cluster.

+

LeNet

+

Change directory

+
cd ~/apps/starters/lenet
+
+

Common Arguments

+

Below are some of the common arguments used across most of the models in the example code.

+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
ArgumentDefaultHelp
-b1Batch size for training
-n, --num-iterations100Number of iterations to run the pef for
-e, --num-epochs1Number of epochs for training
--log-path'checkpoints'Log path
--num-workers0Number of workers
--measure-train-performanceNoneMeasure training performance
+

LeNet Arguments

+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
ArgumentDefaultHelp
--lr0.01Learning rate for training
--momentum0.0Momentum value for training
--weight-decay0.01Weight decay for training
--data-path'./data'Data path
--data-folder'mnist_data'Folder containing mnist data
+

Establish the Environment

+
source /opt/sambaflow/apps/starters/lenet/venv/bin/activate
+
+
+

Note: If you receive an "HTTP error" message on any of the following commands, run the command again. Such errors (e.g., 503) are commonly an intermittent failure to download a dataset.

+
+

Run these commands to compile and train the LeNet model:

+
srun python lenet.py compile -b=1 --pef-name="lenet" --output-folder="pef"
+srun python lenet.py run --pef="pef/lenet/lenet.pef"
+
+

Alternatively, to use Slurm sbatch, create submit-lenet-job.sh with the following contents:

+
#!/bin/sh
+
+python lenet.py compile -b=1 --pef-name="lenet" --output-folder="pef"
+python lenet.py run --pef="pef/lenet/lenet.pef"
+
+

Then

+
sbatch --output=pef/lenet/output.log submit-lenet-job.sh
+
+

squeue will give you the queue status.

+
squeue
+# One may also...
+watch squeue
+
+

One may see the run log using:

+
cat pef/lenet/output.log
+
+

MNIST - Feed Forward Network

+

Establish the Environment

+
source /opt/sambaflow/apps/starters/ffn_mnist/venv/bin/activate
+
+

Change directory

+
cd ~/apps/starters/ffn_mnist/
+
+

Commands to run MNIST example:

+
srun python ffn_mnist.py  compile -b 1 --pef-name="ffn_mnist" --mac-v2
+srun python ffn_mnist.py  run -b 1 -p out/ffn_mnist/ffn_mnist.pef
+
+

To run the same using Slurm sbatch, create and run the submit-ffn_mnist-job.sh with the following contents.

+
#!/bin/sh
+python ffn_mnist.py  compile -b 1 --pef-name="ffn_mnist" --mac-v2
+python ffn_mnist.py  run -b 1 -p out/ffn_mnist/ffn_mnist.pef
+
+
sbatch --output=pef/ffn_mnist/output.log submit-ffn_mnist-job.sh
+
+

Logistic Regression

+

Establish the Environment

+
source /opt/sambaflow/apps/starters/logreg/venv/bin/activate
+
+

Change directory

+
cd ~/apps/starters/logreg
+
+

Logistic Regression Arguments

+

This is not an exhaustive list of arguments.

+

Arguments

+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
| Argument | Default | Help | Step |
|----------|---------|------|------|
| --lr | 0.001 | Learning rate for training | Compile |
| --momentum | 0.0 | Momentum value for training | Compile |
| --weight-decay | 1e-4 | Weight decay for training | Compile |
| --num-features | 784 | Number features for training | Compile |
| --num-classes | 10 | Number classes for training | Compile |
| --weight-norm | na | Enable weight normalization | Compile |
+
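As the Step column indicates, these arguments are passed at compile time. For example, a hedged sketch of a compile command with the feature and class counts given explicitly (the values shown are simply the defaults from the table):
srun python logreg.py compile --num-features 784 --num-classes 10 --pef-name="logreg" --output-folder="pef"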

Run these commands:

+
srun python logreg.py compile --pef-name="logreg" --output-folder="pef"
+srun python logreg.py run --pef="pef/logreg/logreg.pef"
+
+

To use Slurm, create submit-logreg-job.sh with the following contents:

+
#!/bin/sh
+python logreg.py compile --pef-name="logreg" --output-folder="pef"
+python logreg.py run --pef="pef/logreg/logreg.pef"
+
+

Then

+
sbatch --output=pef/logreg/output.log submit-logreg-job.sh
+
+

The output, pef/logreg/output.log, will look something like this:

+
2023-03-08 21:18:25.168190: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2 FMA
+To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
+2023-03-08 21:18:25.334389: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory
+2023-03-08 21:18:25.334430: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.
+2023-03-08 21:18:26.422458: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory
+2023-03-08 21:18:26.422701: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory
+2023-03-08 21:18:26.422709: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly.
+[Info][SAMBA]# Placing log files in /home/wilsonb/apps/starters/logreg/pef/logreg/logreg.samba.log
+[Info][MAC]# Placing log files in /home/wilsonb/apps/starters/logreg/pef/logreg/logreg.mac.log
+...
+
+Epoch [1/1], Step [10000/60000], Loss: 0.4642
+Epoch [1/1], Step [20000/60000], Loss: 0.4090
+Epoch [1/1], Step [30000/60000], Loss: 0.3863
+Epoch [1/1], Step [40000/60000], Loss: 0.3703
+Epoch [1/1], Step [50000/60000], Loss: 0.3633
+Epoch [1/1], Step [60000/60000], Loss: 0.3553
+Test Accuracy: 91.40  Loss: 0.3014
+2023-03-08T21:19:08 : [INFO][LIB][2688517]: sn_create_session: PEF File: pef/logreg/logreg.pef
+
+

UNet2D

+

The UNet application example is provided under the path /opt/sambaflow/apps/image/segmentation/. As with any other application, we first compile and then train the model using the compile and run arguments, respectively. The scripts containing the compile and run commands for the UNet2D model can be accessed at Unet2d.sh or at /data/ANL/scripts/Unet2d.sh on any SN30 compute node.

+

Change directory and copy files.

+
mkdir -p ~/apps/image/unet
+cd ~/apps/image/unet
+
+

Copy and paste the contents of Unet2d.sh into a file with the same name in the current directory using your favorite editor.

+
chmod +x Unet2d.sh
+
+

Run these commands for training (compile + train):

+
./Unet2d.sh compile <image size> <batch_size> <num of instances> <RunID>
+./Unet2d.sh run <image size> <batch_size> <num of instances> <RunID>
+
+

The compile and run arguments of the script can only be used with the number of instances equal to 1, indicating that this is a simple 4-tile run without the data-parallel framework. For an image size of 256x256 and a batch size of 256, running just 1 instance, the commands are as follows.

+
./Unet2d.sh compile 256 256 1 unet2d_single_compile
+./Unet2d.sh run 256 256 1 unet2d_single_run
+
+

The above commands display the path of the file that contains the output of the run, usually /data/ANL/results/<hostname>/<userid>/<RunID>/Unet2d.out.

+
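For example, with the RunID used above, the run output could be inspected with something like the following sketch (the hostname and user ID in the path are resolved by the shell; adjust as needed):
cat /data/ANL/results/$(hostname)/${USER}/unet2d_single_run/Unet2d.out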

If we inspect the compile and run commands for the UNet application provided in the script, we see that the application is compiled with --num-tiles 4, which means that the entire application fits on 4 tiles, or half of an RDU. The pef generated by the compilation command above is placed under out/Unet2d/unet_train_256_256_single_4 inside the current working directory.

+
python ${UNET}/compile.py compile --mac-v2 --in-channels=3 --in-width=${2} --in-height=${2} --batch-size=${BS} --enable-conv-tiling --num-tiles=4 --pef-name=unet_train_${BS}_${2}_single_${NUM_TILES} --output-folder=${OUTDIR}
+
+
srun --nodelist $(hostname) python /opt/sambaflow/apps/image/segmentation//hook.py run --data-cache=${CACHE_DIR}  --data-in-memory --num-workers=${NUM_WORKERS} --enable-tiling  --min-throughput 395 --in-channels=3 --in-width=${2} --in-height=${2} --init-features 32 --batch-size=${BS} --epochs 10 --data-dir ${DS} --log-dir log_dir_unet_${2}_${BS}_single_${NUM_TILES} --pef=${OUTDIR}/unet_train_${BS}_${2}_single_${NUM_TILES}/unet_train_${BS}_${2}_single_${NUM_TILES}.pef
+
+

The performance data is located at the bottom of the log file.

+
inner train loop time : 374.6789753437042 for 10 epochs, number of global steps: 130, e2e samples_per_sec: 88.82270474202953
+
+

Gpt 1.5B

+

The Gpt 1.5B application example is provided under the path /opt/sambaflow/apps/nlp/transformers_on_rdu/. The scripts containing the compile and run commands for the Gpt1.5B model can be accessed at /data/ANL/scripts/Gpt1.5B_base_single_compile.sh and /data/ANL/scripts/Gpt1.5B_base_single_run.sh on any SN30 compute node. These scripts compile and run the model for only 1 instance, and the model fits on 4 tiles, or half of an RDU. The scripts are provided for reference.

+

Change directory and copy files.

+
mkdir -p ~/apps/nlp/Gpt1.5B_single
+cd ~/apps/nlp/Gpt1.5B_single
+
+

Copy and paste the contents of Gpt1.5B_base_single_compile.sh and Gpt1.5B_base_single_run.sh into files with the same names in the current directory using your favorite editor.

+

Alternatively, copy the scripts directly from /data/ANL/scripts/Gpt1.5B_base_single_compile.sh and /data/ANL/scripts/Gpt1.5B_base_single_run.sh:

+
cp /data/ANL/scripts/Gpt1.5B_base_single_compile.sh ~/apps/nlp/Gpt1.5B_single/
+cp /data/ANL/scripts/Gpt1.5B_base_single_run.sh ~/apps/nlp/Gpt1.5B_single/
+
+

Run the compile script with the batch size as an argument (shown below with an example batch size of 32).

+
chmod +x Gpt1.5B_base_single_compile.sh 
+./Gpt1.5B_base_single_compile.sh 32
+
+

The Gpt1.5B_base_single_compile.sh script will internally call Gpt1.5B_base_single_run.sh to perform the training. You can inspect the compile and run commands in the scripts to see that this model trains with a batch size of 32 for 1 instance over 4 tiles. The human decision file and the compiler config file help to optimize the compute and memory resources specific to this Gpt 1.5B model run.

+
python /opt/sambaflow/apps/nlp/transformers_on_rdu/transformers_hook.py compile --pef-name=GPT1.5B_base_single_32 --output-folder=/data/scratch/user/GPT1.5B_base_single_32 --module_name gpt2_pretrain --task_name clm --max_seq_length 1024 -b 32  --output_dir=/data/scratch/user/GPT1.5B_base_single_32/hf_gpt1dot5b_ss1k_gas_1_bs32  --overwrite_output_dir --do_train  --per_device_train_batch_size 32   --tokenizer_name gpt2 --model_name gpt2 --mac-v2 --non_split_head --mac-human-decision /opt/sambaflow/apps/nlp/transformers_on_rdu/human_decisions_gm/mac_v2_overrides/gpt2_48_enc_full_recompute_training_spatialmapping_tiling16_clmerge_gm_pardp2_lnsd.json --compiler-configs-file /opt/sambaflow/apps/nlp/transformers_on_rdu/human_decisions_gm/compiler_configs/compiler_configs_gpt1dot5b_perf.json --skip_broadcast_patch --config_name /opt/sambaflow/apps/nlp/transformers_on_rdu/customer_specific/mv/configs/gpt2_config_xl_50260.json --no_index_select_patch --weight_decay 0.1  --max_grad_norm_clip 1.0 --num-tiles 4 --enable-stochastic-rounding
+
+
COMMAND= /usr/local/bin/srun --mpi=pmi2 python /opt/sambaflow/apps/nlp/transformers_on_rdu/transformers_hook.py run  -b 32  --data_dir /data/ANL/ss1024 --pef=/data/scratch/user/GPT1.5B_base_single_32/GPT1.5B_base_single_32/GPT1.5B_base_single_32.pef --output_dir=/data/scratch/user/GPT1.5B_base_single_32/hf_gpt1dot5b_ss1k_gas_1_bs16 --module_name gpt2_pretrain --task_name clm --max_seq_length 1024  --overwrite_output_dir --do_train  --per_device_train_batch_size 32 --tokenizer_name gpt2 --model_name gpt2 --non_split_head --skip_broadcast_patch --no_index_select_patch --config_name /opt/sambaflow/apps/nlp/transformers_on_rdu/customer_specific/mv/configs/gpt2_config_xl_50260.json --max_grad_norm_clip 1.0 --skip_checkpoint --logging_steps 1 --max_steps 75000 --learning_rate 0.00025 --steps_this_run 100
+
+

The output of the sntilestat command, shown below, confirms that the application runs on 4 tiles.

+
/XRDU_0/RDU_0/TILE_0   2.1  96.9    0.8    0.1    0.0      0.0 796481  user python /opt/sambaflow/apps/nlp/transformers_on_rdu/
+/XRDU_0/RDU_0/TILE_1   2.1  96.9    0.8    0.1    0.0      0.0 796481  user python /opt/sambaflow/apps/nlp/transformers_on_rdu/
+/XRDU_0/RDU_0/TILE_2   2.5  96.9    0.4    0.1    0.0      0.0 796481  user python /opt/sambaflow/apps/nlp/transformers_on_rdu/
+/XRDU_0/RDU_0/TILE_3   2.5  96.9    0.4    0.1    0.0      0.0 796481  user python /opt/sambaflow/apps/nlp/transformers_on_rdu/
+/XRDU_0/RDU_0/TILE_4 100.0   0.0    0.0    0.0    0.0      0.0
+/XRDU_0/RDU_0/TILE_5 100.0   0.0    0.0    0.0    0.0      0.0
+...
+
+ + + + + + + + + + + + + +
+
+ + + +
+ +
+ + + + + + +
+
+
+
+ + + + + + + + + + + + \ No newline at end of file diff --git a/ai-testbed/sambanova/files/BertLarge.sh b/ai-testbed/sambanova/files/BertLarge.sh new file mode 100644 index 0000000000..3755dbd09a --- /dev/null +++ b/ai-testbed/sambanova/files/BertLarge.sh @@ -0,0 +1,62 @@ +#! /bin/bash +set -e +#export SF_RNT_LOG_LEVEL=DEBUG +LOGDIR=`date +%m%d%y.%H` +if [ "$1" ] ; then +LOGDIR=$1 +fi +MODEL_NAME="BertLarge" +OUTPUT_PATH=/data/ANL/results/$(hostname)/${USER}/${LOGDIR}/${MODEL_NAME}.out +echo "Using ${OUTPUT_PATH} for output" +mkdir -p /data/ANL/results/$(hostname)/${USER}/${LOGDIR} + + +ACTIVATE=/opt/sambaflow/apps/nlp/transformers_on_rdu/venv/bin/activate +####################### +# Edit these variables. +####################### +export OMP_NUM_THREADS=18 +export CCL_TIMEOUT=3600 +####################### +# Start script timer +SECONDS=0 +# Temp file location +DIRECTORY=$$ +#OUTDIR=${HOME}/${DIRECTORY} +OUTDIR=${HOME}/${MODEL_NAME} + +source ${ACTIVATE} +cd ${HOME} +echo "Model: " ${MODEL_NAME} > ${OUTPUT_PATH} 2>&1 +echo "Date: " $(date +%m/%d/%y) >> ${OUTPUT_PATH} 2>&1 +echo "Time: " $(date +%H:%M) >> ${OUTPUT_PATH} 2>&1 +if [ ! -d ${OUTDIR} ] ; then + mkdir ${OUTDIR} +fi +cd ${OUTDIR} +####################### +echo "Machine State Before: " >> ${OUTPUT_PATH} 2>&1 +/opt/sambaflow/bin/snfadm -l inventory >> ${OUTPUT_PATH} 2>&1 +####################### +if [ -e ${OUTDIR}/bertlrg/bertlrg.pef ] ; then +rm ${OUTDIR}/bertlrg/bertlrg.pef +fi +if [ ! -e ${OUTDIR}/bertlrg/bertlrg.pef ] ; then + echo "COMPILE START AT ${SECONDS}" >> ${OUTPUT_PATH} 2>&1 + ##Orig COMMAND="python /opt/sambaflow/apps/nlp/transformers_on_rdu/transformers_hook.py compile --model_name_or_path bert-large-uncased --tokenizer_name bert-large-uncased --module_name mlm_ns --task_name mlm_ns --max_seq_length 128 --per_device_train_batch_size 256 -b 256 --output_dir=${OUTDIR}/hf_output --overwrite_output_dir --cache_dir ${OUTDIR}/cache --compiler-configs-file /opt/sambaflow/apps/nlp/transformers_on_rdu/human_decisions/compiler_configs/compiler_configs_bertlarge_sc_mlm_ml_perf_fullfeature_macv2_gm_e2e.json --mac-human-decision /opt/sambaflow/apps/nlp/transformers_on_rdu/human_decisions/mac_overrides/bertlarge_sc_training_mlm_ml_perf_fullfeature_macv2.json --mac-v2 --non_split_head --dense_adam --data-parallel -ws 2 --weight_decay 0.01 --max_grad_norm_clip 1.0 --adam_beta2 0.98 --num-tiles 4 --pef-name=bertlrg --output-folder=${OUTDIR}" + COMMAND="python /opt/sambaflow/apps/nlp/transformers_on_rdu/transformers_hook.py compile --model_name_or_path bert-large-uncased --tokenizer_name bert-large-uncased --module_name mlm_ns --task_name mlm_ns --max_seq_length 128 --per_device_train_batch_size 256 -b 256 --output_dir=${OUTDIR}/hf_output --overwrite_output_dir --cache_dir ${OUTDIR}/cache --compiler-configs-file /opt/sambaflow/apps/nlp/transformers_on_rdu/human_decisions/compiler_configs/compiler_configs_bertlarge_sc_mlm_ml_perf_fullfeature_macv2_gm_e2e.json --mac-human-decision /opt/sambaflow/apps/nlp/transformers_on_rdu/human_decisions/mac_overrides/bertlarge_sc_training_mlm_ml_perf_fullfeature_macv2.json --mac-v2 --non_split_head --dense_adam --data-parallel -ws 2 --weight_decay 0.01 --max_grad_norm_clip 1.0 --adam_beta2 0.98 --num-tiles 4 --pef-name=bertlrg --output-folder=${OUTDIR}" + echo "COMPILE COMMAND: $COMMAND" >> ${OUTPUT_PATH} 2>&1 + eval $COMMAND >> ${OUTPUT_PATH} 2>&1 + echo "COMPILE END AT ${SECONDS}" >> ${OUTPUT_PATH} 2>&1 +fi +####################### +echo "RUN" >> ${OUTPUT_PATH} 2>&1 
+#COMMAND="/opt/mpich-3.4.3/bin/mpirun -np 16 python /opt/sambaflow/apps/nlp/transformers_on_rdu/transformers_hook.py run --config_name /opt/sambaflow/apps/nlp/transformers_on_rdu/modules/configs/mlm_24layer_ml_perf_config.json --tokenizer_name bert-large-uncased --module_name mlm_ns --task_name mlm_ns --max_seq_length 128 -b 256 --output_dir=${OUTDIR}/hf_output --overwrite_output_dir --do_train --per_device_train_batch_size 256 --data_dir /data/ANL/wikicorpus_en/ --cache ${OUTDIR}/cache --max_predictions_per_seq 20 --warmup_steps 12500 --max_steps 250000 --steps_this_run 5005 --logging_steps 1 --weight_decay 0.01 --learning_rate 0.000175 --non_split_head --dense_adam --data-parallel --reduce-on-rdu --adam_beta2 0.98 --max_grad_norm_clip 1.0 --validate_stat_perf --validate_tying_plus_embed_train --min_throughput 570000 --max_throughput 620000 --skip_checkpoint -p ${OUTDIR}/bertlrg/bertlrg.pef" +#Orig COMMAND="/opt/mpich-3.4.3/bin/mpirun -np 16 python /opt/sambaflow/apps/nlp/transformers_on_rdu/transformers_hook.py run --config_name /opt/sambaflow/apps/nlp/transformers_on_rdu/modules/configs/mlm_24layer_ml_perf_config.json --tokenizer_name bert-large-uncased --module_name mlm_ns --task_name mlm_ns --max_seq_length 128 -b 256 --output_dir=${OUTDIR}/hf_output --overwrite_output_dir --do_train --per_device_train_batch_size 256 --data_dir /data/ANL/wikicorpus_en/ --cache ${OUTDIR}/cache --max_predictions_per_seq 20 --warmup_steps 12500 --max_steps 250000 --steps_this_run 5005 --logging_steps 1 --weight_decay 0.01 --learning_rate 0.000175 --non_split_head --dense_adam --data-parallel --reduce-on-rdu --adam_beta2 0.98 --max_grad_norm_clip 1.0 --validate_stat_perf --validate_tying_plus_embed_train --min_throughput 570000 --max_throughput 620000 --skip_checkpoint -p ${OUTDIR}/bertlrg/bertlrg.pef" +COMMAND="/opt/mpich-3.4.3/bin/mpirun -np 16 python /opt/sambaflow/apps/nlp/transformers_on_rdu/transformers_hook.py run --config_name /opt/sambaflow/apps/nlp/transformers_on_rdu/modules/configs/mlm_24layer_ml_perf_config.json --tokenizer_name bert-large-uncased --module_name mlm_ns --task_name mlm_ns --max_seq_length 128 -b 256 --output_dir=${OUTDIR}/hf_output --overwrite_output_dir --do_train --per_device_train_batch_size 256 --data_dir /data/ANL/wikicorpus_en/ --cache ${OUTDIR}/cache --max_predictions_per_seq 20 --warmup_steps 12500 --max_steps 250000 --steps_this_run 5005 --logging_steps 1 --weight_decay 0.01 --learning_rate 0.000175 --non_split_head --dense_adam --data-parallel --reduce-on-rdu --adam_beta2 0.98 --max_grad_norm_clip 1.0 --validate_stat_perf --validate_tying_plus_embed_train --min_throughput 570000 --max_throughput 620000 --skip_checkpoint -p ${OUTDIR}/bertlrg/bertlrg.pef" +echo "RUN COMMAND: $COMMAND" >> ${OUTPUT_PATH} 2>&1 +eval $COMMAND >> ${OUTPUT_PATH} 2>&1 +####################### +echo "Machine state After: " >> ${OUTPUT_PATH} 2>&1 +/opt/sambaflow/bin/snfadm -l inventory >> ${OUTPUT_PATH} 2>&1 +echo "Duration: " $SECONDS >> ${OUTPUT_PATH} 2>&1 diff --git a/ai-testbed/sambanova/files/BertLarge_nvme.sh b/ai-testbed/sambanova/files/BertLarge_nvme.sh new file mode 100644 index 0000000000..dd7917745d --- /dev/null +++ b/ai-testbed/sambanova/files/BertLarge_nvme.sh @@ -0,0 +1,61 @@ +#! 
/bin/bash +set -e +#export SF_RNT_LOG_LEVEL=DEBUG +LOGDIR=`date +%m%d%y.%H` +if [ "$1" ] ; then +LOGDIR=$1 +fi +MODEL_NAME="BertLarge" +OUTPUT_PATH=/data/ANL/results/$(hostname)/${USER}/${LOGDIR}/${MODEL_NAME}.out +echo "Using ${OUTPUT_PATH} for output" +mkdir -p /data/ANL/results/$(hostname)/${USER}/${LOGDIR} + +ACTIVATE=/opt/sambaflow/apps/nlp/transformers_on_rdu/venv/bin/activate +####################### +# Edit these variables. +####################### +export OMP_NUM_THREADS=18 +export CCL_TIMEOUT=3600 +####################### +# Start script timer +SECONDS=0 +# Temp file location +DIRECTORY=$$ +#OUTDIR=${HOME}/${DIRECTORY} +OUTDIR=$(pwd)/${MODEL_NAME} + +source ${ACTIVATE} +cd ${HOME} +echo "Model: " ${MODEL_NAME} > ${OUTPUT_PATH} 2>&1 +echo "Date: " $(date +%m/%d/%y) >> ${OUTPUT_PATH} 2>&1 +echo "Time: " $(date +%H:%M) >> ${OUTPUT_PATH} 2>&1 +if [ ! -d ${OUTDIR} ] ; then + mkdir ${OUTDIR} +fi +cd ${OUTDIR} +####################### +echo "Machine State Before: " >> ${OUTPUT_PATH} 2>&1 +/opt/sambaflow/bin/snfadm -l inventory >> ${OUTPUT_PATH} 2>&1 +####################### +if [ -e ${OUTDIR}/bertlrg/bertlrg.pef ] ; then +rm ${OUTDIR}/bertlrg/bertlrg.pef +fi +if [ ! -e ${OUTDIR}/bertlrg/bertlrg.pef ] ; then + echo "COMPILE START AT ${SECONDS}" >> ${OUTPUT_PATH} 2>&1 + ##Orig COMMAND="python /opt/sambaflow/apps/nlp/transformers_on_rdu/transformers_hook.py compile --model_name_or_path bert-large-uncased --tokenizer_name bert-large-uncased --module_name mlm_ns --task_name mlm_ns --max_seq_length 128 --per_device_train_batch_size 256 -b 256 --output_dir=${OUTDIR}/hf_output --overwrite_output_dir --cache_dir ${OUTDIR}/cache --compiler-configs-file /opt/sambaflow/apps/nlp/transformers_on_rdu/human_decisions/compiler_configs/compiler_configs_bertlarge_sc_mlm_ml_perf_fullfeature_macv2_gm_e2e.json --mac-human-decision /opt/sambaflow/apps/nlp/transformers_on_rdu/human_decisions/mac_overrides/bertlarge_sc_training_mlm_ml_perf_fullfeature_macv2.json --mac-v2 --non_split_head --dense_adam --data-parallel -ws 2 --weight_decay 0.01 --max_grad_norm_clip 1.0 --adam_beta2 0.98 --num-tiles 4 --pef-name=bertlrg --output-folder=${OUTDIR}" + COMMAND="python /opt/sambaflow/apps/nlp/transformers_on_rdu/transformers_hook.py compile --model_name_or_path bert-large-uncased --tokenizer_name bert-large-uncased --module_name mlm_ns --task_name mlm_ns --max_seq_length 128 --per_device_train_batch_size 256 -b 256 --output_dir=${OUTDIR}/hf_output --overwrite_output_dir --cache_dir ${OUTDIR}/cache --compiler-configs-file /opt/sambaflow/apps/nlp/transformers_on_rdu/human_decisions/compiler_configs/compiler_configs_bertlarge_sc_mlm_ml_perf_fullfeature_macv2_gm_e2e.json --mac-human-decision /opt/sambaflow/apps/nlp/transformers_on_rdu/human_decisions/mac_overrides/bertlarge_sc_training_mlm_ml_perf_fullfeature_macv2.json --mac-v2 --non_split_head --dense_adam --data-parallel -ws 2 --weight_decay 0.01 --max_grad_norm_clip 1.0 --adam_beta2 0.98 --num-tiles 4 --pef-name=bertlrg --output-folder=${OUTDIR} --log-level error" + echo "COMPILE COMMAND: $COMMAND" >> ${OUTPUT_PATH} 2>&1 + eval $COMMAND >> ${OUTPUT_PATH} 2>&1 + echo "COMPILE END AT ${SECONDS}" >> ${OUTPUT_PATH} 2>&1 +fi +####################### +echo "RUN" >> ${OUTPUT_PATH} 2>&1 +#COMMAND="/opt/mpich-3.4.3/bin/mpirun -np 16 python /opt/sambaflow/apps/nlp/transformers_on_rdu/transformers_hook.py run --config_name /opt/sambaflow/apps/nlp/transformers_on_rdu/modules/configs/mlm_24layer_ml_perf_config.json --tokenizer_name bert-large-uncased --module_name mlm_ns --task_name 
mlm_ns --max_seq_length 128 -b 256 --output_dir=${OUTDIR}/hf_output --overwrite_output_dir --do_train --per_device_train_batch_size 256 --data_dir /data/ANL/wikicorpus_en/ --cache ${OUTDIR}/cache --max_predictions_per_seq 20 --warmup_steps 12500 --max_steps 250000 --steps_this_run 5005 --logging_steps 1 --weight_decay 0.01 --learning_rate 0.000175 --non_split_head --dense_adam --data-parallel --reduce-on-rdu --adam_beta2 0.98 --max_grad_norm_clip 1.0 --validate_stat_perf --validate_tying_plus_embed_train --min_throughput 570000 --max_throughput 620000 --skip_checkpoint -p ${OUTDIR}/bertlrg/bertlrg.pef" +#Orig COMMAND="/opt/mpich-3.4.3/bin/mpirun -np 16 python /opt/sambaflow/apps/nlp/transformers_on_rdu/transformers_hook.py run --config_name /opt/sambaflow/apps/nlp/transformers_on_rdu/modules/configs/mlm_24layer_ml_perf_config.json --tokenizer_name bert-large-uncased --module_name mlm_ns --task_name mlm_ns --max_seq_length 128 -b 256 --output_dir=${OUTDIR}/hf_output --overwrite_output_dir --do_train --per_device_train_batch_size 256 --data_dir /data/ANL/wikicorpus_en/ --cache ${OUTDIR}/cache --max_predictions_per_seq 20 --warmup_steps 12500 --max_steps 250000 --steps_this_run 5005 --logging_steps 1 --weight_decay 0.01 --learning_rate 0.000175 --non_split_head --dense_adam --data-parallel --reduce-on-rdu --adam_beta2 0.98 --max_grad_norm_clip 1.0 --validate_stat_perf --validate_tying_plus_embed_train --min_throughput 570000 --max_throughput 620000 --skip_checkpoint -p ${OUTDIR}/bertlrg/bertlrg.pef" +COMMAND="/opt/mpich-3.4.3/bin/mpirun -np 16 python /opt/sambaflow/apps/nlp/transformers_on_rdu/transformers_hook.py run --config_name /opt/sambaflow/apps/nlp/transformers_on_rdu/modules/configs/mlm_24layer_ml_perf_config.json --tokenizer_name bert-large-uncased --module_name mlm_ns --task_name mlm_ns --max_seq_length 128 -b 256 --output_dir=${OUTDIR}/hf_output --overwrite_output_dir --do_train --per_device_train_batch_size 256 --data_dir /data/ANL/wikicorpus_en/ --cache ${OUTDIR}/cache --max_predictions_per_seq 20 --warmup_steps 12500 --max_steps 250000 --steps_this_run 5005 --logging_steps 1 --weight_decay 0.01 --learning_rate 0.000175 --non_split_head --dense_adam --data-parallel --reduce-on-rdu --adam_beta2 0.98 --max_grad_norm_clip 1.0 --validate_stat_perf --validate_tying_plus_embed_train --min_throughput 570000 --max_throughput 620000 --skip_checkpoint -p ${OUTDIR}/bertlrg/bertlrg.pef" +echo "RUN COMMAND: $COMMAND" >> ${OUTPUT_PATH} 2>&1 +eval $COMMAND >> ${OUTPUT_PATH} 2>&1 +####################### +echo "Machine state After: " >> ${OUTPUT_PATH} 2>&1 +/opt/sambaflow/bin/snfadm -l inventory >> ${OUTPUT_PATH} 2>&1 +echo "Duration: " $SECONDS >> ${OUTPUT_PATH} 2>&1 diff --git a/ai-testbed/sambanova/files/Gpt1.5B_base_single_compile.sh b/ai-testbed/sambanova/files/Gpt1.5B_base_single_compile.sh new file mode 100644 index 0000000000..a9361fd55d --- /dev/null +++ b/ai-testbed/sambanova/files/Gpt1.5B_base_single_compile.sh @@ -0,0 +1,74 @@ +#! /bin/bash +set -e +export SOFTWARE_HOME=/opt +ACTIVATE=/opt/sambaflow/apps/nlp/transformers_on_rdu/venv/bin/activate +LOGDIR=`date +%m%d%y.%H` +if [ "$2" ] ; then +LOGDIR=$2 +fi +MODEL_NAME="GPT1.5B_base_single_$1" +OUTPUT_PATH=/data/ANL/results/$(hostname)/${USER}/${LOGDIR}/${MODEL_NAME}.out +echo "Using ${OUTPUT_PATH} for output" +mkdir -p /data/ANL/results/$(hostname)/${USER}/${LOGDIR} + +####################### +# Edit these variables. 
+####################### +export OMP_NUM_THREADS=18 +export REQUESTS_CA_BUNDLE=/usr/local/lib/python3.8/site-packages/certifi/cacert.pem +export CURL_CA_BUNDLE=/etc/ssl/certs/ca-certificates.crt + +####################### +# Start script timer +SECONDS=0 +# Temp file location +DIRECTORY=$$ +OUTDIR=/data/scratch/${USER}/${MODEL_NAME} +mkdir -p ${OUTDIR} +source ${ACTIVATE} +echo "Model: " ${MODEL_NAME} > ${OUTPUT_PATH} 2>&1 +echo "Date: " $(date +%m/%d/%y) >> ${OUTPUT_PATH} 2>&1 +echo "Time: " $(date +%H:%M) >> ${OUTPUT_PATH} 2>&1 +apt list --installed sambaflow >> ${OUTPUT_PATH} 2>&1 +cd ${OUTDIR} +####################### +echo "Machine State Before: " >> ${OUTPUT_PATH} 2>&1 +/opt/sambaflow/bin/snfadm -l inventory >> ${OUTPUT_PATH} 2>&1 +####################### +export SN_NUM_THREADS=32 + +if [ $1 -eq 256 ] ; then + BATCH_SIZE=256 +elif [ $1 -eq 128 ] ; then + BATCH_SIZE=128 +elif [ $1 -eq 64 ] ; then + BATCH_SIZE=64 +elif [ $1 -eq 32 ] ; then + BATCH_SIZE=32 +elif [ $1 -eq 16 ] ; then + BATCH_SIZE=16 +else + echo "Batchsize $1 is invalid use 16,32,64,or 128,256" $2 >> ${OUTPUT_PATH} 2>&1 + exit 1 +fi + +if [ ! -e ${OUTDIR}/${MODEL_NAME}/${MODEL_NAME}.pef ] ; then + echo "COMPILE START AT ${SECONDS}" >> ${OUTPUT_PATH} 2>&1 + export GAS=1 + + export CC=compiler_configs_gpt1dot5b_perf.json + #env | grep PYTHONPATH >> ${OUTPUT_PATH} 2>&1 + COMMAND="python /opt/sambaflow/apps/nlp/transformers_on_rdu/transformers_hook.py compile --pef-name=${MODEL_NAME} --output-folder=${OUTDIR} --module_name gpt2_pretrain --task_name clm --max_seq_length 1024 -b $BATCH_SIZE --output_dir=${OUTDIR}/hf_gpt1dot5b_ss1k_gas_${GAS}_bs${BATCH_SIZE} --overwrite_output_dir --do_train --per_device_train_batch_size ${BATCH_SIZE} --tokenizer_name gpt2 --model_name gpt2 --mac-v2 --non_split_head --mac-human-decision /opt/sambaflow/apps/nlp/transformers_on_rdu/human_decisions_gm/mac_v2_overrides/gpt2_48_enc_full_recompute_training_spatialmapping_tiling16_clmerge_gm_pardp2_lnsd.json --compiler-configs-file /opt/sambaflow/apps/nlp/transformers_on_rdu/human_decisions_gm/compiler_configs/$CC --skip_broadcast_patch --config_name /opt/sambaflow/apps/nlp/transformers_on_rdu/customer_specific/mv/configs/gpt2_config_xl_50260.json --no_index_select_patch --weight_decay 0.1 --max_grad_norm_clip 1.0 --num-tiles 4 --enable-stochastic-rounding" + + + echo "COMPILE COMMAND: $COMMAND" >> ${OUTPUT_PATH} 2>&1 + eval $COMMAND >> ${OUTPUT_PATH} 2>&1 + echo "COMPILE END AT ${SECONDS}" >> ${OUTPUT_PATH} 2>&1 +fi +####################### +echo "RUN" >> ${OUTPUT_PATH} 2>&1 +/usr/local/bin/sbatch --output=${HOME}/slurm-%A.out --ntasks 1 --gres=rdu:8 --ntasks-per-node 16 --nodes 1 --nodelist $(hostname) --cpus-per-task=8 /data/ANL/scripts/Gpt1.5B_base_single_run.sh $BATCH_SIZE $2 >> ${OUTPUT_PATH} 2>&1 + +echo "Machine state After: " >> ${OUTPUT_PATH} 2>&1 +/opt/sambaflow/bin/snfadm -l inventory >> ${OUTPUT_PATH} 2>&1 +echo "Duration: " $SECONDS >> ${OUTPUT_PATH} 2>&1 diff --git a/ai-testbed/sambanova/files/Gpt1.5B_base_single_run.sh b/ai-testbed/sambanova/files/Gpt1.5B_base_single_run.sh new file mode 100644 index 0000000000..132667cd16 --- /dev/null +++ b/ai-testbed/sambanova/files/Gpt1.5B_base_single_run.sh @@ -0,0 +1,55 @@ +#! 
/bin/bash +set -e +export SOFTWARE_HOME=/opt +LOGDIR=`date +%m%d%y.%H` +if [ "$2" ] ; then +LOGDIR=$2 +fi +MODEL_NAME="GPT1.5B_base_single_$1" +OUTPUT_PATH=/data/ANL/results/$(hostname)/${USER}/${LOGDIR}/${MODEL_NAME}.out +echo "Using ${OUTPUT_PATH} for output" +mkdir -p /data/ANL/results/$(hostname)/${USER}/${LOGDIR} + +ACTIVATE=/opt/sambaflow/apps/nlp/transformers_on_rdu/venv/bin/activate +####################### +# Edit these variables. +####################### +export OMP_NUM_THREADS=18 +####################### +# Start script timer +SECONDS=0 +# Temp file location +DIRECTORY=$$ +OUTDIR=/data/scratch/${USER}/${MODEL_NAME} +mkdir -p ${OUTDIR} +source ${ACTIVATE} +echo "Model: " ${MODEL_NAME} >> ${OUTPUT_PATH} 2>&1 +echo "Date: " $(date +%m/%d/%y) >> ${OUTPUT_PATH} 2>&1 +echo "Time: " $(date +%H:%M) >> ${OUTPUT_PATH} 2>&1 +apt list --installed sambaflow >> ${OUTPUT_PATH} 2>&1 +cd ${OUTDIR} +####################### +echo "Machine State Before: " >> ${OUTPUT_PATH} 2>&1 +/opt/sambaflow/bin/snfadm -l inventory >> ${OUTPUT_PATH} 2>&1 +####################### +if [ ! -e ${OUTDIR}/${MODEL_NAME}/${MODEL_NAME}.pef ] ; then + echo "PEF ${OUTDIR}/${MODEL_NAME}/${MODEL_NAME}.pef does not exist, exiting" >> ${OUTPUT_PATH} 2>&1 + exit 1 +fi + +####################### +echo "RUN" >> ${OUTPUT_PATH} 2>&1 +#export CCL_TIMEOUT=3600 +export REQUESTS_CA_BUNDLE=/usr/local/lib/python3.8/site-packages/certifi/cacert.pem +export CURL_CA_BUNDLE="/etc/ssl/certs/ca-certificates.crt" +export SAMBA_CCL_HIERARCHICAL_ALLREDUCE=1 + +COMMAND="/usr/local/bin/srun --mpi=pmi2 python /opt/sambaflow/apps/nlp/transformers_on_rdu/transformers_hook.py run -b $1 --data_dir /data/ANL/ss1024 --pef=${OUTDIR}/${MODEL_NAME}/${MODEL_NAME}.pef --output_dir=${OUTDIR}/hf_gpt1dot5b_ss1k_gas_1_bs16 --module_name gpt2_pretrain --task_name clm --max_seq_length 1024 --overwrite_output_dir --do_train --per_device_train_batch_size $1 --tokenizer_name gpt2 --model_name gpt2 --non_split_head --skip_broadcast_patch --no_index_select_patch --config_name /opt/sambaflow/apps/nlp/transformers_on_rdu/customer_specific/mv/configs/gpt2_config_xl_50260.json --max_grad_norm_clip 1.0 --skip_checkpoint --logging_steps 1 --max_steps 75000 --learning_rate 0.00025 --steps_this_run 100" >> ${OUTPUT_PATH} 2>&1 + +echo "COMMAND= $COMMAND" >> ${OUTPUT_PATH} 2>&1 +eval $COMMAND >> ${OUTPUT_PATH} 2>&1 + +####################### +echo "Machine state After: " >> ${OUTPUT_PATH} 2>&1 +/opt/sambaflow/bin/snfadm -l inventory >> ${OUTPUT_PATH} 2>&1 +echo "Duration: " $SECONDS >> ${OUTPUT_PATH} 2>&1 diff --git a/ai-testbed/sambanova/files/Gpt1.5B_compile.sh b/ai-testbed/sambanova/files/Gpt1.5B_compile.sh new file mode 100644 index 0000000000..c16db26363 --- /dev/null +++ b/ai-testbed/sambanova/files/Gpt1.5B_compile.sh @@ -0,0 +1,52 @@ +#! /bin/bash +set -e +#export SF_RNT_LOG_LEVEL=DEBUG +ACTIVATE=/opt/sambaflow/apps/nlp/transformers_on_rdu/venv/bin/activate +LOGDIR=`date +%m%d%y.%H` +if [ "$1" ] ; then +LOGDIR=$1 +fi +MODEL_NAME="GPT1.5B" +OUTPUT_PATH=/data/ANL/results/$(hostname)/${USER}/${LOGDIR}/${MODEL_NAME}.out +echo "Using ${OUTPUT_PATH} for output" +mkdir -p /data/ANL/results/$(hostname)/${USER}/${LOGDIR} +export SN_NUM_THREADS=32 + +####################### +# Edit these variables. 
+####################### +export OMP_NUM_THREADS=18 +####################### +# Start script timer +SECONDS=0 +# Temp file location +DIRECTORY=$$ +OUTDIR=/data/scratch/${USER}/GPT_RUN +mkdir -p ${OUTDIR} +source ${ACTIVATE} +echo "Model: " ${MODEL_NAME} > ${OUTPUT_PATH} 2>&1 +echo "Date: " $(date +%m/%d/%y) >> ${OUTPUT_PATH} 2>&1 +echo "Time: " $(date +%H:%M) >> ${OUTPUT_PATH} 2>&1 +apt list --installed sambaflow >> ${OUTPUT_PATH} 2>&1 +cd ${OUTDIR} +####################### +echo "Machine State Before: " >> ${OUTPUT_PATH} 2>&1 +/opt/sambaflow/bin/snfadm -l inventory >> ${OUTPUT_PATH} 2>&1 +if [ ! -e ${OUTDIR}/gpt15/gpt15.pef ] ; then + ####################### + echo "COMPILE START AT ${SECONDS}" >> ${OUTPUT_PATH} 2>&1 + + # 1.14.3-8 COMMAND="python /opt/sambaflow/apps/nlp/transformers_on_rdu/transformers_hook.py compile --module_name gpt2_pretrain --task_name clm --max_seq_length 1024 -b 16 --output_dir=${OUTDIR}/hf_output --overwrite_output_dir --do_train --per_device_train_batch_size 16 --cache ${OUTDIR}/cache/ --tokenizer_name gpt2 --model_name gpt2 --mac-v2 --non_split_head --mac-human-decision /opt/sambaflow/apps/nlp/transformers_on_rdu/human_decisions_gm/mac_v2_overrides/gpt2_48_enc_full_recompute_training_spatialmapping_tiling16_clmerge_gm_nonpardp_anl.json --compiler-configs-file /opt/sambaflow/apps/nlp/transformers_on_rdu/human_decisions_gm/compiler_configs/compiler_configs_gpt2_sc_recompute_spatialmapping_tiling16_clsmerge_withcls_nogroups.json --skip_broadcast_patch --config_name /opt/sambaflow/apps/nlp/transformers_on_rdu/customer_specific/mv/configs/gpt2_config_xl_50260.json --no_index_select_patch --data-parallel -ws 2 --weight_decay 0.1 --max_grad_norm_clip 1.0 --num-tiles 4 --pef-name=gpt15 --output-folder=${OUTDIR}" + COMMAND="python /opt/sambaflow/apps/nlp/transformers_on_rdu/transformers_hook.py compile --module_name gpt2_pretrain --task_name clm --max_seq_length 1024 -b 16 --output_dir=${OUTDIR}/hf_output --overwrite_output_dir --do_train --per_device_train_batch_size 16 --cache ${OUTDIR}/cache/ --tokenizer_name gpt2 --model_name gpt2 --mac-v2 --non_split_head --mac-human-decision /opt/sambaflow/apps/nlp/transformers_on_rdu/human_decisions_gm/mac_v2_overrides/gpt2_48_enc_full_recompute_training_spatialmapping_tiling16_clmerge_gm_nonpardp_lnsd.json --compiler-configs-file /opt/sambaflow/apps/nlp/transformers_on_rdu/human_decisions_gm/compiler_configs/compiler_configs_gpt2_sc_recompute_spatialmapping_tiling16_clsmerge_withcls_nonpardp_norc_e2e.json --skip_broadcast_patch --config_name /opt/sambaflow/apps/nlp/transformers_on_rdu/customer_specific/mv/configs/gpt2_config_xl_50260.json --no_index_select_patch --data-parallel -ws 2 --weight_decay 0.1 --max_grad_norm_clip 1.0 --num-tiles 4 --pef-name=gpt15 --output-folder=${OUTDIR}" + + echo "COMPILE COMMAND: $COMMAND" >> ${OUTPUT_PATH} 2>&1 + eval $COMMAND >> ${OUTPUT_PATH} 2>&1 + echo "COMPILE END AT ${SECONDS}" >> ${OUTPUT_PATH} 2>&1 +fi +####################### +echo "RUN" >> ${OUTPUT_PATH} 2>&1 +/usr/local/bin/sbatch --output=${HOME}/slurm-%A.out --ntasks 32 --gres=rdu:1 --ntasks-per-node 16 --nodes 2 --cpus-per-task=8 /home/$(whoami)/nlp-multiNodetest/Gpt1.5B_run.sh ${1} >> ${OUTPUT_PATH} 2>&1 + +echo "Machine state After: " >> ${OUTPUT_PATH} 2>&1 +/opt/sambaflow/bin/snfadm -l inventory >> ${OUTPUT_PATH} 2>&1 +echo "Duration: " $SECONDS >> ${OUTPUT_PATH} 2>&1 diff --git a/ai-testbed/sambanova/files/Gpt1.5B_run.sh b/ai-testbed/sambanova/files/Gpt1.5B_run.sh new file mode 100644 index 0000000000..3ba46d5643 --- /dev/null 
+++ b/ai-testbed/sambanova/files/Gpt1.5B_run.sh @@ -0,0 +1,54 @@ +#! /bin/bash +set -e +#export SF_RNT_LOG_LEVEL=DEBUG +LOGDIR=`date +%m%d%y.%H` +if [ "$1" ] ; then +LOGDIR=$1 +fi +MODEL_NAME="Gpt1.5B" +OUTPUT_PATH=/data/ANL/results/$(hostname)/${USER}/${LOGDIR}/${MODEL_NAME}.out +echo "Using ${OUTPUT_PATH} for output" +mkdir -p /data/ANL/results/$(hostname)/${USER}/${LOGDIR} +export SN_NUM_THREADS=32 + +ACTIVATE=/opt/sambaflow/apps/nlp/transformers_on_rdu/venv/bin/activate +####################### +# Edit these variables. +####################### +export OMP_NUM_THREADS=18 +####################### +# Start script timer +SECONDS=0 +# Temp file location +DIRECTORY=$$ +OUTDIR=/data/scratch/${USER}/GPT_RUN +mkdir -p ${OUTDIR} +source ${ACTIVATE} +echo "Model: " ${MODEL_NAME} >> ${OUTPUT_PATH} 2>&1 +echo "Date: " $(date +%m/%d/%y) >> ${OUTPUT_PATH} 2>&1 +echo "Time: " $(date +%H:%M) >> ${OUTPUT_PATH} 2>&1 +apt list --installed sambaflow >> ${OUTPUT_PATH} 2>&1 +cd ${OUTDIR} +####################### +echo "Machine State Before: " >> ${OUTPUT_PATH} 2>&1 +/opt/sambaflow/bin/snfadm -l inventory >> ${OUTPUT_PATH} 2>&1 +####################### +if [ ! -e ${OUTDIR}/gpt15/gpt15.pef ] ; then + echo "COMPILE START AT ${SECONDS}" >> ${OUTPUT_PATH} 2>&1 + + #1.14.3-8 COMMAND="python /opt/sambaflow/apps/nlp/transformers_on_rdu/transformers_hook.py compile --module_name gpt2_pretrain --task_name clm --max_seq_length 1024 -b 16 --output_dir=${OUTDIR}/hf_output --overwrite_output_dir --do_train --per_device_train_batch_size 16 --cache ${OUTDIR}/cache/ --tokenizer_name gpt2 --model_name gpt2 --mac-v2 --non_split_head --mac-human-decision /opt/sambaflow/apps/nlp/transformers_on_rdu/human_decisions_gm/mac_v2_overrides/gpt2_48_enc_full_recompute_training_spatialmapping_tiling16_clmerge_gm_nonpardp_anl.json --compiler-configs-file /opt/sambaflow/apps/nlp/transformers_on_rdu/human_decisions_gm/compiler_configs/compiler_configs_gpt2_sc_recompute_spatialmapping_tiling16_clsmerge_withcls_nogroups.json --skip_broadcast_patch --config_name /opt/sambaflow/apps/nlp/transformers_on_rdu/customer_specific/mv/configs/gpt2_config_xl_50260.json --no_index_select_patch --data-parallel -ws 2 --weight_decay 0.1 --max_grad_norm_clip 1.0 --num-tiles 4 --pef-name=gpt15 --output-folder=${OUTDIR}" + COMMAND="python /opt/sambaflow/apps/nlp/transformers_on_rdu/transformers_hook.py compile --module_name gpt2_pretrain --task_name clm --max_seq_length 1024 -b 16 --output_dir=${OUTDIR}/hf_output --overwrite_output_dir --do_train --per_device_train_batch_size 16 --cache ${OUTDIR}/cache/ --tokenizer_name gpt2 --model_name gpt2 --mac-v2 --non_split_head --mac-human-decision /opt/sambaflow/apps/nlp/transformers_on_rdu/human_decisions_gm/mac_v2_overrides/gpt2_48_enc_full_recompute_training_spatialmapping_tiling16_clmerge_gm_nonpardp_lnsd.json --compiler-configs-file /opt/sambaflow/apps/nlp/transformers_on_rdu/human_decisions_gm/compiler_configs/compiler_configs_gpt2_sc_recompute_spatialmapping_tiling16_clsmerge_withcls_nogroups.json --skip_broadcast_patch --config_name /opt/sambaflow/apps/nlp/transformers_on_rdu/customer_specific/mv/configs/gpt2_config_xl_50260.json --no_index_select_patch --data-parallel -ws 2 --weight_decay 0.1 --max_grad_norm_clip 1.0 --num-tiles 4 --pef-name=gpt15 --output-folder=${OUTDIR}" + + echo "COMPILE COMMAND: $COMMAND" >> ${OUTPUT_PATH} 2>&1 + eval $COMMAND >> ${OUTPUT_PATH} 2>&1 + echo "COMPILE END AT ${SECONDS}" >> ${OUTPUT_PATH} 2>&1 +fi +####################### +echo "RUN" >> ${OUTPUT_PATH} 2>&1 +export 
CCL_TIMEOUT=3600 +/usr/local/bin/srun --mpi=pmi2 python /opt/sambaflow/apps/nlp/transformers_on_rdu/transformers_hook.py run -b 16 --module_name gpt2_pretrain --task_name clm --max_seq_length 1024 --overwrite_output_dir --do_train --per_device_train_batch_size 16 --cache ${OUTDIR}/cache/ --tokenizer_name gpt2 --model_name gpt2 --non_split_head --skip_broadcast_patch --no_index_select_patch --output_dir=${OUTDIR}/hf_output --config_name /opt/sambaflow/apps/nlp/transformers_on_rdu/customer_specific/mv/configs/gpt2_config_xl_50260.json --max_grad_norm_clip 1.0 --skip_checkpoint --data-parallel --reduce-on-rdu --data_dir /data/ANL/ss1024 --data_dir /data/ANL/ss1024 --logging_steps 1 --max_steps 900000 --learning_rate 0.00025 --steps_this_run 800 --pef=${OUTDIR}/gpt15/gpt15.pef >> ${OUTPUT_PATH} 2>&1 + +####################### +echo "Machine state After: " >> ${OUTPUT_PATH} 2>&1 +/opt/sambaflow/bin/snfadm -l inventory >> ${OUTPUT_PATH} 2>&1 +echo "Duration: " $SECONDS >> ${OUTPUT_PATH} 2>&1 diff --git a/ai-testbed/sambanova/files/Gpt1.5B_single.sh b/ai-testbed/sambanova/files/Gpt1.5B_single.sh new file mode 100644 index 0000000000..6ef136c15c --- /dev/null +++ b/ai-testbed/sambanova/files/Gpt1.5B_single.sh @@ -0,0 +1,59 @@ +#! /bin/bash +set -e +#export SF_RNT_LOG_LEVEL=DEBUG +ACTIVATE=/opt/sambaflow/apps/nlp/transformers_on_rdu/venv/bin/activate +LOGDIR=`date +%m%d%y.%H` +if [ "$1" ] ; then +LOGDIR=$1 +fi +MODEL_NAME="GPT1.5B" +OUTPUT_PATH=/data/ANL/results/$(hostname)/${USER}/${LOGDIR}/${MODEL_NAME}.out +echo "Using ${OUTPUT_PATH} for output" +mkdir -p /data/ANL/results/$(hostname)/${USER}/${LOGDIR} +export SN_NUM_THREADS=32 + +####################### +# Edit these variables. +####################### +export OMP_NUM_THREADS=18 +####################### +# Start script timer +SECONDS=0 +# Temp file location +DIRECTORY=$$ +OUTDIR=/data/scratch/${USER}/GPT_RUN +mkdir -p ${OUTDIR} +source ${ACTIVATE} +echo "Model: " ${MODEL_NAME} > ${OUTPUT_PATH} 2>&1 +echo "Date: " $(date +%m/%d/%y) >> ${OUTPUT_PATH} 2>&1 +echo "Time: " $(date +%H:%M) >> ${OUTPUT_PATH} 2>&1 +apt list --installed sambaflow >> ${OUTPUT_PATH} 2>&1 +cd ${OUTDIR} +####################### +echo "Machine State Before: " >> ${OUTPUT_PATH} 2>&1 +/opt/sambaflow/bin/snfadm -l inventory >> ${OUTPUT_PATH} 2>&1 +which python >> ${OUTPUT_PATH} 2>&1 +python --version >> ${OUTPUT_PATH} 2>&1 +env >> ${OUTPUT_PATH} 2>&1 +if [ ! 
-e ${OUTDIR}/gpt15_single/gpt15_single.pef ] ; then + ####################### + echo "COMPILE START AT ${SECONDS}" >> ${OUTPUT_PATH} 2>&1 + + # 1.14.3-8 COMMAND="python /opt/sambaflow/apps/nlp/transformers_on_rdu/transformers_hook.py compile --module_name gpt2_pretrain --task_name clm --max_seq_length 1024 -b 16 --output_dir=${OUTDIR}/hf_output --overwrite_output_dir --do_train --per_device_train_batch_size 16 --cache ${OUTDIR}/cache/ --tokenizer_name gpt2 --model_name gpt2 --mac-v2 --non_split_head --mac-human-decision /opt/sambaflow/apps/nlp/transformers_on_rdu/human_decisions_gm/mac_v2_overrides/gpt2_48_enc_full_recompute_training_spatialmapping_tiling16_clmerge_gm_nonpardp_anl.json --compiler-configs-file /opt/sambaflow/apps/nlp/transformers_on_rdu/human_decisions_gm/compiler_configs/compiler_configs_gpt2_sc_recompute_spatialmapping_tiling16_clsmerge_withcls_nogroups.json --skip_broadcast_patch --config_name /opt/sambaflow/apps/nlp/transformers_on_rdu/customer_specific/mv/configs/gpt2_config_xl_50260.json --no_index_select_patch --data-parallel -ws 2 --weight_decay 0.1 --max_grad_norm_clip 1.0 --num-tiles 4 --pef-name=gpt15 --output-folder=${OUTDIR}" + COMMAND="python /opt/sambaflow/apps/nlp/transformers_on_rdu/transformers_hook.py compile --module_name gpt2_pretrain --task_name clm --max_seq_length 1024 -b 16 --output_dir=${OUTDIR}/hf_output --overwrite_output_dir --do_train --per_device_train_batch_size 16 --cache ${OUTDIR}/cache/ --tokenizer_name gpt2 --model_name gpt2 --mac-v2 --non_split_head --mac-human-decision /opt/sambaflow/apps/nlp/transformers_on_rdu/human_decisions_gm/mac_v2_overrides/gpt2_48_enc_full_recompute_training_spatialmapping_tiling16_clmerge_gm_nonpardp_lnsd.json --compiler-configs-file /opt/sambaflow/apps/nlp/transformers_on_rdu/human_decisions_gm/compiler_configs/compiler_configs_gpt2_sc_recompute_spatialmapping_tiling16_clsmerge_withcls_nonpardp_norc_e2e.json --skip_broadcast_patch --config_name /opt/sambaflow/apps/nlp/transformers_on_rdu/customer_specific/mv/configs/gpt2_config_xl_50260.json --no_index_select_patch --weight_decay 0.1 --max_grad_norm_clip 1.0 --num-tiles 4 --pef-name=gpt15_single --output-folder=${OUTDIR}" + + echo "COMPILE COMMAND: $COMMAND" >> ${OUTPUT_PATH} 2>&1 + eval $COMMAND >> ${OUTPUT_PATH} 2>&1 + echo "COMPILE END AT ${SECONDS}" >> ${OUTPUT_PATH} 2>&1 +fi +####################### +echo "RUN" >> ${OUTPUT_PATH} 2>&1 + +COMMAND="srun python /opt/sambaflow/apps/nlp/transformers_on_rdu/transformers_hook.py run -b 16 --module_name gpt2_pretrain --task_name clm --max_seq_length 1024 --overwrite_output_dir --do_train --per_device_train_batch_size 16 --cache ${OUTDIR}/cache/ --tokenizer_name gpt2 --model_name gpt2 --non_split_head --skip_broadcast_patch --no_index_select_patch --output_dir=${OUTDIR}/hf_output --config_name /opt/sambaflow/apps/nlp/transformers_on_rdu/customer_specific/mv/configs/gpt2_config_xl_50260.json --max_grad_norm_clip 1.0 --skip_checkpoint --data_dir /data/ANL/ss1024 --logging_steps 1 --max_steps 900000 --learning_rate 0.00025 --steps_this_run 100 --pef=${OUTDIR}/gpt15_single/gpt15_single.pef >> ${OUTPUT_PATH} 2>&1" + +echo "RUN COMMAND: $COMMAND" >> ${OUTPUT_PATH} 2>&1 +eval $COMMAND >> ${OUTPUT_PATH} 2>&1 + +echo "Machine state After: " >> ${OUTPUT_PATH} 2>&1 +/opt/sambaflow/bin/snfadm -l inventory >> ${OUTPUT_PATH} 2>&1 +echo "Duration: " $SECONDS >> ${OUTPUT_PATH} 2>&1 diff --git a/ai-testbed/sambanova/files/Log_in.png b/ai-testbed/sambanova/files/Log_in.png new file mode 100644 index 0000000000..95d83fb66a Binary 
files /dev/null and b/ai-testbed/sambanova/files/Log_in.png differ diff --git a/ai-testbed/sambanova/files/ST_UI.jpg b/ai-testbed/sambanova/files/ST_UI.jpg new file mode 100644 index 0000000000..1d48d82039 Binary files /dev/null and b/ai-testbed/sambanova/files/ST_UI.jpg differ diff --git a/ai-testbed/sambanova/files/ST_console.jpg b/ai-testbed/sambanova/files/ST_console.jpg new file mode 100644 index 0000000000..6c910ee969 Binary files /dev/null and b/ai-testbed/sambanova/files/ST_console.jpg differ diff --git a/ai-testbed/sambanova/files/Unet2d.sh b/ai-testbed/sambanova/files/Unet2d.sh new file mode 100644 index 0000000000..4b4cb2479c --- /dev/null +++ b/ai-testbed/sambanova/files/Unet2d.sh @@ -0,0 +1,130 @@ +#! /bin/bash +set -e +if [ $# -ne 5 ] ; then + echo $#, $1, $2, $3, $4, $5 + echo "Unet2d.sh {compile,pcompile,run,prun} Image_size Batch_size Number_of_instances RunID" + exit 1 +fi +LOGDIR=`date +%m%d%y.%H` +if [ "$5" ] ; then +LOGDIR=$5 +fi +MODEL_NAME="Unet2d" +OUTPUT_PATH=/data/ANL/results/$(hostname)/${USER}/${LOGDIR}/${MODEL_NAME}.out +echo "Using ${OUTPUT_PATH} for output" +mkdir -p /data/ANL/results/$(hostname)/${USER}/${LOGDIR} + +####################### +export OMP_NUM_THREADS=8 +####################### +# Start script timer +SECONDS=0 +# Temp file location +OUTDIR=$(pwd)/out/${MODEL_NAME} +if [ ! -e ${OUTDIR} ] ; then +mkdir -p ${OUTDIR} +fi +######################################## +SECONDS=0 +BS=$3 +NP=$4 +NUM_WORKERS=4 +NUM_TILES=4 +DS=/data/ANL/kaggle_3m +CACHE_DIR=/data/scratch/${USER}/kaggle_3m_${2} +if [ ! -d ${CACHE_DIR} ] ; then + mkdir -p ${CACHE_DIR} +fi +export OMP_NUM_THREADS=16 +if [ -e /opt/sambaflow/apps/image/segmentation/venv/bin/activate ] ; then + source /opt/sambaflow/apps/image/segmentation/venv/bin/activate + else + source /opt/sambaflow/venv/bin/activate +fi +if [ -e /opt/sambaflow/apps/image/unet ] ; then + UNET=/opt/sambaflow/apps/image/unet +elif [ -e /opt/sambaflow/apps/image/segmentation ] ; then + UNET=/opt/sambaflow/apps/image/segmentation/ +else + echo "Cannot find UNET" + exit +fi +HD=${2} +if [ ${HD} == "1024" ] ; then + HD=1k +elif [ ${HD} == "2048" ] ; then + HD=2k +elif [ ${HD} == "4096" ] ; then + HD=4k +fi + +echo "Model: UNET2d" >> ${OUTPUT_PATH} 2>&1 +echo "Date: " $(date +%m/%d/%y) >> ${OUTPUT_PATH} 2>&1 +echo "Time: " $(date +%H:%M) >> ${OUTPUT_PATH} 2>&1 + +#rm -rf log_dir* + +if [ "${1}" == "compile" ] ; then + #compile loop + echo "COMPILE" >> ${OUTPUT_PATH} 2>&1 + if [ -e ${OUTDIR}/unet_train_${BS}_${2}_single/unet_train_${BS}_${2}_single_${NUM_TILES}.pef ] ; then + rm ${OUTDIR}/unet_train_${BS}_${2}_single/unet_train_${BS}_${2}_single_${NUM_TILES}.pef + fi + if [ -e ${UNET}/compile.py ] ; then + COMMAND="python ${UNET}/compile.py compile --mac-v2 --in-channels=3 --in-width=${2} --in-height=${2} --batch-size=${BS} --enable-conv-tiling --num-tiles=4 --pef-name=unet_train_${BS}_${2}_single_${NUM_TILES} --output-folder=${OUTDIR}" + #1.15 python ${UNET}/compile.py compile -b ${BS} --num-classes 2 --num-flexible-classes -1 --in-channels=3 --init-features 32 --in-width=${2} --in-height=${2} --enable-conv-tiling --mac-v2 --compiler-configs-file ${UNET}/jsons/compiler_configs/unet_compiler_configs_no_inst.json --mac-human-decision ${UNET}/jsons/hd_files/hd_unet_${HD}_depth2colb.json --enable-stoc-rounding --num-tiles ${NUM_TILES} --pef-name="unet_train_${BS}_${2}_single_${NUM_TILES}" > compile_${BS}_${2}_single_${NUM_TILES}.log 2>&1 + + else +#old + COMMAND="python ${UNET}/unet.py compile -b ${BS} --in-channels=${NUM_WORKERS} 
--in-width=${2} --in-height=${2} --enable-conv-tiling --mac-v2 --mac-human-decision ${UNET}/jsons/hd_files/hd_unet_${HD}_tgm.json --compiler-configs-file ${UNET}/jsons/compiler_configs/unet_compiler_configs_no_inst.json --pef-name="unet_train_${BS}_${2}_single" > compile_${BS}_${2}_single.log 2>&1" + fi + echo $COMMAND >> ${OUTPUT_PATH} 2>&1 + eval $COMMAND >> ${OUTPUT_PATH} 2>&1 + +elif [ "${1}" == "pcompile" ] ; then + #parallel + echo "Parallel compile" >> ${OUTPUT_PATH} 2>&1 + #BS=$((BS/NP)) + if [ -e ${OUTDIR}/unet_train_${BS}_${2}_NP_${NUM_TILES}/unet_train_${BS}_${2}_NP_${NUM_TILES}.pef ] ; then + rm ${OUTDIR}/unet_train_${BS}_${2}_NP_${NUM_TILES}/unet_train_${BS}_${2}_NP_${NUM_TILES}.pef + fi + if [ -e ${UNET}/hook.py ] ; then + #python ${UNET}/compile.py compile -b ${BS} --num-classes 2 --num-flexible-classes -1 --in-channels=3 --init-features 32 --in-width=${2} --in-height=${2} --enable-conv-tiling --mac-v2 --compiler-configs-file ${UNET}/jsons/compiler_configs/unet_compiler_configs_no_inst.json --mac-human-decision ${UNET}/jsons/hd_files/hd_unet_${HD}_depth2colb.json --enable-stoc-rounding --num-tiles ${NUM_TILES} --pef-name="unet_train_${BS}_${2}_NP_${NUM_TILES}" --data-parallel -ws 2 > compile_${BS}_${2}_NP_${NUM_TILES}.log 2>&1 +#1.16.2 + COMMAND="python /opt/sambaflow/apps/image/segmentation/compile.py compile --mac-v2 --in-channels=3 --in-width=${2} --in-height=${2} --batch-size=${BS} --enable-conv-tiling --num-tiles=4 --pef-name=unet_train_${BS}_${2}_NP_${NUM_TILES} --data-parallel -ws 2 --output-folder=${OUTDIR}" + else + COMMAND="python ${UNET}/unet.py compile -b ${BS} --in-channels=${NUM_WORKERS} --in-width=${2} --in-height=${2} --enable-conv-tiling --mac-v2 --mac-human-decision ${UNET}/jsons/hd_files/hd_unet_${HD}_tgm.json --compiler-configs-file ${UNET}/jsons/compiler_configs/unet_compiler_configs_no_inst.json --pef-name=unet_train_${BS}_${2}_NP --data-parallel -ws 2 --output-folder=${OUTDIR}" + fi + echo $COMMAND >> ${OUTPUT_PATH} 2>&1 + eval $COMMAND >> ${OUTPUT_PATH} 2>&1 + +elif [ "${1}" == "run" ] ; then + #single + echo "RUN" >> ${OUTPUT_PATH} 2>&1 + export OMP_NUM_THREADS=16 + export SF_RNT_NUMA_BIND=2 + export SF_RNT_FSM_POLL_BUSY_WAIT=1 + export SF_RNT_DMA_POLL_BUSY_WAIT=1 + #run single + if [ -e ${UNET}/hook.py ] ; then + #orig srun --nodelist $(hostname) python ${UNET}/hook.py run --data-transform-config /opt/sambaflow/apps/image/segmentation/segmentation/datasets/data_transforms_config.yaml --data-cache-dir ${CACHE_DIR} --num-workers=${NUM_WORKERS} --mode train --in-channels=3 --in-width=${2} --in-height=${2} --init-features 32 -b ${BS} --epochs 10 --data-dir ${DS} --log-dir log_dir_unet_${2}_${3} --pef=$(pwd)/out/unet_train_${BS}_${2}_single/unet_train_${BS}_${2}_single.pef > run_unet_${BS}_${2}_16_sl.log 2>&1 + COMMAND="srun --nodelist $(hostname) python /opt/sambaflow/apps/image/segmentation//hook.py run --data-cache=${CACHE_DIR} --data-in-memory --num-workers=${NUM_WORKERS} --enable-tiling --min-throughput 395 --in-channels=3 --in-width=${2} --in-height=${2} --init-features 32 --batch-size=${BS} --epochs 10 --data-dir ${DS} --log-dir log_dir_unet_${2}_${BS}_single_${NUM_TILES} --pef=${OUTDIR}/unet_train_${BS}_${2}_single_${NUM_TILES}/unet_train_${BS}_${2}_single_${NUM_TILES}.pef" + + else + COMMAND="srun --nodelist $(hostname) python ${UNET}/unet_hook.py run --num-workers=${NUM_WORKERS} --do-train --in-channels=3 --in-width=${2} --in-height=${2} --init-features 32 --batch-size=${BS} --epochs 10 --data-dir ${DS} --log-dir log_dir_unet_${2}_${3} 
--pef=${OUTDIR}/unet_train_${BS}_${2}_single/unet_train_${BS}_${2}_single.pef --use-sambaloader" + fi + echo $COMMAND >> ${OUTPUT_PATH} 2>&1 + eval $COMMAND >> ${OUTPUT_PATH} 2>&1 + #end run single + +elif [ "${1}" == "prun" ] ; then + #Parallel + #BS=$((BS/NP)) + echo "PRUN" >> ${OUTPUT_PATH} 2>&1 + echo "NP=${NP}" >> ${OUTPUT_PATH} 2>&1 + cpus=$((128/NP)) + COMMAND="sbatch --gres=rdu:1 --tasks-per-node ${NP} --nodes 1 --nodelist $(hostname) --cpus-per-task=${cpus} $(pwd)/unet_batch.sh ${NP} ${NUM_WORKERS} ${BS} ${2} ${5}" + echo $COMMAND >> ${OUTPUT_PATH} 2>&1 + eval $COMMAND >> ${OUTPUT_PATH} 2>&1 +fi +echo "Duration: " $SECONDS >> ${OUTPUT_PATH} 2>&1 diff --git a/ai-testbed/sambanova/files/ffn_mnist_compile_run.sh b/ai-testbed/sambanova/files/ffn_mnist_compile_run.sh new file mode 100644 index 0000000000..e55089ba01 --- /dev/null +++ b/ai-testbed/sambanova/files/ffn_mnist_compile_run.sh @@ -0,0 +1,10 @@ +#!/bin/bash +SECONDS=0 +source /opt/sambaflow/apps/starters/ffn_mnist/venv/bin/activate +FFN=/opt/sambaflow/apps/starters/ffn_mnist +if [ "${1}" == "compile" ] ; then +srun python ${FFN}/ffn_mnist.py compile -b 1 --pef-name="ffn_mnist" --debug --mac-v2 +elif [ "${1}" == "run" ] ; then +srun python ${FFN}/ffn_mnist.py run -b 1 -p out/ffn_mnist/ffn_mnist.pef --debug +fi +echo "DURATION:" $SECONDS diff --git a/ai-testbed/sambanova/files/sambanova_login.jpg b/ai-testbed/sambanova/files/sambanova_login.jpg new file mode 100644 index 0000000000..2942af7985 Binary files /dev/null and b/ai-testbed/sambanova/files/sambanova_login.jpg differ diff --git a/ai-testbed/sambanova/files/unet_batch.sh b/ai-testbed/sambanova/files/unet_batch.sh new file mode 100644 index 0000000000..88c8b1157b --- /dev/null +++ b/ai-testbed/sambanova/files/unet_batch.sh @@ -0,0 +1,71 @@ +#! /bin/bash +set -e +LOGDIR=`date +%m%d%y.%H` +if [ "$5" ] ; then +LOGDIR=$5 +fi +MODEL_NAME="Unet2d" +OUTPUT_PATH=/data/ANL/results/$(hostname)/${USER}/${LOGDIR}/${MODEL_NAME}.out +echo "Using ${OUTPUT_PATH} for output" +mkdir -p /data/ANL/results/$(hostname)/${USER}/${LOGDIR} + +####################### +export OMP_NUM_THREADS=8 +####################### +# Start script timer +SECONDS=0 +# Temp file location +OUTDIR=$(pwd)/out/${MODEL_NAME} + +BS=$3 +IM=$4 +DS=/data/ANL/kaggle_3m +CACHE_DIR=/data/scratch/${USER}/kaggle_3m_${IM} +if [ ! 
-d ${CACHE_DIR} ] ; then + mkdir -p ${CACHE_DIR} +fi +NUM_WORKERS=${2} +NP=${1} +export OMP_NUM_THREADS=16 + +if [ -e /opt/sambaflow/apps/image/segmentation/venv/bin/activate ] ; then +source /opt/sambaflow/apps/image/segmentation/venv/bin/activate +else +source /opt/sambaflow/venv/bin/activate +fi + +if [ -e /opt/sambaflow/apps/image/unet ] ; then + UNET=/opt/sambaflow/apps/image/unet +elif [ -e /opt/sambaflow/apps/image/segmentation ] ; then + UNET=/opt/sambaflow/apps/image/segmentation/ +else + echo "Cannot find UNET" + exit +fi + +echo "Model: ${MODEL_NAME} " > ${OUTPUT_PATH} 2>&1 +echo "Date: " $(date +%m/%d/%y) >> ${OUTPUT_PATH} 2>&1 +echo "Time: " $(date +%H:%M) >> ${OUTPUT_PATH} 2>&1 + +# export SAMBA_CCL_USE_PCIE_TRANSPORT=0 + export SF_RNT_NUMA_BIND=2 + export SF_RNT_FSM_POLL_BUSY_WAIT=1 + export SF_RNT_DMA_POLL_BUSY_WAIT=1 +# export SF_RNT_LOG_LEVEL=DEBUG + #export CCL_TIMEOUT=30 + #export SF_RNT_TILE_AFFINITY=0xf0000000 + export SAMBA_CCL_USE_PCIE_TRANSPORT=1 + export SF_RNT_NUMA_BIND=2 + export SF_RNT_FSM_POLL_BUSY_WAIT=1 + export SF_RNT_DMA_POLL_BUSY_WAIT=1 + rm -rf log_dir_unet_${NP}_train_kaggle + if [ -e ${UNET}/hook.py ] ; then + #orig srun --mpi=pmi2 python ${UNET}/hook.py run --data-cache-dir ${CACHE_DIR} --num-workers=${NUM_WORKERS} --mode train --in-channels=3 --in-width=${IM} --in-height=${IM} --init-features 32 --batch-size=${BS} --epochs 10 --data-dir ${DS} --log-dir log_dir_unet_${IM}_${BS}_${NP} --pef=$(pwd)/out/unet_train_${BS}_${IM}_NP/unet_train_${BS}_${IM}_NP.pef --data-parallel --reduce-on-rdu --use-sambaloader > run_unet_${BS}_${IM}_${NP}.log 2>&1 + #1.15.2 srun --mpi=pmi2 python ${UNET}/hook.py run --data-in-memory --data-cache=${CACHE_DIR} --num-workers=${NUM_WORKERS} --in-channels=3 --in-width=${IM} --in-height=${IM} --init-features 32 --batch-size=${BS} --epochs 10 --data-dir ${DS} --log-dir log_dir_unet_${IM}_${BS}_${NP} --data-parallel --reduce-on-rdu --pef=$(pwd)/out/unet_train_${BS}_${IM}_NP_4/unet_train_${BS}_${IM}_NP_4.pef > run_unet_${BS}_${IM}_${NP}_4.log 2>&1 + COMMAND="srun --mpi=pmi2 python /opt/sambaflow/apps/image/segmentation//hook.py run --data-cache=${CACHE_DIR} --data-in-memory --num-workers=${NUM_WORKERS} --enable-tiling --min-throughput 395 --in-channels=3 --in-width=${IM} --in-height=${IM} --init-features 32 --batch-size=${BS} --epochs 10 --data-dir ${DS} --log-dir log_dir_unet_${IM}_${BS}_${NP} --data-parallel --reduce-on-rdu --pef=${OUTDIR}/unet_train_${BS}_${IM}_NP_4/unet_train_${BS}_${IM}_NP_4.pef" + else + COMMAND="srun --mpi=pmi2 python ${UNET}/unet_hook.py run --data-cache-dir ${CACHE_DIR} --num-workers=${NUM_WORKERS} --do-train --in-channels=3 --in-width=${IM} --in-height=${IM} --init-features 32 --batch-size=${BS} --epochs 10 --data-dir ${DS} --log-dir log_dir_unet_${IM}_${BS}_${NP} --pef=$(pwd)/out/unet_train_${BS}_${IM}_NP/unet_train_${BS}_${IM}_NP.pef --data-parallel --reduce-on-rdu --use-sambaloader > run_unet_${BS}_${IM}_${NP}.log 2>&1" + fi +echo $COMMAND >> ${OUTPUT_PATH} 2>&1 +eval $COMMAND >> ${OUTPUT_PATH} 2>&1 +echo "Duration: " $SECONDS >> ${OUTPUT_PATH} 2>&1 diff --git a/ai-testbed/sambanova/getting-started/index.html b/ai-testbed/sambanova/getting-started/index.html new file mode 100644 index 0000000000..9407d8ee74 --- /dev/null +++ b/ai-testbed/sambanova/getting-started/index.html @@ -0,0 +1,6871 @@ + + + + + + + + + + + + + + + + + + + + + + + + + Getting Started - ALCF User Guides + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+ + + + Skip to content + + +
+
+ +
+ + + + + + + + + +
+ + +
+ + +
+ +
+

Argonne Leadership Computing Facility

+
+ + + + + + + + + +
+
+ + +
+ + + + + + + + + + + + + + + + + +
+ + + + +
+ + +
+ + + + + + +
+
+ + + +
+
+
+ + + + + +
+
+
+ + + +
+
+
+ + + +
+
+
+ + + +
+
+ + + + + + + +

Getting Started

+

On-Boarding

+

SambaNova SN30 can be accessed using your ALCF account. See Get Started +to request an account and for additional information.

+

Setup

+

System View

+

Connection to a SambaNova node is a two-step process. The first step is to ssh to the login node. +This step requires an MFA passcode for authentication - an +eight-digit passcode generated by an app on your mobile device, e.g., MobilePASS+. +The second step is to log in to a SambaNova node from the login node.

+

SambaNova System View

+

Log in to Login Node

+

Log in to the SambaNova login node from your local machine using the command below. This uses the MobilePASS+ token generated each time you log in to the system. This is the same passcode used to authenticate into other ALCF systems, such as Polaris, Theta, and Cooley.

+

In the examples below, replace ALCFUserID with your ALCF user id.

+
ssh ALCFUserID@sambanova.alcf.anl.gov
+Password: < MobilePASS+ code >
+
+
+

Note: Use the ssh "-v" option to debug any ssh problems.

+
+

Log in to a SambaNova Node

+

Once you are on the login node, a SambaNova node can be accessed using an alias, sn30-r[1-4]-h[1-2] where 'r' stands for the rack number, and 'h' stands for host. sn30-r1-h1 is the first host of the first rack.

+

The 8 nodes are aliased as: sn30-r1-h1, sn30-r1-h2, sn30-r2-h1, sn30-r2-h2, sn30-r3-h1, sn30-r3-h2, sn30-r4-h1, sn30-r4-h2.

+

sn30-r1-h1 can be accessed as below.

+
ssh sn30-r1-h1
+
+
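The two hops can also be combined into a single command from your local machine using OpenSSH's -J (ProxyJump) option. This is a convenience sketch, not an ALCF-specific requirement; you will still be prompted for your MobilePASS+ passcode:
ssh -J ALCFUserID@sambanova.alcf.anl.gov ALCFUserID@sn30-r1-h1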

SDK setup

+

The required software environment (the SambaFlow software stack and the associated environment variables) for an SN30 node is set up automatically at login. This is unlike the SN10, where the environment had to be set up by each user.

+ + + + + + + + + + + + + +
+
+ + + +
+ +
+ + + + + + +
+
+
+
+ + + + + + + + + + + + \ No newline at end of file diff --git a/ai-testbed/sambanova/job-queuing-and-submission/index.html b/ai-testbed/sambanova/job-queuing-and-submission/index.html new file mode 100644 index 0000000000..7df790e120 --- /dev/null +++ b/ai-testbed/sambanova/job-queuing-and-submission/index.html @@ -0,0 +1,6883 @@ + + + + + + + + + + + + + + + + + + + + + + + + + Job Queuing and Submission - ALCF User Guides + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+ + + + Skip to content + + +
+
+ +
+ + + + + + + + + +
+ + +
+ + +
+ +
+

Argonne Leadership Computing Facility

+
+ + + + + + + + + +
+
+ + +
+ + + + + + + + + + + + + + + + + +
+ + + + +
+ + +
+ + + + + + +
+
+ + + +
+
+
+ + + + + +
+
+
+ + + +
+
+
+ + + +
+
+
+ + + +
+
+ + + + + + + +

Job Queueing and Submission

+

Introduction

+

SambaNova uses Slurm for job submission and queueing. Below are some of the important commands for using Slurm. For more information refer to Slurm Documentation.

+
+

Note: Run the Python scripts using 'srun' or 'sbatch' to ensure that concurrent jobs do not interfere with each other.

+

Note: There is just one scheduler for all of the SambaNova nodes.

+
+

SRun

+

The Slurm command srun can be used to run individual Python scripts in parallel with other scripts on a cluster managed by Slurm. Examples of srun usage are shown below.

+

Slurm will assign a nodelist/host to run a job if a host is not specified.

+

Example:

+
srun python lenet.py compile -b=1 --pef-name="lenet" --output-folder="pef"
+srun python lenet.py run --pef="pef/lenet/lenet.pef"
+
+

You may specify the node/host on which to run a job.

+

Reasons to specify a node list:

+
    +
  • One wants to test a specific node to verify the function of the hardware and software (the daily smoke tests do this).
  • The nodes are at different software levels, and one wants to use a node that has the software level needed for one's application.
+

Example:

+
srun --nodelist=sn30-r1-h1 python lenet.py compile -b=1 --pef-name="lenet" --output-folder="pef"
+
+

SBatch

+

Alternatively, these jobs can be submitted to the Slurm workload manager through a batch script by using the sbatch command. To do this, create a bash script (submit-lenet-job.sh here as an example) with the commands that you want to execute.

+
#!/bin/sh
+
+python lenet.py compile -b=1 --pef-name="lenet" --output-folder="pef"
+python lenet.py run --pef="pef/lenet/lenet.pef"
+
+

Then pass the bash script as an input to the sbatch command as shown below.

+
sbatch --output=pef/lenet/output.log submit-lenet-job.sh
+
+

If multiple RDUs are needed (2 in the example shown below), add the --gres option to the sbatch command:

+
sbatch --gres=rdu:2 <your_script.sh>
+
+ + +
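The resource requests can also be embedded in the batch script itself with #SBATCH directives instead of being passed on the sbatch command line. Below is a sketch using the LeNet example from above; the directive values are illustrative and should be adjusted for your job.
#!/bin/sh
#SBATCH --job-name=lenet
#SBATCH --output=pef/lenet/output.log
#SBATCH --gres=rdu:1

python lenet.py compile -b=1 --pef-name="lenet" --output-folder="pef"
python lenet.py run --pef="pef/lenet/lenet.pef"
Submit it with sbatch submit-lenet-job.sh as before; options given on the sbatch command line override the directives in the script.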

SQueue

+

The squeue command provides information about jobs located in the Slurm scheduling queue.

+
squeue
+
+
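For example, to list only your own jobs:
squeue -u $USER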

SInfo

+

SInfo is used to view partition and node information for a system running Slurm.

+

Here is a suggested command:

+
sinfo -O AllocNodes,GresUsed,Gres,NodeList
+
+

For more information, see SInfo.

+

SCancel

+

SCancel is used to signal or cancel jobs, job arrays, or job steps.

+
scancel job_id
+
+ + + + + + + + + + + + + +
+
+ + + +
+ +
+ + + + + + +
+
+
+
+ + + + + + + + + + + + \ No newline at end of file diff --git a/ai-testbed/sambanova/miscellaneous/index.html b/ai-testbed/sambanova/miscellaneous/index.html new file mode 100644 index 0000000000..4586532b3c --- /dev/null +++ b/ai-testbed/sambanova/miscellaneous/index.html @@ -0,0 +1,7064 @@ + + + + + + + + + + + + + + + + + + + + + + + + + Miscellaneous - ALCF User Guides + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+ + + + Skip to content + + +
+
+ +
+ + + + + + + + + +
+ + +
+ + +
+ +
+

Argonne Leadership Computing Facility

+
+ + + + + + + + + +
+
+ + +
+ + + + + + + + + + + + + + + + + +
+ + + + +
+ + +
+ + + + + + +
+
+ + + +
+
+
+ + + + + +
+
+
+ + + + + + + +
+
+ + + + + + + +

Miscellaneous

+

SDK Version

+

To find the SDK version, run the following commands

+
+(venv) ALCFUserID@sn30-r1-h1:~$ python
+Python 3.7.6 (default, Feb 18 2020, 21:28:31)
+[GCC 9.3.0] on linux
+Type "help", "copyright", "credits" or "license" for more information.
+>>> import sambaflow
+>>> sambaflow.__version__
+'1.11.5'
+>>>
+
+
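The same check can be done non-interactively with a one-line command (a convenience sketch that uses the sambaflow.__version__ attribute shown above):
python -c "import sambaflow; print(sambaflow.__version__)"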

OMP_NUM_THREADS

+

The OMP_NUM_THREADS environment variable sets the number of threads to use for parallel regions.

+

The value of this environment variable must be a list of positive integer values. The values of the list set the number of threads to use for parallel regions at the corresponding nested levels.

+

For the SambaNova system, it is usually set to one, although some of the example scripts in this guide export larger values (for example, 16).

+
export OMP_NUM_THREADS=16
+
+

Where is the Model?

+

Two copies of the model are maintained: one in host CPU memory and one in RDU memory. They do not interfere with each other unless you explicitly sync the model parameters between them using:

+
SambaTensor.rdu() # Moves the CPU model to the RDU
+SambaTensor.cpu() # Moves the RDU model to the CPU
+
+

To run the model on the CPU, you can simply use the PyTorch model as if there were no RDU. To run the model on the RDU, you need to use session.run().

+

Useful Commands

+

SN Configuration

+
snconfig show Node static
+
+

The snconfig utility shows the static configuration of the system. An abbreviated example of the output for the first node is shown below:

+
======================================================
+=======                NODE Info               =======
+======================================================
+=======                Static Info             =======
+Timestamp: 2023-03-16 17:00:04
+Platform Name: DataScale SN30-8
+Node Name: NODE
+    Number of XRDUS: 4
+    XRDU Name: XRDU_0
+        Number of RDUS: 2
+        RDU name: RDU_0
+            Serial Number     : 205057B469B35895
+            Number of TILES: 8
+            TILE Name: TILE_0
+                Serial Number     : N/A
+            TILE Name: TILE_1
+                Serial Number     : N/A
+
+
+...
+
+
+                    Size              : 128.0 GB
+                    Serial Number     : 1F5BC22
+            DDR CH Name: DDRCH_6
+                Number of DIMMS: 1
+                DIMM Name: DIMM_L0
+                    Size              : 128.0 GB
+                    Serial Number     : 1F5BC99
+            DDR CH Name: DDRCH_7
+                Number of DIMMS: 1
+                DIMM Name: DIMM_M0
+                    Size              : 128.0 GB
+                    Serial Number     : 1F5BB68
+        Total XRDU_3 memory size (GB): 2048.0
+
+

SambaNova Daemon Service

+

The following command checks if the SambaNova daemon service is running.

+
systemctl status snd
+
+

The output should look something like this:

+
● snd.service - SN Devices Service
+     Loaded: loaded (/lib/systemd/system/snd.service; enabled; vendor preset: enabled)
+    Drop-In: /etc/systemd/system/snd.service.d
+             └─override.conf
+     Active: active (running) since Fri 2023-01-27 04:03:14 UTC; 1 months 18 days ago
+   Main PID: 5635 (snd)
+      Tasks: 9 (limit: 629145)
+     Memory: 156.8M
+     CGroup: /system.slice/snd.service
+             └─5635 /opt/sambaflow/bin/snd
+
+Warning: some journal files were not opened due to insufficient permissions.
+
+

Tile status

+
sntilestat
+watch sntilestat
+
+

The output shown below is when the system is completely idle.

+
TILE                 %idle %exec %pload %aload %chkpt %quiesce    PID     USER COMMAND
+/XRDU_0/RDU_0/TILE_0 100.0   0.0    0.0    0.0    0.0      0.0
+/XRDU_0/RDU_0/TILE_1 100.0   0.0    0.0    0.0    0.0      0.0
+/XRDU_0/RDU_0/TILE_2 100.0   0.0    0.0    0.0    0.0      0.0
+/XRDU_0/RDU_0/TILE_3 100.0   0.0    0.0    0.0    0.0      0.0
+/XRDU_0/RDU_0/TILE_4 100.0   0.0    0.0    0.0    0.0      0.0
+/XRDU_0/RDU_0/TILE_5 100.0   0.0    0.0    0.0    0.0      0.0
+/XRDU_0/RDU_0/TILE_6 100.0   0.0    0.0    0.0    0.0      0.0
+/XRDU_0/RDU_0/TILE_7 100.0   0.0    0.0    0.0    0.0      0.0
+/XRDU_0/RDU_1/TILE_0 100.0   0.0    0.0    0.0    0.0      0.0
+/XRDU_0/RDU_1/TILE_1 100.0   0.0    0.0    0.0    0.0      0.0
+/XRDU_0/RDU_1/TILE_2 100.0   0.0    0.0    0.0    0.0      0.0
+/XRDU_0/RDU_1/TILE_3 100.0   0.0    0.0    0.0    0.0      0.0
+/XRDU_0/RDU_1/TILE_4 100.0   0.0    0.0    0.0    0.0      0.0
+/XRDU_0/RDU_1/TILE_5 100.0   0.0    0.0    0.0    0.0      0.0
+/XRDU_0/RDU_1/TILE_6 100.0   0.0    0.0    0.0    0.0      0.0
+/XRDU_0/RDU_1/TILE_7 100.0   0.0    0.0    0.0    0.0      0.0
+/XRDU_1/RDU_0/TILE_0 100.0   0.0    0.0    0.0    0.0      0.0
+/XRDU_1/RDU_0/TILE_1 100.0   0.0    0.0    0.0    0.0      0.0
+/XRDU_1/RDU_0/TILE_2 100.0   0.0    0.0    0.0    0.0      0.0
+/XRDU_1/RDU_0/TILE_3 100.0   0.0    0.0    0.0    0.0      0.0
+/XRDU_1/RDU_0/TILE_4 100.0   0.0    0.0    0.0    0.0      0.0
+/XRDU_1/RDU_0/TILE_5 100.0   0.0    0.0    0.0    0.0      0.0
+/XRDU_1/RDU_0/TILE_6 100.0   0.0    0.0    0.0    0.0      0.0
+/XRDU_1/RDU_0/TILE_7 100.0   0.0    0.0    0.0    0.0      0.0
+/XRDU_1/RDU_1/TILE_0 100.0   0.0    0.0    0.0    0.0      0.0
+/XRDU_1/RDU_1/TILE_1 100.0   0.0    0.0    0.0    0.0      0.0
+/XRDU_1/RDU_1/TILE_2 100.0   0.0    0.0    0.0    0.0      0.0
+/XRDU_1/RDU_1/TILE_3 100.0   0.0    0.0    0.0    0.0      0.0
+/XRDU_1/RDU_1/TILE_4 100.0   0.0    0.0    0.0    0.0      0.0
+/XRDU_1/RDU_1/TILE_5 100.0   0.0    0.0    0.0    0.0      0.0
+/XRDU_1/RDU_1/TILE_6 100.0   0.0    0.0    0.0    0.0      0.0
+/XRDU_1/RDU_1/TILE_7 100.0   0.0    0.0    0.0    0.0      0.0
+/XRDU_2/RDU_0/TILE_0 100.0   0.0    0.0    0.0    0.0      0.0
+/XRDU_2/RDU_0/TILE_1 100.0   0.0    0.0    0.0    0.0      0.0
+/XRDU_2/RDU_0/TILE_2 100.0   0.0    0.0    0.0    0.0      0.0
+/XRDU_2/RDU_0/TILE_3 100.0   0.0    0.0    0.0    0.0      0.0
+/XRDU_2/RDU_0/TILE_4 100.0   0.0    0.0    0.0    0.0      0.0
+/XRDU_2/RDU_0/TILE_5 100.0   0.0    0.0    0.0    0.0      0.0
+/XRDU_2/RDU_0/TILE_6 100.0   0.0    0.0    0.0    0.0      0.0
+/XRDU_2/RDU_0/TILE_7 100.0   0.0    0.0    0.0    0.0      0.0
+/XRDU_2/RDU_1/TILE_0 100.0   0.0    0.0    0.0    0.0      0.0
+/XRDU_2/RDU_1/TILE_1 100.0   0.0    0.0    0.0    0.0      0.0
+/XRDU_2/RDU_1/TILE_2 100.0   0.0    0.0    0.0    0.0      0.0
+/XRDU_2/RDU_1/TILE_3 100.0   0.0    0.0    0.0    0.0      0.0
+/XRDU_2/RDU_1/TILE_4 100.0   0.0    0.0    0.0    0.0      0.0
+/XRDU_2/RDU_1/TILE_5 100.0   0.0    0.0    0.0    0.0      0.0
+/XRDU_2/RDU_1/TILE_6 100.0   0.0    0.0    0.0    0.0      0.0
+/XRDU_2/RDU_1/TILE_7 100.0   0.0    0.0    0.0    0.0      0.0
+/XRDU_3/RDU_0/TILE_0 100.0   0.0    0.0    0.0    0.0      0.0
+/XRDU_3/RDU_0/TILE_1 100.0   0.0    0.0    0.0    0.0      0.0
+/XRDU_3/RDU_0/TILE_2 100.0   0.0    0.0    0.0    0.0      0.0
+/XRDU_3/RDU_0/TILE_3 100.0   0.0    0.0    0.0    0.0      0.0
+/XRDU_3/RDU_0/TILE_4 100.0   0.0    0.0    0.0    0.0      0.0
+/XRDU_3/RDU_0/TILE_5 100.0   0.0    0.0    0.0    0.0      0.0
+/XRDU_3/RDU_0/TILE_6 100.0   0.0    0.0    0.0    0.0      0.0
+/XRDU_3/RDU_0/TILE_7 100.0   0.0    0.0    0.0    0.0      0.0
+/XRDU_3/RDU_1/TILE_0 100.0   0.0    0.0    0.0    0.0      0.0
+/XRDU_3/RDU_1/TILE_1 100.0   0.0    0.0    0.0    0.0      0.0
+/XRDU_3/RDU_1/TILE_2 100.0   0.0    0.0    0.0    0.0      0.0
+/XRDU_3/RDU_1/TILE_3 100.0   0.0    0.0    0.0    0.0      0.0
+/XRDU_3/RDU_1/TILE_4 100.0   0.0    0.0    0.0    0.0      0.0
+/XRDU_3/RDU_1/TILE_5 100.0   0.0    0.0    0.0    0.0      0.0
+/XRDU_3/RDU_1/TILE_6 100.0   0.0    0.0    0.0    0.0      0.0
+/XRDU_3/RDU_1/TILE_7 100.0   0.0    0.0    0.0    0.0      0.0
+
+
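Because each fully idle tile reports 100.0 in the %idle column, a small filter (a convenience sketch based on the output format above) prints only the header and the tiles that are currently in use:
sntilestat | awk 'NR==1 || $2+0 < 100'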

Finding Hung Tiles

+
snconfig show Node dynamic | grep perfect
+
+

How busy is the system?

+

Use one of

+
top
+htop
+
+ + + + + + + + + + + + + +
+
+ + + +
+ +
+ + + + + + +
+
+
+
+ + + + + + + + + + + + \ No newline at end of file diff --git a/ai-testbed/sambanova/running-a-model-or-program/index.html b/ai-testbed/sambanova/running-a-model-or-program/index.html new file mode 100644 index 0000000000..d8eb6e904f --- /dev/null +++ b/ai-testbed/sambanova/running-a-model-or-program/index.html @@ -0,0 +1,6858 @@ + + + + + + + + + + + + + + + + + + + + + + + + + Running a Model/Program - ALCF User Guides + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+ + + + Skip to content + + +
+
+ +
+ + + + + + + + + +
+ + +
+ + +
+ +
+

Argonne Leadership Computing Facility

+
+ + + + + + + + + +
+
+ + +
+ + + + + + + + + + + + + + + + + +
+ + + + +
+ + +
+ + + + + + +
+
+ + + +
+
+
+ + + + + +
+
+
+ + + +
+
+
+ + + +
+
+
+ + + +
+
+ + + + + + + +

Running a Model/Program

+
+

Note: Please be mindful of how you are using the system. For example, consider running larger jobs in the evening or on weekends.

+

Note: Please use only Slurm commands, i.e., srun and sbatch, to run your code. +If you run your code directly using the 'python' command, it may cause conflicts +on the system.

+

Note: If you have conda installed and a conda environment is active, you will see something like (base) at the beginning of the command prompt. If so, you will need to deactivate it with conda deactivate. Conda is not used on the SambaNova SN30 cluster.

+
+

Introduction

+

The SambaNova workflow includes the following main steps to run a model.

+
    +
  1. Compile
  2. Run
  3. Test (optional)
+

The system uses the Slurm job +scheduler to schedule the jobs and manage the workload on the system. For more information on Slurm, see Job Queueing and Submission.

+

Example Programs lists the different example applications with corresponding commands for each of the above steps.

+

Compile

+

Compiles the model and generates a .pef file. This file contains information on how to reconfigure the hardware and how to map the compute and memory resources required to run an application on RDUs. The pef files are saved in the 'out' directory by default; the SambaNova documentation advises saving pef files in separate directories with the '--output-folder' option.

+

It is necessary to re-compile only when the model changes, or parameters specific to the model graph change, including the batch size.

+

Compile times can be significant. Compiling the UNet sample, for example, takes 358 seconds for images of size 32x32 pixels and 1844 seconds for images of size 256x256.

+

The entire compile process is executed on the host and no RDUs are involved in the compile step.

+

Example of compiling the LeNet application:

+
srun python lenet.py compile -b=1 --pef-name="lenet" --output-folder="pef"
+
+

where

+ + + + + + + + + + + + + + + + + + + + +
Argument | Default | Help
-b | 1 | Batch size for training
+
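Because compile times can be significant, it can be useful to record how long your own compiles take. A simple sketch wraps the same compile command with time:
time srun python lenet.py compile -b=1 --pef-name="lenet" --output-folder="pef"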

Run

+

As part of this step, the model is trained on the RDUs by passing in the PEF file and the training dataset. The location of the pef file generated in the compile step is passed as an argument to the run command. Below is the example of the run command that trains a LeNet model.

+
srun python lenet.py run --pef="pef/lenet/lenet.pef"
+
+


+

Test (Optional)

+

This command is used to run the model on both the host CPU and a SambaNova RDU. It compares the results from the CPU and RDU and will report if any discrepancies are found. Pass the pef file generated as part of the compile step as the input to this command.

+
srun python lenet.py test --pef="pef/lenet/lenet.pef"
+
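Putting the three steps together, the sketch below shows one way to drive the whole LeNet workflow from a single batch script; the file name is illustrative, and the script recompiles only if the PEF file is missing (the same pattern used by the larger example scripts on this system):
#!/bin/sh
# submit-lenet-workflow.sh (illustrative name)
PEF=pef/lenet/lenet.pef

# Compile only if the PEF does not already exist.
if [ ! -e "$PEF" ] ; then
    python lenet.py compile -b=1 --pef-name="lenet" --output-folder="pef"
fi

# Train on the RDUs, then optionally compare CPU and RDU results.
python lenet.py run --pef="$PEF"
python lenet.py test --pef="$PEF"
Submit it with, for example, sbatch --output=pef/lenet/workflow.log submit-lenet-workflow.sh.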
+ + + + + + + + + + + + + +
+
+ + + +
+ +
+ + + + + + +
+
+
+
+ + + + + + + + + + + + \ No newline at end of file diff --git a/ai-testbed/sambanova/sambatune/index.html b/ai-testbed/sambanova/sambatune/index.html new file mode 100644 index 0000000000..1178937af5 --- /dev/null +++ b/ai-testbed/sambanova/sambatune/index.html @@ -0,0 +1,6786 @@ + + + + + + + + + + + + + + + + + + + + + + + + + SambaTune for profiling and performance tuning - ALCF User Guides + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+ + + + Skip to content + + +
+
+ +
+ + + + + + + + + +
+ + +
+ + +
+ +
+

Argonne Leadership Computing Facility

+
+ + + + + + + + + +
+
+ + +
+ + + + + + + + + + + + + + + + + +
+ + + + +
+ + +
+ + + + + + +
+
+ + + +
+
+
+ + + + + +
+
+
+ + + +
+
+
+ + + +
+
+
+ + + +
+
+ + + + + + + +

Profiling and performance tuning with SambaTune

+

This section covers how to use the SambaTune profiling and performance-tuning tool, and the SambaTune UI for viewing the results.

+

+

SambaTune uses a yaml file that describes how to profile an application.
+There are samples in /opt/sambaflow/sambatune/configs.
+This section shows how to run the simplest sample, a linear net.

+

First, ssh into one of the nodes in the SN30 cluster.
Next, start a Slurm interactive job reserving a full node (8 RDUs) for 8 hours (480 minutes):

$ /usr/local/bin/srun --time=480 --gres=rdu:8 --pty bash
+
+Record the hostname: +
$ hostname
+sn30-r1-h1
+

+

Next, set an environment variable indicating where the profiling information should be stored: +

export DUMP_ROOT=~/Sambatune
+

+

If running a large model, the profiling information can be hundreds of gigabytes or more, and DUMP_ROOT should be set to a location with more storage than your home directory (which has a quota), for example, a directory you have write access to under /srv/projects.
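A sketch of such a setup (the project directory name is a placeholder; substitute one of your own projects):
export DUMP_ROOT=/srv/projects/<your_project>/$USER/sambatune_dumps
mkdir -p "$DUMP_ROOT"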

+

Optionally, examine the sample yaml file. You will see that it has 5 top-level sections: app:, model-args:, compile-args:, run-args:, env:

+

Next, run sambatune using a sample sambatune yaml configuration file. This sample command line requests profiling with the benchmark, instrument, and run modes. +

$ sambatune --modes benchmark instrument run -- /opt/sambaflow/sambatune/configs/linear_net.yaml
+

+

This will take a while to run, particularly if the yaml for a larger model is used.

+

Then, run sambatune_ui: +

$ export ST_PORT=8576
+$ sambatune_ui --directory $DUMP_ROOT/artifact_root/sambatune_gen --port $ST_PORT
+

+

Copy the password shown (e.g. to your clipboard). The userid is always admin. The password is different for every sambatune_ui run.

+

In a fresh console on your working machine where you will run the browser, set up a two-hop ssh tunnel to the target node. Replace the ALCFUserID in the ssh command line with your ALCF userid. +

$ export ST_PORT=8576
+$ ssh -L $ST_PORT:localhost:$ST_PORT ALCFUserID@sambanova.alcf.anl.gov  -t ssh -L $ST_PORT:localhost:$ST_PORT -N sn30-r1-h1
+

+

Put localhost:8576 in the URL bar of a Chrome-family browser (Chrome, Brave, Vivaldi, and Opera have been tested).
A login prompt for the SambaTune UI should appear.
Enter admin and the password copied previously.
You should now see the SambaTune UI.

+

If the browser does not show a login prompt, or if any previous step complains about a port conflict, try another value for ST_PORT on both the target node and for the ssh tunnel command, e.g. 8577.

+

See SambaNova's SambaTune documentation for more information about using SambaTune and the SambaTune UI.
+This section is a good starting point: Workflow overview

+

When finished:
+- Break the ssh tunnel with ctrl-c or equivalent.
+- Stop the sambatune_ui server on the target node with ctrl-c or equivalent.
+- Exit the interactive slurm job to release the reserved resources.

+

A disconnected job can be canceled by determining its job id with squeue -a and canceling the job with scancel <jobid>

+ + + + + + + + + + + + + +
+
+ + + +
+ +
+ + + + + + +
+
+
+
+ + + + + + + + + + + + \ No newline at end of file diff --git a/ai-testbed/sambanova/system-overview/index.html b/ai-testbed/sambanova/system-overview/index.html new file mode 100644 index 0000000000..1d8af6fb16 --- /dev/null +++ b/ai-testbed/sambanova/system-overview/index.html @@ -0,0 +1,6747 @@ + + + + + + + + + + + + + + + + + + + + + + + + + System Overview - ALCF User Guides + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+ + + + Skip to content + + +
+
+ +
+ + + + + + + + + +
+ + +
+ + +
+ +
+

Argonne Leadership Computing Facility

+
+ + + + + + + + + +
+
+ + +
+ + + + + + + + + + + + + + + + + +
+ + + + +
+ + +
+ + + + + + +
+
+ + + +
+
+
+ + + + + +
+
+
+ + + +
+
+
+ + + +
+
+
+ + + +
+
+ + + + + + + +

System Overview

+

Introduction

+

The SambaNova DataScale SN30 system is architected around the next-generation Reconfigurable Dataflow Unit (RDU) processor for optimal dataflow processing and acceleration. The AI Testbed's SambaNova SN30 system consists of eight nodes in 4 full racks, each node featuring eight RDUs interconnected to enable model and data parallelism. SambaFlow, SambaNova's software stack, extracts, optimizes, and maps the dataflow graphs to the RDUs from standard machine learning frameworks such as PyTorch.

+

Below are some of the links to SambaNova documentation.

+

SambaNova white paper: Accelerated Computing with a Reconfigurable Dataflow Architecture

+

SN30 documentation: SambaNova Documentation

+ + + + + + + + + + + + + +
+
+ + + +
+ +
+ + + + + + +
+
+
+
+ + + + + + + + + + + + \ No newline at end of file diff --git a/ai-testbed/sambanova/tunneling-and-forwarding-ports/index.html b/ai-testbed/sambanova/tunneling-and-forwarding-ports/index.html new file mode 100644 index 0000000000..2d787afb47 --- /dev/null +++ b/ai-testbed/sambanova/tunneling-and-forwarding-ports/index.html @@ -0,0 +1,6925 @@ + + + + + + + + + + + + + + + + + + + + + + + + + Tunneling and Forwarding Ports - ALCF User Guides + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+ + + + Skip to content + + +
+
+ +
+ + + + + + + + + +
+ + +
+ + +
+ +
+

Argonne Leadership Computing Facility

+
+ + + + + + + + + +
+
+ + +
+ + + + + + + + + + + + + + + + + +
+ + + + +
+ + +
+ + + + + + +
+
+ + + +
+
+
+ + + + + +
+
+
+ + + + + + + +
+
+ + + + + + + +

Tunneling and Forwarding Ports

+

This section covers port forwarding, using TensorBoard as the specific example.

+

TensorBoard Port Forwarding

+

This section describes the steps to set up port forwarding for applications, such as TensorBoard, that run on the SambaNova system and bind to one or more ports. This example uses 6006 and 16006 as the port numbers. Using port numbers other than these may avoid collisions with other users.

+

From Your Local Machine

+

Replace ALCFUserID with your ALCF User ID.

+

Run

+
# Forward a port number from sambanova.alcf.anl.gov to your local machine.
+ssh -v -N -f -L localhost:16006:localhost:16006 ALCFUserID@sambanova.alcf.anl.gov
+...
+Password: < MobilePass+ code >
+
+# Connect to sambanova.alcf.anl.gov
+ssh ALCFUserID@sambanova.alcf.anl.gov
+...
+Password: < MobilePass+ code >
+
+

From sambanova.alcf.anl.gov

+

Below are the commands specific to sn30-r1-h1. You may replace sn30-r1-h1 with any other node when using the appropriate system.

+

Run

+
+

Note: The full name is sn30-r1-h1.ai.alcf.anl.gov and it may also be used.

+
+
# Forward the port.
+ssh -N -f -L localhost:16006:localhost:6006 ALCFUserID@sn30-r1-h1
+# Connect to the system.
+ssh ALCFUserID@sn30-r1-h1
+
+

On sn30-r1-h1

+

Activate the venv appropriate to your project.

+

Navigate to the appropriate directory for your model. +Launch your model using srun or sbatch.

+
cd /path/to/your/project
+sbatch --output=pef/my_model/output.log submit-my_model-job.sh
+
+

On Another sn30-r1-h1 Terminal Window

+

The SambaNova system has a bash shell script to set up the required software environment. It sets up the SambaFlow software stack and the associated environment variables, and activates a pre-configured virtual environment.

+

Use the command appropriate for your environment.

+

For example, if you are using LogReg:

+
ALCFUserID@sn30-r1-h1:~$ source /opt/sambaflow/apps/starters/logreg/venv/bin/activate
+(venv) ALCFUserID@sn30-r1-h1:~$
+
+

Navigate to the appropriate directory for your model.

+
cd /path/to/your/project
+tensorboard --logdir /logs --port 6006
+
+
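If TensorBoard or the ssh tunnel complains that a port is already in use, you can check which listening ports are taken on the node and pick a different one. A quick sketch:
ss -ltn | grep 6006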

Browser on Local Machine

+

Then, navigate in your browser to, in this example, http://localhost:16006 on your local machine.

+

Notes

+

Explanation of ssh command:

+
-N : no remote commands
+
+-f : put ssh in the background
+
+-L <machine1>:<portA>:<machine2>:<portB> :
+
+The full command line will forward <machine2>:<portB> (remote scope) to <machine1>:<portA> (local scope)
+
+

Adapted from: How can I run Tensorboard on a remote server?

+ + + + + + + + + + + + + +
+
+ + + +
+ +
+ + + + + + +
+
+
+
+ + + + + + + + + + + + \ No newline at end of file diff --git a/ai-testbed/sambanova/unused/cosmictagger-conversion/index.html b/ai-testbed/sambanova/unused/cosmictagger-conversion/index.html new file mode 100644 index 0000000000..58d4db39c4 --- /dev/null +++ b/ai-testbed/sambanova/unused/cosmictagger-conversion/index.html @@ -0,0 +1,7100 @@ + + + + + + + + + + + + + + + + + + + + + CosmicTagger Conversion - ALCF User Guides + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+ + + + Skip to content + + +
+
+ +
+ + + + + + + + + +
+ + +
+ + +
+ +
+

Argonne Leadership Computing Facility

+
+ + + + + + + + + +
+
+ + +
+ + + + + + + + + + + + + + + + + +
+ + + + +
+ + +
+ + + + + + +
+
+ + + +
+
+
+ + + + + +
+
+
+ + + + + + + +
+
+ + + + + + + +

CosmicTagger Conversion

+

The intent of this page is to show conceptually how to convert a model to run on the SambaNova system. +It is not necessary to convert CosmicTagger because it has already been converted and is +located at CosmicTagger on the SambaNova branch. +The original is located at CosmicTagger.

+

Run Model on CPU

+

The first step to converting a model is to verify that it runs on the CPU. This step has been verified for CosmicTagger.

+

Config.py

+

CosmicTagger can run on multiple machines. As such, it is necessary to specify the architecture +that one is using. For example, CPU or GPU. The architecture is stored in the +ComputeMode class.

+

Edit src/config/config.py. Add RDU to the ComputeMode class.

+
class ComputeMode(Enum):
+    CPU   = 0
+    #...
+    RDU   = 6
+
+

Trainer.py

+

Edit src/utils/torch/trainer.py.

+

Import SambaNova Packages

+

Insert the imports at the top of the file.

+

SambaFlow is a complete software stack designed to take input from standard machine learning frameworks such as PyTorch and TensorFlow. SambaFlow automatically extracts, optimizes, and maps dataflow graphs onto RDUs.

+
try:
+    from sambaflow import samba
+
+    import sambaflow.samba.utils as utils
+    from sambaflow.samba.utils.argparser import parse_app_args
+    from sambaflow.samba.utils.common import common_app_driver
+except:
+    pass
+
+

Wrap Model

+

Wrap the model using poptorch.trainingModel() so that it may be run on IPUs for training.

+

Wrap the model using poptorch.inferenceModel() when not training.

+

Find the following code around line 90 in the init_network method.

+
        # Foregoing any fusions as to not disturb the existing ingestion pipeline
+        if self.is_training() and self.args.mode.quantization_aware:
+            self._raw_net.qconfig = torch.quantization.get_default_qat_qconfig('fbgemm')
+            self._net = torch.quantization.prepare_qat(self._raw_net)
+        else:
+            self._net = self._raw_net
+
+

After the above code, add:

+
        if self.args.run.compute_mode == ComputeMode.IPU:
+            if self.is_training():
+                opts = poptorch.Options()
+                self._net = poptorch.trainingModel(self._net, opts, optimizer=torch.optim.SGD(self._net.parameters(), lr=1e-3))
+            else:
+                self._net = poptorch.inferenceModel(self._net)
+
+

See poptorch.trainingModel() and poptorch.inferenceModel() for more information.

+

There is also a Build the Model tutorial.

+

Update Optimizer

+

Update init_optimizer() to use the poptorch class instead of the torch class as needed.

+

Change:

+
        if self.args.mode.optimizer.name == OptimizerKind.rmsprop:
+            self._opt = torch.optim.RMSprop(self._net.parameters(), 1.0, eps=1e-4)
+        else:
+            self._opt = torch.optim.Adam(self._net.parameters(), 1.0)
+
+

to:

+
        if self.args.mode.optimizer.name == OptimizerKind.rmsprop:
+            if self.args.run.compute_mode == ComputeMode.IPU:
+                self._opt = poptorch.optim.RMSprop(self._net.parameters(), 1.0, eps=1e-4)
+            else:
+                self._opt = torch.optim.RMSprop(self._net.parameters(), 1.0, eps=1e-4)
+        else:
+            if self.args.run.compute_mode == ComputeMode.IPU:
+                self._opt = poptorch.optim.Adam(self._net.parameters(), 1.0)
+            else:
+                self._opt = torch.optim.Adam(self._net.parameters(), 1.0)
+
+

Update the Forward Pass

+

Putting the loss calculation in forward_pass() allows the loss computation to be performed on the IPUs. This will be faster because the data will not need to be transferred round-trip to the CPU.

+

Change forward_pass():

+

Original

+
            if net is None:
+                logits_image = self._net(minibatch_data['image'])
+            else:
+                logits_image = net(minibatch_data['image'])
+
+

Updated

+

The following code changes are to account for the loss function, i.e., self.loss_calculator, and the +image labels, i.e., labels_image, to be passed to the model's forward_pass method. Additionally, the calculated +loss is returned from the forward_pass method.

+
            if net is None:
+                if self.args.run.compute_mode == ComputeMode.IPU:
+                    logits_image, labels_image, loss = self._net(minibatch_data['image'], self.loss_calculator, labels_image)
+                    return logits_image, labels_image, loss
+                else:
+                    logits_image = self._net(minibatch_data['image'])
+            else:
+                if self.args.run.compute_mode == ComputeMode.IPU and self.args.mode.name != ModeKind.inference:
+                    logits_image, labels_image, loss = net(minibatch_data['image'], self.loss_calculator, labels_image)
+                    return logits_image, labels_image, loss
+                else:
+                    logits_image = net(minibatch_data['image'])
+
+

Update the Training Step

+

Receive the extra loss variable from the forward_pass method.

+

Update the train_step method.

+

Original Training Step

+
                    with self.timing_context("forward"):
+                        if self.args.run.precision == Precision.mixed and self.args.run.compute_mode == ComputeMode.GPU:
+                            with torch.cuda.amp.autocast():
+                                logits_image, labels_image = self.forward_pass(minibatch_data)
+                        else:
+                            logits_image, labels_image = self.forward_pass(minibatch_data)
+
+                    verbose = False
+
+                    # Compute the loss based on the logits
+                    with self.timing_context("loss"):
+                        loss = self.loss_calculator(labels_image, logits_image)
+
+

Updated Training Step

+

The forward_pass() method was changed to return the extra variable loss in the previous section. It is now +received conditionally when using an IPU(s).

+

In the with self.timing_context("loss"): section, only calculate loss if not using an IPU(s).

+
                    with self.timing_context("forward"):
+                        if self.args.run.precision == Precision.mixed and self.args.run.compute_mode == ComputeMode.GPU:
+                            with torch.cuda.amp.autocast():
+                                logits_image, labels_image = self.forward_pass(minibatch_data)
+                        else:
+                            if self.args.run.compute_mode == ComputeMode.IPU:
+                                logits_image, labels_image, loss = self.forward_pass(minibatch_data)
+                            else:
+                                logits_image, labels_image = self.forward_pass(minibatch_data)
+
+                    verbose = False
+
+
+                    # Compute the loss based on the logits
+                    with self.timing_context("loss"):
+                        if self.args.run.compute_mode == ComputeMode.IPU:
+                            loss = loss
+                        else:
+                            loss = self.loss_calculator(labels_image, logits_image)
+
+

Update Validation Step

+

Update the val_step method.

+

Original Validation Step Code

+

Find this code.

+
            if self.args.run.precision == Precision.mixed and self.args.run.compute_mode == ComputeMode.GPU:
+                with torch.cuda.amp.autocast():
+                    logits_image, labels_image = self.forward_pass(minibatch_data, net=val_net)
+            else:
+                logits_image, labels_image = self.forward_pass(minibatch_data, net=val_net)
+
+            # Compute the loss based on the logits
+            loss = self.loss_calculator(labels_image, logits_image)
+
+

Updated Validation Step Code

+

Change the code to the following.

+
            if self.args.run.precision == Precision.mixed and self.args.run.compute_mode == ComputeMode.GPU:
+                with torch.cuda.amp.autocast():
+                    logits_image, labels_image = self.forward_pass(minibatch_data, net=val_net)
+
+                    # Compute the loss based on the logits
+                    loss = self.loss_calculator(labels_image, logits_image)
+            else:
+                if self.args.run.compute_mode == ComputeMode.IPU:
+                    logits_image, labels_image, loss = self.forward_pass(minibatch_data, net=val_net)
+                else:
+                    logits_image, labels_image = self.forward_pass(minibatch_data, net=val_net)
+
+                    # Compute the loss based on the logits
+                    loss = self.loss_calculator(labels_image, logits_image)
+
+

UResNet2D Model

+

Update Model

+

The Graphcore system is more computationally efficient if the loss function is on the +IPU. This is accomplished by using the loss function within the model's forward method.

+

Edit src/networks/torch/uresnet2D.py.

+

Update the Forward Declaration

+

Find the forward method.

+
def forward(self, input_tensor):
+
+

Update the argument list to include the loss function, i.e., loss_calculator +and the image labels, i.e., labels_image.

+
def forward(self, input_tensor, loss_calculator=None, labels_image=None):
+
+

Add Loss Calculation

+

Add the loss calculation just before the forward method returns.

+
        if loss_calculator is not None:
+
+            labels_image = labels_image.long()
+            labels_image = torch.chunk(labels_image, chunks=3, dim=1)
+            shape =  labels_image[0].shape
+            labels_image = [ _label.view([shape[0], shape[-2], shape[-1]]) for _label in labels_image ]
+
+            loss = loss_calculator(labels_image, x)
+            import poptorch
+            loss = poptorch.identity_loss(loss , reduction="mean")
+            return x, labels_image, loss
+
+        # This return already exists.
+        return x
+
+

The poptorch.identity_loss method takes a single PyTorch tensor and will backpropagate a gradient of ones through it. You may find an example at here

+

bin/exec.py

+

The following is included for completeness. One will not likely find this in other code.

+

Open bin/exec.py in your favorite editor. Change:

+
@hydra.main(version_base=None, config_path="../src/config", config_name="config")
+
+

to

+
@hydra.main(config_path="../src/config", config_name="config")
+
+ + + + + + + + + + + + + +
+
+ + + +
+ +
+ + + + + + +
+
+
+
+ + + + + + + + + + + + \ No newline at end of file diff --git a/ai-testbed/sambanova/unused/files/Gpt1.5B.sh b/ai-testbed/sambanova/unused/files/Gpt1.5B.sh new file mode 100644 index 0000000000..773b63f5e2 --- /dev/null +++ b/ai-testbed/sambanova/unused/files/Gpt1.5B.sh @@ -0,0 +1,51 @@ +#! /bin/bash +set -e +#export SF_RNT_LOG_LEVEL=DEBUG +ACTIVATE=/opt/sambaflow/apps/nlp/transformers_on_rdu/venv/bin/activate +LOGDIR=`date +%m%d%y.%H` +if [ "$1" ] ; then +LOGDIR=$1 +fi +MODEL_NAME="GPT1.5B" +OUTPUT_PATH=/data/ANL/results/$(hostname)/${USER}/${LOGDIR}/${MODEL_NAME}.out +echo "Using ${OUTPUT_PATH} for output" +mkdir -p /data/ANL/results/$(hostname)/${USER}/${LOGDIR} +export SN_NUM_THREADS=32 + +####################### +# Edit these variables. +####################### +export OMP_NUM_THREADS=18 +####################### +# Start script timer +SECONDS=0 +# Temp file location +DIRECTORY=$$ +OUTDIR=/data/scratch/${USER}/GPT_RUN +mkdir -p ${OUTDIR} +source ${ACTIVATE} +echo "Model: " ${MODEL_NAME} > ${OUTPUT_PATH} 2>&1 +echo "Date: " $(date +%m/%d/%y) >> ${OUTPUT_PATH} 2>&1 +echo "Time: " $(date +%H:%M) >> ${OUTPUT_PATH} 2>&1 +cd ${OUTDIR} +####################### +echo "Machine State Before: " >> ${OUTPUT_PATH} 2>&1 +/opt/sambaflow/bin/snfadm -l inventory >> ${OUTPUT_PATH} 2>&1 +if [ ! -e ${OUTDIR}/gpt15/gpt15.pef ] ; then + ####################### + echo "COMPILE START AT ${SECONDS}" >> ${OUTPUT_PATH} 2>&1 + + # 1.14.3-8 COMMAND="python /opt/sambaflow/apps/nlp/transformers_on_rdu/transformers_hook.py compile --module_name gpt2_pretrain --task_name clm --max_seq_length 1024 -b 16 --output_dir=${OUTDIR}/hf_output --overwrite_output_dir --do_train --per_device_train_batch_size 16 --cache ${OUTDIR}/cache/ --tokenizer_name gpt2 --model_name gpt2 --mac-v2 --non_split_head --mac-human-decision /opt/sambaflow/apps/nlp/transformers_on_rdu/human_decisions_gm/mac_v2_overrides/gpt2_48_enc_full_recompute_training_spatialmapping_tiling16_clmerge_gm_nonpardp_anl.json --compiler-configs-file /opt/sambaflow/apps/nlp/transformers_on_rdu/human_decisions_gm/compiler_configs/compiler_configs_gpt2_sc_recompute_spatialmapping_tiling16_clsmerge_withcls_nogroups.json --skip_broadcast_patch --config_name /opt/sambaflow/apps/nlp/transformers_on_rdu/customer_specific/mv/configs/gpt2_config_xl_50260.json --no_index_select_patch --data-parallel -ws 2 --weight_decay 0.1 --max_grad_norm_clip 1.0 --num-tiles 4 --pef-name=gpt15 --output-folder=${OUTDIR}" + COMMAND="python /opt/sambaflow/apps/nlp/transformers_on_rdu/transformers_hook.py compile --module_name gpt2_pretrain --task_name clm --max_seq_length 1024 -b 16 --output_dir=${OUTDIR}/hf_output --overwrite_output_dir --do_train --per_device_train_batch_size 16 --cache ${OUTDIR}/cache/ --tokenizer_name gpt2 --model_name gpt2 --mac-v2 --non_split_head --mac-human-decision /opt/sambaflow/apps/nlp/transformers_on_rdu/human_decisions_gm/mac_v2_overrides/gpt2_48_enc_full_recompute_training_spatialmapping_tiling16_clmerge_gm_nonpardp_lnsd.json --compiler-configs-file /opt/sambaflow/apps/nlp/transformers_on_rdu/human_decisions_gm/compiler_configs/compiler_configs_gpt2_sc_recompute_spatialmapping_tiling16_clsmerge_withcls_nonpardp_norc_e2e.json --skip_broadcast_patch --config_name /opt/sambaflow/apps/nlp/transformers_on_rdu/customer_specific/mv/configs/gpt2_config_xl_50260.json --no_index_select_patch --data-parallel -ws 2 --weight_decay 0.1 --max_grad_norm_clip 1.0 --num-tiles 4 --pef-name=gpt15 --output-folder=${OUTDIR}" + + echo "COMPILE COMMAND: $COMMAND" >> ${OUTPUT_PATH} 2>&1 + 
eval $COMMAND >> ${OUTPUT_PATH} 2>&1 + echo "COMPILE END AT ${SECONDS}" >> ${OUTPUT_PATH} 2>&1 +fi +####################### +echo "RUN" >> ${OUTPUT_PATH} 2>&1 +/usr/local/bin/sbatch --output=${HOME}/slurm-%A.out --ntasks 32 --gres=rdu:1 --ntasks-per-node 16 --nodes 2 --cpus-per-task=8 /data/ANL/scripts/Gpt1.5B_run.sh ${1} >> ${OUTPUT_PATH} 2>&1 + +echo "Machine state After: " >> ${OUTPUT_PATH} 2>&1 +/opt/sambaflow/bin/snfadm -l inventory >> ${OUTPUT_PATH} 2>&1 +echo "Duration: " $SECONDS >> ${OUTPUT_PATH} 2>&1 diff --git a/ai-testbed/sambanova/unused/performance-tools/index.html b/ai-testbed/sambanova/unused/performance-tools/index.html new file mode 100644 index 0000000000..eae80161c6 --- /dev/null +++ b/ai-testbed/sambanova/unused/performance-tools/index.html @@ -0,0 +1,6724 @@ + + + + + + + + + + + + + + + + + + + + + Performance Tools - ALCF User Guides + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+ + + + Skip to content + + +
+
+ +
+ + + + + + + + + +
+ + +
+ + +
+ +
+

Argonne Leadership Computing Facility

+
+ + + + + + + + + +
+
+ + +
+ + + + + + + + + + + + + + + + + +
+ + + + +
+ + +
+ + + + + + +
+
+ + + +
+
+
+ + + + + +
+
+
+ + + +
+
+
+ + + +
+
+
+ + + +
+
+ + + + + + + +

Performance Tools

+

Tile Status

+
sntilestat
+watch sntilestat
+
+

Measure TFLOPs

+

This is an example of measuring TFLOPs for a Conv2D forward pass.

+
elif args.command == 'run':
+    samba.session.run(inputs, section_types=['fwd'])
+    #samba.session.run(inputs, section_types=['bckwd'])
+    n_iters = 100
+    forward_pass_time = []
+    print("run starts")
+    start_time_forward = time.time()
+    for loop in range(n_iters):
+        samba.session.run(inputs, section_types=['fwd'])
+        #samba.session.run(inputs, section_types=['bckwd'])
+        #samba.session.run(inputs, section_types=['fwd', 'bckwd'])
+    end_time_forward = time.time()
+    forward_pass_time.append(end_time_forward - start_time_forward)
+    print("run ends")
+
+    w_0 = (args.w + 2*args.pad_w - args.s)/args.wstride + 1
+    h_0 = (args.h + 2*args.pad_h - args.r)/args.hstride + 1
+    tflops = 2 * (w_0*h_0) * args.s * args.r * args.c * args.k * args.n
+    tflops_forw = tflops/(sum(forward_pass_time)/n_iters/5)/(10**12) #tflops
+    print(tflops)
+    print(sum(forward_pass_time))
+    print("tflops: %f"%tflops_forw)
+    print("SN,Training,%s,Conv2d_fwd,%d,100,1,%d,%d,%d,%d,%d,%d,%d,0.0,%f,None,%f,%f,%f" % ("dtype", args.n, args.w, args.h, args.c, args.k, args.s, args.pad_w, args.wstride, (sum(forward_pass_time)/n_iters)/args.n, args.n/(sum(forward_pass_time)/n_iters), tflops_forw, (sum(forward_pass_time)/n_iters)/args.n))
+
+ + + + + + + + + + + + + +
+
+ + + +
+ +
+ + + + + + +
+
+
+
+ + + + + + + + + + + + \ No newline at end of file diff --git a/ai-testbed/sambanova/unused/running-GPT2-multi-node/index.html b/ai-testbed/sambanova/unused/running-GPT2-multi-node/index.html new file mode 100644 index 0000000000..0785740bff --- /dev/null +++ b/ai-testbed/sambanova/unused/running-GPT2-multi-node/index.html @@ -0,0 +1,6755 @@ + + + + + + + + + + + + + + + + + + + + + Running GPT-2 on Multiple Nodes - ALCF User Guides + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+ + + + Skip to content + + +
+
+ +
+ + + + + + + + + +
+ + +
+ + +
+ +
+

Argonne Leadership Computing Facility

+
+ + + + + + + + + +
+
+ + +
+ + + + + + + + + + + + + + + + + +
+ + + + +
+ + +
+ + + + + + +
+
+ + + +
+
+
+ + + + + +
+
+
+ + + +
+
+
+ + + +
+
+
+ + + +
+
+ + + + + + + +

Running GPT-2 on Multiple Nodes

+ + +

This GPT-2 example is for 1.5B parameters on two (2) nodes. +Each node has eight (8) RDUs for a total of sixteen (16) RDUs.

+

Create a Directory

+
cd <path to desired directory>
+mkdir GPT1.5B
+cd GPT1.5B
+
+

Establish Script

+

Using your favorite editor, create the file 'Gpt1.5B.sh'.

+

Copy the contents of Gpt1.5B.sh.

+

Make the script executable:

+
chmod +x Gpt1.5B.sh
+
+

Multiple Nodes

+

Gpt1.5B.sh contains the sbatch command:

+
/usr/local/bin/sbatch --output=${HOME}/slurm-%A.out --ntasks 32 --gres=rdu:1 --ntasks-per-node 16  --nodes 2 --cpus-per-task=8  /data/ANL/scripts/Gpt1.5B_run.sh ${1} >> ${OUTPUT_PATH} 2>&1
+
+

The sbatch nodes argument specifies the number of nodes to use.

+

--nodes 2: Nodes to use.

+

Additionally, here are the other sbatch arguments.

+

--ntasks 32: This option specifies the number of tasks to be used in the job.

+

--ntasks-per-node 16: This option specifies the number of tasks per node.

+

--gres=rdu:1: Indicates that the model fits on a single RDU.

+

--cpus-per-task=8: The number of CPUs per task.

+

Run

+

The script accepts an optional first parameter to specify the log directory.

+

Run the script:

+
./Gpt1.5B.sh <optional log directory>
+
+
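For example, to collect this run's logs under a named directory (the name is illustrative):
./Gpt1.5B.sh gpt15b_test_run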

Output

+

The output can be found at /data/ANL/results/$(hostname)/${USER}/${LOGDIR}/${MODEL_NAME}.out. +The actual path will be displayed on the screen.

+ + + + + + + + + + + + + +
+
+ + + +
+ +
+ + + + + + +
+
+
+
+ + + + + + + + + + + + \ No newline at end of file diff --git a/ai-testbed/sambanova/unused/running-GPT2/index.html b/ai-testbed/sambanova/unused/running-GPT2/index.html new file mode 100644 index 0000000000..946376dd55 --- /dev/null +++ b/ai-testbed/sambanova/unused/running-GPT2/index.html @@ -0,0 +1,6672 @@ + + + + + + + + + + + + + + + + + + + + + Running GPT2 - ALCF User Guides + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+ + + + Skip to content + + +
+
+ +
+ + + + + + + + + +
+ + +
+ + +
+ +
+

Argonne Leadership Computing Facility

+
+ + + + + + + + + +
+
+ + +
+ + + + + + + + + + + + + + + + + +
+ + + + +
+ + +
+ + + + + + +
+
+ + + +
+
+
+ + + + + +
+
+
+ + + +
+
+
+ + + +
+
+
+ + + +
+
+ + + + + + + +

Running GPT2

+

The Pile and OpenWebText (OWT) datasets are located in:

+
/data/ANL/pile
+/data/ANL/openwebtext_ss2048
+
+ + + + + + + + + + + + + +
+
+ + + +
+ +
+ + + + + + +
+
+
+
+ + + + + + + + + + + + \ No newline at end of file diff --git a/ai-testbed/sambanova/unused/running-bert-large-on-sn30/index.html b/ai-testbed/sambanova/unused/running-bert-large-on-sn30/index.html new file mode 100644 index 0000000000..f3f08723d3 --- /dev/null +++ b/ai-testbed/sambanova/unused/running-bert-large-on-sn30/index.html @@ -0,0 +1,6783 @@ + + + + + + + + + + + + + + + + + + + + + Running BERT-Large on SambaNova DataScale SN30-8 - ALCF User Guides + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+ + + + Skip to content + + +
+
+ +
+ + + + + + + + + +
+ + +
+ + +
+ +
+

Argonne Leadership Computing Facility

+
+ + + + + + + + + +
+
+ + +
+ + + + + + + + + + + + + + + + + +
+ + + + +
+ + +
+ + + + + + +
+
+ + + +
+
+
+ + + + + +
+
+
+ + + +
+
+
+ + + +
+
+
+ + + +
+
+ + + + + + + +

Running BERT-Large on SambaNova DataScale SN30-8

+ + +

Set Up

+

Establish a test directory from which to work.

+
mkdir $HOME/app-test
+cd $HOME/app-test
+
+

Copy BertLarge.sh into your current directory.

+
cp /data/ANL/scripts/BertLarge.sh .
+
+

Running Bert Large Options

+

Let's cover several options for executing the script.

+
    +
  1. Basic
  2. +
+
sbatch --output=${HOME}/app-test/slurm-%A.out --cpus-per-task=128 --gres=rdu:16 BertLarge.sh
+
+
    +
  2. Specify a Log File
  2. +
+

This is helpful if doing multiple runs and one wishes to specify a run ID. + This bash script argument is optional. Place it at the very end of the command.

+

Example:

+
sbatch --output=${HOME}/app-test/slurm-%A.out --cpus-per-task=128 --gres=rdu:16 BertLarge.sh my_runID
+
+
    +
  3. Specify Nodelist
  2. +
+

One may optionally specify a nodelist for sbatch. An example is to use the current node via $(hostname).

+
sbatch --nodelist $(hostname) --output=${HOME}/app-test/slurm-%A.out --cpus-per-task=128 --gres=rdu:16 BertLarge.sh
+
+

Running Bert Large

+

Let's specify the log file and the nodelist.

+

Run

+
sbatch --nodelist $(hostname) --output=${HOME}/app-test/slurm-%A.out --cpus-per-task=128 --gres=rdu:16 BertLarge.sh
+
+

Output

+

Display the slurm output. For example:

+
cat slurm-9637.out
+
+

The output will look something like:

+
Using /data/ANL/results/sn30-r3-h1/userid/040423.19/BertLarge.out for output
+
+

You may display that file. You may want to use less to do so because it is quite long.

+
less /data/ANL/results/sn30-r3-h1/userid/040423.19/BertLarge.out
+
+

The organization of the file is:

+
    +
  1. System Status
  2. Compile (very long)
  3. Run
  4. System Status
  5. Run Duration
+ + + + + + + + + + + + + +
+
+ + + +
+ +
+ + + + + + +
+
+
+
+ + + + + + + + + + + + \ No newline at end of file diff --git a/ai-testbed/sambanova/unused/sambatune-user-guide/index.html b/ai-testbed/sambanova/unused/sambatune-user-guide/index.html new file mode 100644 index 0000000000..6720c507be --- /dev/null +++ b/ai-testbed/sambanova/unused/sambatune-user-guide/index.html @@ -0,0 +1,7246 @@ + + + + + + + + + + + + + + + + + + + + + SambaTune - ALCF User Guides + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+ + + + Skip to content + + +
+
+ +
+ + + + + + + + + +
+ + +
+ + +
+ +
+

Argonne Leadership Computing Facility

+
+ + + + + + + + + +
+
+ + +
+ + + + + + + + + + + + + + + + + +
+ + + + +
+ + +
+ + + + + + +
+
+ + + +
+
+
+ + + + + +
+
+
+ + + + + + + +
+
+ + + + + + + +

SambaTune

+

Notes

+

Rick 4/16/2023 [10:16 AM] +/home/rweisner/sambatune_ui_dir contains the 1.15.3 version which is the latest released version. It should work on your experimental. You will need browser access to wherever you install it.

+
cd /home/rweisner/tmp/uno_test
+
+
#TODOBRW
+ssh wilsonb@homes.cels.anl.gov
+ssh sm-02
+MobilePass+ password
+On sm-02
+source /opt/sambaflow/venv/bin/activate
+export PATH=/opt/sambaflow/bin:$PATH
+sambatune linear_net.yaml --artifact-root $(pwd)/artifact_root --modes benchmark instrument run
+sambatune_ui --directory /home/wilsonb/tmp/sambatune_gen --port 8580
+#There will be a username and password displayed that you will use in your browser on your laptop.
+Command used on laptop for port forward
+ssh -XL 8580:127.0.0.1:8580 wilsonb@sm-02.cels.anl.gov
+MobilePass+ password
+# You will be logged into sm-02 but, you do not need to do anything.
+address used in browser on laptop localhost:8580
+#Use username and password from sambatune_ui.
+Username
+Password
+
+#TODOBRW
+/home/wilsonb/DL/Sambanova/apps_1.12/private/anl/2022-09-21T19-21-05.html
+
+

About SambaTune

+

SambaTune is a tool for profiling, debugging, and tuning the performance of applications +running on SN hardware.

+

The tool automates the collection of hardware performance counters, metrics aggregation, +report generation, and visualization. It also automates benchmarking of the application +to compute average throughput over a sufficient number of runs. The tool is designed to +aid the user with performance bottleneck analysis and tuning.

+

SambaTune is currently used by SN engineers involved in performance tuning efforts. +SambaTune is also planned for release to external customers to aid with performance +bottleneck analysis and resolution.

+

Run SambaTune

+
ssh ALCFUserID@sambanova.alcf.anl.gov
+# Enter MobilePass+ pass code
+ssh sm-01
+
+
#TODOBRW
+ssh wilsonb@sambanova.alcf.anl.gov
+# Enter MobilePass+ pass code
+ssh sm-01
+
+

First, enter the virtual environment on sm-01 or sm-02:

+
source /opt/sambaflow/venv/bin/activate
+
+

Update path:

+
export PATH=/opt/sambaflow/bin:$PATH
+
+

Usage

+
usage: sambatune [-h] [--artifact-root ARTIFACT_ROOT] [--disable-override]
+                 [--compile-only | -m MODES [MODES ...]] [--version]
+                 config
+
+positional arguments:
+  config                YAML file with model, compile, run configuration.
+
+optional arguments:
+  -h, --help            show this help message and exit
+  --artifact-root ARTIFACT_ROOT
+                        Custom location to save compile/run artifacts;
+                        defaults to '$DUMP_ROOT/artifact_root' (default: None)
+  --disable-override    Reuse the placement from the baseline compilation
+                        (default: False)
+  --compile-only        Run compilation of PEFs for selected modes only
+                        (default: False)
+  -m MODES [MODES ...], --modes MODES [MODES ...]
+                        Select modes to execute from ['benchmark',
+                        'instrument', 'run'] (default: ['benchmark'])
+  --version             version of sambatune and sambaflow.
+
+

Command Overview

+

By default, it will run with the benchmarking mode enabled. Use the --modes flag to run +modes individually or in any combination. +Benchmark-Only:

+
sambatune example_net.yaml --artifact-root $(pwd)/artifact_root --modes benchmark
+
+

Instrument-Only:

+
sambatune example_net.yaml --artifact-root $(pwd)/artifact_root --modes instrument
+
+

All modes:

+
sambatune example_net.yaml --artifact-root $(pwd)/artifact_root --modes benchmark instrument run
+
+

Command Example

+
# From Bill
+python /opt/sambaflow/apps/private/anl/uno_full.py compile --weight-sharing -b 16 -mb 4 --num-spatial-batches 500 --mapping spatial --mac-human-decision /opt/sambaflow/apps/private/anl/samba_uno/human_decisions_spatial.json --pef-name=uno_16_4_500_ws --output-folder=/home/arnoldw//models_dir/1520847 --mac-v1
+
+python /opt/sambaflow/apps/private/anl/uno_full.py run --train-samba-spatial --weight-sharing -b 16 -mb 4 --num-spatial-batches 500 --mapping spatial --pef=/home/arnoldw//models_dir/1520847/uno_16_4_500_ws/uno_16_4_500_ws.pef --in_dir /var/tmp/raw/ --mac-v1
+
+
# From Bill --> Bruce
+python /opt/sambaflow/apps/private/anl/uno_full.py compile --weight-sharing -b 16 -mb 4 --num-spatial-batches 500 --mapping spatial --mac-human-decision /opt/sambaflow/apps/private/anl/samba_uno/human_decisions_spatial.json --pef-name=uno_16_4_500_ws --output-folder='.' --mac-v1
+
+export OMP_NUM_THREADS=1
+python /opt/sambaflow/apps/private/anl/uno_full.py run --train-samba-spatial --weight-sharing -b 16 -mb 4 --num-spatial-batches 500 --mapping spatial --pef=./uno_16_4_500_ws/uno_16_4_500_ws.pef --in_dir /var/tmp/raw/ --mac-v1
+
+
#TODOBRW  This works.  9/19/22
+sm-01/home/wilsonb/tmp/uno_test/uno_ccle.yaml
+app: /opt/sambaflow/apps/private/anl/uno_full.py
+
+model-args: --weight-sharing -b 16 -mb 4 --num-spatial-batches 500 --mapping spatial
+
+compile-args: compile --plot --mac-human-decision /opt/sambaflow/apps/private/anl/samba_uno/human_decisions_spatial.json --mac-v1
+
+run-args: --multiprocess-pickle --use-pickle-train  --measure-spatial --train-samba-spatial --mac-v1 --train_source CCLE --lr 0.001 --data-dir /software/sambanova/dataset/CCLE_16_500 --converted-pickle
+
+env:
+     OMP_NUM_THREADS: 16,
+     SF_RNT_NUMA_BIND: 2
+
+

Run the following example:

+
sambatune uno_ccle.yaml --artifact-root $(pwd)/artifact_root --modes benchmark instrument run
+
+
#TODOBRW
+# Stand-alone
+export UNO=.
+export NS=500
+srun python /opt/sambaflow/apps/private/anl/uno_full.py compile --weight-sharing -b 16 -mb 4 --num-spatial-batches ${NS} --mapping spatial --mac-human-decision /opt/sambaflow/apps/private/anl/samba_uno/human_decisions_spatial.json --pef-name=uno_16_4_${NS}_ws --output-folder='.' --mac-v1
+
+export OMP_NUM_THREADS=1
+srun python /opt/sambaflow/apps/private/anl/uno_full.py run --train-samba-spatial --weight-sharing -b 16 -mb 4 --num-spatial-batches ${NS} --mapping spatial --pef=./uno_16_4_${NS}_ws/uno_16_4_${NS}_ws.pef --in_dir /var/tmp/raw/ --mac-v1 --train_source CCLE --data-dir /software/sambanova/dataset/CCLE_16_${NS}
+
+export UNO=.
+export NS=500
+export OMP_NUM_THREADS=1
+srun pyinstrument /opt/sambaflow/apps/private/anl/uno_full.py run --train-samba-spatial --weight-sharing -b 16 -mb 4 --num-spatial-batches ${NS} --mapping spatial --pef=./uno_16_4_${NS}_ws/uno_16_4_${NS}_ws.pef --in_dir /var/tmp/raw/ --mac-v1 --train_source CCLE --data-dir /software/sambanova/dataset/CCLE_16_${NS} > pyinstrument_1.13.log 2>&1
+
+
+
+
+
+# Example SambaTune configuration file: uno_brw_CCLE_1_12.yaml
+app: /home/wilsonb/DL/Sambanova/apps_1.12/private/anl/uno_full.py
+
+model-args: --weight-sharing -b 16 -mb 4 --num-spatial-batches 500 --mapping spatial
+
+compile-args: compile --plot --mac-human-decision /opt/sambaflow/apps/private/anl/samba_uno/human_decisions_spatial.json --mac-v1
+
+run-args: --measure-spatial --train-samba-spatial --mac-v1 --train_source CCLE --lr 0.001 --data-dir /software/sambanova/dataset/CCLE_16_500
+
+env:
+     OMP_NUM_THREADS: 16
+     SF_RNT_NUMA_BIND: 2
+
+

Run the following example:

+
sambatune uno_brw_CCLE_1_12.yaml --artifact-root $(pwd)/artifact_root --modes benchmark instrument run
+
+export UNO=.
+export NS=50
+export OMP_NUM_THREADS=1
+
+srun python /opt/sambaflow/apps/private/anl/uno_full.py compile --mac-human-decision /opt/sambaflow/apps/private/anl/samba_uno/human_decisions_spatial.json --mac-v1
+
+srun pyinstrument /opt/sambaflow/apps/private/anl/uno_full.py run --train-samba-spatial --weight-sharing -b 16 -mb 4 --num-spatial-batches ${NS} --mapping spatial --pef=./uno_16_4_${NS}_ws/uno_16_4_${NS}_ws.pef --in_dir /var/tmp/raw/ --mac-v1 --train_source CCLE --data-dir /software/sambanova/dataset/CCLE_16_${NS} --epochs 1 > my.log 2>&1
+
+srun python /opt/sambaflow/apps/private/anl/uno_full.py run --multiprocess-pickle  --measure-spatial --train-samba-spatial --weight-sharing -b 16 -mb 4 --num-spatial-batches ${NS} --mapping spatial --pef=./out/uno_full_16_47_${NS}/uno_full_16_47_${NS}.pef --in_dir /var/tmp/raw/ --mac-v1 --train_source CCLE --lr 0.001 --data-dir /software/sambanova/dataset/CCLE_16_${NS} > pyinstrument_1.13.log 2>&1
+
+cat my.log # Has pyinstrument run name.
+pyinstrument --load-prev 2022-09-21T19-21-05 -r html
+
+
+# SambaFlow 1.13 environment setup:
+
+source /opt/sambaflow/venv/bin/activate
+cd ~/tmp/uno_test/
+export UNO=.
+export NS=500
+export OMP_NUM_THREADS=1
+export PATH=/opt/sambaflow/bin:$PATH
+sntilestat
+
+
+
+./uno_pickl.sh compile 500
+./uno_pickl.sh run 500
+
+

uno_pickl.sh

+
#! /bin/bash -x
+#set -e
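+# Usage: ./uno_pickl.sh {convert|compile|run|pyinstrument|no_pickle|mp} <num-spatial-batches>
+# e.g.:  ./uno_pickl.sh compile 500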
+source /opt/sambaflow/venv/bin/activate
+SECONDS=0
+NS=${2}
+UNO=/opt/sambaflow/apps/private/anl/
+DS="ALL"
+DS="CCLE"
+
+BS=$((NS*16))
+export OMP_NUM_THREADS=16
+
+echo "Model: UNO_SPA_TRN"
+echo "Date: " $(date +%m/%d/%y)
+echo "Time: " $(date +%H:%M)
+if [ "${1}" == "convert" ] ; then
+python3 ${UNO}/uno/uno_data_loaders_converted.py   --in_dir /var/tmp/raw/ --out_dir /software/sambanova/dataset/${DS}_16_${NS}  --batch-size ${BS} --train_sources ${DS} --file-write-frequency 10
+
+
+elif [ "${1}" == "compile" ] ; then
+  echo "COMPILE"
+  python ${UNO}/uno_full.py compile --weight-sharing -b 16 -mb 4 --num-spatial-batches ${NS} --mapping spatial --mac-human-decision ${UNO}/samba_uno/human_decisions_spatial.json --pef-name="uno_16_4_${NS}" --mac-v1
+
+
+elif [ "${1}" == "run" ] ; then
+  echo "RUN ${DS}"
+  export SF_RNT_NUMA_BIND=2
+  #python ${UNO}/uno_full.py run --acc-test --train-samba-spatial --weight-sharing -b 16 -mb 4 --num-spatial-batches ${NS} --mapping spatial --pef="out/uno_16_4_${NS}/uno_16_4_${NS}.pef" --in_dir /var/tmp/raw/ --mac-v1 --train_source CCLE
+  python ${UNO}/uno_full.py run --mac-v1 --multiprocess-pickle --use-pickle-train --train-samba-spatial -b 16 -mb 4 --num-spatial-batches ${NS} --lr 0.001 --mapping spatial --data-dir /software/sambanova/dataset/${DS}_16_${NS} --converted-pickle --train_sources ${DS} --pef="out/uno_16_4_${NS}/uno_16_4_${NS}.pef" --epochs 1
+  #python ${UNO}/uno_full.py run --mac-v1 --multiprocess-pickle --use-pickle-train --train-samba-spatial -b 16 -mb 4 --num-spatial-batches ${NS} --lr 0.001 --mapping spatial --data-dir /software/sambanova/dataset/${DS}_16_${NS} --converted-pickle --train_sources ${DS} --pef="out/uno_16_4_${NS}/uno_16_4_${NS}.pef"
+
+elif [ "${1}" == "pyinstrument" ] ; then
+  echo "RUN ${DS}"
+  export SF_RNT_NUMA_BIND=2
+  #python ${UNO}/uno_full.py run --acc-test --train-samba-spatial --weight-sharing -b 16 -mb 4 --num-spatial-batches ${NS} --mapping spatial --pef="out/uno_16_4_${NS}/uno_16_4_${NS}.pef" --in_dir /var/tmp/raw/ --mac-v1 --train_source CCLE
+  pyinstrument ${UNO}/uno_full.py run --mac-v1 --multiprocess-pickle --use-pickle-train --train-samba-spatial -b 16 -mb 4 --num-spatial-batches ${NS} --lr 0.001 --mapping spatial --data-dir /software/sambanova/dataset/${DS}_16_${NS} --converted-pickle --train_sources ${DS} --pef="out/uno_16_4_${NS}/uno_16_4_${NS}.pef" --epochs 1
+  #python ${UNO}/uno_full.py run --mac-v1 --multiprocess-pickle --use-pickle-train --train-samba-spatial -b 16 -mb 4 --num-spatial-batches ${NS} --lr 0.001 --mapping spatial --data-dir /software/sambanova/dataset/${DS}_16_${NS} --converted-pickle --train_sources ${DS} --pef="out/uno_16_4_${NS}/uno_16_4_${NS}.pef"
+
+elif [ "${1}" == "no_pickle" ] ; then
+  echo "no_pickle ${DS}"
+  export SF_RNT_NUMA_BIND=2
+  python ${UNO}/uno_full.py run --train-samba-spatial --weight-sharing -b 16 -mb 4 --num-spatial-batches ${NS} --mapping spatial --pef="out/uno_16_4_${NS}/uno_16_4_${NS}.pef" --in_dir /var/tmp/raw/ --mac-v1 --train_source CCLE
+
+elif [ "${1}" == "mp" ] ; then
+echo "Duration: " $SECONDS
+
+elif [ "${1}" == "mp" ] ; then
+echo "Duration: " $SECONDS
+echo "PERF"
+python uno_full.py measure-performance --measure-spatial --weight-sharing -b 16 -mb 4 --num-spatial-batches ${NS} --mapping spatial --pef="out/uno_16_4_${NS}/uno_16_4_${NS}.pef" --num-iterations 20 --mac-v1
+fi
+
+echo "Duration: " $SECONDS
+
+
./uno_pickl.sh compile 500
+./uno_pickl.sh run 500
+./uno_pickl.sh pyinstrument 500
+pyinstrument --load-prev 2022-09-22T18-31-24 -r html
+# Example output: stdout is a terminal, so saved profile output to /tmp/tmpeo5ehksn.html
+cp /tmp/tmpeo5ehksn.html .
+
+

On your local machine (dev terminal), copy the report:

+
scp wilsonb@sambanova.alcf.anl.gov:tmp/uno_test/tmpeo5ehksn.html .
+
+

View in local browser.

+

Running

+

Create a directory for your work.

+
mkdir ~/sambatune
+cd ~/sambatune
+
+

Create small_vae.yaml with the following content using your favorite editor.

+
app: /opt/sambaflow/apps/private/anl/moleculevae.py
+
+model-args: -b 128 --in-width 512 --in-height 512
+
+compile-args: compile --plot --enable-conv-tiling --compiler-configs-file /opt/sambaflow/apps/private/anl/moleculevae/compiler_configs_conv.json --mac-v2 --mac-human-decision /opt/sambaflow/apps/private/anl/moleculevae/symmetric_human_decisions_tiled_v2.json
+
+run-args: --input-path /var/tmp/dataset/moleculevae/ras1_prot-pops.h5 --out-path ${HOME}/moleculevae_out --model-id 0 --epochs 10
+
+env:
+     OMP_NUM_THREADS: 16
+     SF_RNT_FSM_POLL_BUSY_WAIT: 1
+     SF_RNT_DMA_POLL_BUSY_WAIT: 1
+     CONVFUNC_DEBUG_RUN: 0
+
+

Run the following example:

+
sambatune small_vae.yaml --artifact-root $(pwd)/artifact_root --modes benchmark instrument run
+
+

Create linear_net.yaml with the following content using your favorite editor.

+
app: /opt/sambaflow/apps/micros/linear_net.py
+
+model-args: >
+  -b 1024
+  -mb 64
+  --in-features 8192
+  --out-features 4096
+  --repeat 128
+  --inference
+
+compile-args: >
+  --n-chips 2
+  --plot
+
+env:
+  SF_RNT_FSM_POLL_BUSY_WAIT: 1
+  SF_RNT_DMA_POLL_BUSY_WAIT: 1
+  CONVFUNC_DEBUG_RUN": 0
+
+

NOTE: The following takes 45 minutes to run.

+

Run the following example:

+
sambatune linear_net.yaml --artifact-root $(pwd)/artifact_root --modes benchmark instrument run
+
+
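Because this example runs for a long time, you may want to launch it inside a screen (or tmux) session so that it continues if your SSH connection drops:

screen
sambatune linear_net.yaml --artifact-root $(pwd)/artifact_root --modes benchmark instrument run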
+
+

where linear_net.yaml is a user-specified configuration file you created above.

+

SambaTune UI

+

Port Availability

+

It is recommended that you check if the port you want to use is available. You may check by:

+
ps -elf | grep desired_port
+
+

Example:

+
ps -elf | grep 8576
+
+

Alternatively, you may check for all ports in use by sambatune_ui:

+
ps -elf | grep sambatune_ui
+
+

If you need to free a port that you are finished with, you may use the kill command.

+
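For example, to stop a sambatune_ui instance of your own that is still holding a port (the PID shown here is only illustrative):

ps -elf | grep sambatune_ui
kill 1344959    # replace with the PID reported by ps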

Start SambaTune UI

+

If you followed the above directions, your artifact_root will be at ~/sambatune/artifact_root.

+

Start the UI:

+

It will tell you the username and password.

+

NOTE: It is recommended to use a port other than 8576 in case someone else is using it. Select another port close to 8576.

+

Next, run:

+
sambatune_ui --directory ~/sambatune/artifact_root/sambatune_gen/ --port 8576
+
+
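For example, to follow the recommendation above and serve the report on a different port:

sambatune_ui --directory ~/sambatune/artifact_root/sambatune_gen/ --port 8580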
+

You will see something like:

+
with the,
+    username: "admin", password: "05c63938-2941-11ed-93a3-f7ef9c6e5d46"
+[2022-08-31 15:24:36 +0000] [1344959] [Info] Starting gunicorn 20.1.0
+[2022-08-31 15:24:36 +0000] [1344959] [Info] Listening at: http://0.0.0.0:8576 (1344959)
+[2022-08-31 15:24:36 +0000] [1344959] [Info] Using worker: sync
+[2022-08-31 15:24:36 +0000] [1345092] [Info] Booting worker with pid: 1345092
+[2022-08-31 15:24:36 +0000] [1345093] [Info] Booting worker with pid: 1345093
+
+

NOTE: Write down the username and password.

+

NOTE: The password only works with this one instance of sambatune_ui. If you stop this instance of sambatune_ui and start another instance, it will have a new password.

+

NOTE: You will need to press Ctrl+C (or use the kill command) to stop sambatune_ui when you have finished. Not doing so will tie up the port. You can run ps -elf | grep the_port_you_used to find the running processes. If you are not comfortable doing this, please ask for help.

+

Use Port-Forwarding

+

This describes the steps to set up port forwarding for applications, like the SambaTune UI, which run on the SambaNova system and bind to one or more ports. This example uses 8576 and 18576 as port numbers. Using port numbers other than these may avoid collisions with other users.

+

From your local machine

+

This command sets up a port forward from the SambaNova login node to your local machine.

+

Run

+
ssh -N -f -L localhost:18576:localhost:18576 ALCFUserID@sambanova.alcf.anl.gov
+...
+Password: < MobilePass+ code >
+
+ssh ALCFUserID@sambanova.alcf.anl.gov
+
+
+
+

replacing ALCFUserID with your ALCF User ID.

+

From sambanova.alcf.anl.gov

+

This command, run on sambanova.alcf.anl.gov, sets up a port forward from the login node to the SambaNova node (e.g., sm-01) where the SambaTune UI is running.

+

Below are the commands specific to sm-01. You may replace sm-01 with sm-02 when using that system.

+

Run

+

NOTE: The full name is sm-01.ai.alcf.anl.gov and it may also be used.

+
ssh -N -f -L localhost:18576:localhost:8576 ALCFUserID@sm-01
+
+
+
+

Browser on Local Machine

+

Then, navigate in your browser to, in this example, http://localhost:18576 on your local machine.

+

Use the username and password printed by sambatune_ui on sm-01 to log in.

+

SSH Notes

+

Explanation of ssh command:

+
-N : no remote commands
+
+-f : put ssh in the background
+
+-L <machine1>:<portA>:<machine2>:<portB> :
+
+The full command line will forward <machine1>:<portA> (local scope) to <machine2>:<portB> (remote scope)
+
+
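If your local ssh client supports the -J (ProxyJump) option, the two forwards above can be combined into a single, roughly equivalent command run from your local machine (hostnames and ports as in this example):

ssh -N -f -L 18576:localhost:8576 -J ALCFUserID@sambanova.alcf.anl.gov ALCFUserID@sm-01.ai.alcf.anl.gov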

Adapted from: How can I run Tensorboard on a remote server?

+ + + + + + + + + + + + + +
+
+ + + +
+ +
+ + + + + + +
+
+
+
+ + + + + + + + + + + + \ No newline at end of file diff --git a/ai-testbed/sambanova/virtual-environment/index.html b/ai-testbed/sambanova/virtual-environment/index.html new file mode 100644 index 0000000000..1d4e768f8d --- /dev/null +++ b/ai-testbed/sambanova/virtual-environment/index.html @@ -0,0 +1,6796 @@ + + + + + + + + + + + + + + + + + + + + + + + + + Virtual Environment - ALCF User Guides + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+ + + + Skip to content + + +
+
+ +
+ + + + + + + + + +
+ + +
+ + +
+ +
+

Argonne Leadership Computing Facility

+
+ + + + + + + + + +
+
+ + +
+ + + + + + + + + + + + + + + + + +
+ + + + +
+ + +
+ + + + + + +
+
+ + + +
+
+
+ + + + + +
+
+
+ + + +
+
+
+ + + +
+
+
+ + + +
+
+ + + + + + + +

Virtual Environments

+

Using a Venv

+

To create a virtual environment that can also see the packages already installed on the system (such as SambaFlow), use the --system-site-packages flag:

+
python -m venv --system-site-packages my_env
+source my_env/bin/activate
+
+
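When you are finished working in the environment, you can leave it with:

deactivate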

Installing Packages

+

Install packages in the usual manner, for example:

+
python3 -m pip install <package>
+
+

For more details see Use pip for installing.

+

To install a different version of a package that is already installed in one's environment, one can use:

+
pip install --ignore-installed  ... # or -I
+
+
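For example, to force a specific version over the one inherited from the system site-packages (the package name and version here are purely illustrative):

python3 -m pip install --ignore-installed numpy==1.24.4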

Pre-Built Sample Venv

+

Each of the samples or application examples provided by SambaNova has its own pre-built virtual environment which can be readily used. They are present in the /opt/sambaflow/apps/ directory tree within each of the applications.

+
+
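For example, the shared SambaFlow virtual environment used elsewhere in this guide can be activated directly; the exact path of a given sample's pre-built venv may differ:

source /opt/sambaflow/venv/bin/activate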

Note: Conda is not supported on the SambaNova system.

+
+ + + + + + + + + + + + + +
+
+ + + +
+ +
+ + + + + + +
+
+
+
+ + + + + + + + + + + + \ No newline at end of file diff --git a/ai-testbed/stylesheets/alcf-extra.css b/ai-testbed/stylesheets/alcf-extra.css new file mode 100644 index 0000000000..95c620a9b1 --- /dev/null +++ b/ai-testbed/stylesheets/alcf-extra.css @@ -0,0 +1,903 @@ +[data-md-color-scheme="alcf"] { + /* Colors */ + --md-primary-fg-color: #0061af; + --md-primary-fg-color--light: #FFFFFF; + --md-primary-fg-color--dark: #080813; + --md-accent-fg-color: #118ACB; + --md-accent-fg-color--transparent: hsla(#{hex2hsl($md-accent-fg-color)}, 0.1); + --md-accent-bg-color: hsla(0, 0%, 100%, 1); + --md-accent-bg-color--light: hsla(0, 0%, 100%, 0.7); +} + + + + +/* typography */ +body { + font-family: proxima-nova, sans-serif; +} +.md-typeset h1, .md-typeset h2, .md-typeset h3 { + font-weight: 600; +} + +.md-typeset h1, .md-typeset h2 { + color: #1d1651; +} + +.md-typeset h1 { + font-size: 60px; + line-height: 60px; + border-bottom: 1px solid rgb(216, 220, 225); + padding-bottom: .75rem; + margin: 0; +} + +.md-typeset h2 { + font-size: 36px; + line-height: 36px; +} + +.md-typeset h3 { + margin:3em 0 0; +; +} + +.md-typeset p, .md-typeset li { + font-size: 19px; + line-height: 27px; + font-weight: 400; + margin-block-start: .8em; +} + +.md-typeset li{ + margin-block-start: 0; +} + + +.md-typeset ul { + list-style-type: none; +} + +.md-typeset ul > li { + text-indent: -15px; +} + +.md-typeset ul > li:before { + content: "–"; + margin-right: 5px; + font-weight: 600; + color: #6e6e78; +} + + + + +/* layout(ish) */ + +.md-content__inner { + margin-bottom: 5rem; +} + +.md-header[data-md-state=shadow] { + box-shadow: none; +} + +.md-header__button.md-logo img { + height: auto; + width: 5rem; +} + + + + +/* primary header */ +/* ------------------------------------------------------------------------- */ + +.header--primary { + background-color: var(--md-primary-fg-color); + color: #e7f6fd; + border-bottom: 1px solid #1d1651; + height: 152px; + font-size: 19px; + line-height: 24px; +} + +@media only screen and (max-width: 960px) { + .header--primary { display: none; } +} + +.header--primary .grid{ + position: relative; + display: grid; + grid-template-columns: repeat(6, 1fr); + grid-template-rows: auto; + width: calc(100% - 28px * 2); + margin: 0 auto; + grid-column-gap: 24px; + column-gap: 24px; /* repeated due to change in spec */ + grid-row-gap: 0; + row-gap: 0; /* repeated due to change in spec */ + max-width: 59.8rem; +} + +.header--primary .header__nav-primary a { + color: #FFF; +} + +.header--primary .header__nav-secondary a { + color: #badef5; +} + +.header--primary .header__nav-secondary a:hover { + color: #118acb; +} + +.header--primary a:hover { + color: #118acb; +} + + +.header--primary .grid { + min-height: 152px; +} + +.header--primary .header__site-title { + grid-column: span 6; + grid-row: 2; + margin-top: 18px; + margin-bottom: -8px; +} + +.header--primary .header__site-title h1 { + color: #b8e2de; + font-size: 36px; + font-weight: 600; + line-height: 36px; +} + +.header--primary .header__nav-primary { + grid-column: span 6; + grid-row: 3; + position: relative; + height: 16px; +} + +.header--primary .header__nav-primary ul { + position: relative; + bottom: 4px; + margin-block-start: 0; + padding-inline-start: 0; + margin-block-end: 0; + grid-column: span 6; +} + +.header--primary .header__nav-primary li:last-child { + padding-bottom: 8px; + border-left: 1px solid #1d1651; +} + +.header--primary .header__nav-primary a { + padding-right: 27px; + font-weight: 600; + display: inline-block; + 
height: 30px; +} + +.header--primary .header__nav-primary .dropdown a { + height: auto; +} + +.header--primary .header__nav-primary .dropdown li { + display: block; +} + +.md-nav { + font-size: 15px; + line-height: 1; +} + +.md-nav__link { + margin-top: 6px; +} + +#nav-sup { + padding-left: 32px; +} + +.header--primary .header__nav-secondary { + grid-column: 2 / span 5; + grid-row: 1; + margin-top: 12px; + position: relative; + min-width: 380px; +} + +.header--primary .header__nav-secondary--right { + position: absolute; + right: 0; +} + +.header--primary .header__nav-secondary ul { + /*padding-right: 16px;*/ + display: inline-block; + position: relative; + top: -24px; /* hacky, to line up with search box, better way? */ +} + +.header--primary .header__nav-secondary li { + font-size: 14px; + line-height: 16px; + font-weight: 800; + padding-right: 16px; + letter-spacing: 1.4px; + text-transform: uppercase; +} + +.header--primary .header__nav-secondary li:last-child { + margin-right: 0; +} + +.header--primary .header__nav-secondary li, +.header--primary .header__nav-primary li { + display: inline; +} + +.header--primary .header__nav-secondary .md-search{ + display: inline-block; + position: relative; + top: -6px; +} + + +@media screen and (min-width: 60em) { + .header--primary .header__nav-secondary .md-search__inner { + width: 8rem; + } +} + +@media screen and (min-width: 60em) and (max-width: 76.1875em){ + [data-md-toggle=search]:checked~.md-header .md-search__inner { + width: 18rem; + } +} + +@media screen and (min-width: 76.25em) { + [data-md-toggle=search]:checked~.md-header .md-search__inner { + width: 18rem; + } +} + +[data-md-toggle=search]:checked~.md-header .md-search__scrollwrap { + width: 100%; +} + +.md-search__scrollwrap h1, +.md-search__scrollwrap p { + text-transform: none; + letter-spacing: 0; +} + +.md-search-result__article--document .md-search-result__title { + font-weight: 600; + color: var(--md-typeset-a-color); +} + +.md-search__scrollwrap details, + .md-search__scrollwrap summary{ + font-size: 12px; + font-weight: 600; + color: var(--md-accent-fg-color); +} + +.md-search__scrollwrap p { + font-weight: 400; + color: rgba(0,0,0,0.87); + font-size: 15px; + line-height: 18px; +} + + + +.dropdown { + grid-column: 1 / span 5; + width: calc(83.3333% + 24px); + position: absolute; + grid-row: 4; + z-index: 50; + border-top: 1px solid #badef5; +} + + +.dropdown nav { + display: grid; + grid-template-columns: repeat(5, 1fr); + /*grid-template-rows: repeat(0, 1fr);*/ + grid-column-gap: 24px; + column-gap: 24px; + grid-row-gap: 0; + row-gap: 0; + grid-column: auto/span 5; + /*grid-row: span $rows;*/ + background-color: #080812; + color: #fff; + @include inset($top: m); + margin-left: -28px; + padding-left: 28px; + padding-right: 24px; +} + +.dropdown .dropdown__intro { + grid-column: span 2; +} + +.dropdown .dropdown__intro-subhead h2 { + font-size: 24px; + line-height: 28px; + font-weight: 600; + margin-bottom: 0; +} + +.dropdown .dropdown__intro-description p { + font-size: 19px; + line-height: 24px; + font-weight: 600; +} + +.dropdown .dropdown__links-1, +.dropdown .dropdown__links-2, +.dropdown .dropdown__links-21, +.dropdown .dropdown__links-22 { + grid-column: span 1; + padding-bottom: 8px; + padding-top: 4px; +} + +.dropdown .dropdown__links-21, +.dropdown .dropdown__links-22 { + grid-row: 2; +} + +.dropdown .dropdown__links-21 { + grid-column: 3 / span 1; +} + +.dropdown .dropdown__links-22 { + grid-column: 4 / span 1; +} + +.dropdown .dropdown__links-group { + 
padding-bottom: 16px; + padding-top: 1px; +} + +.dropdown .dropdown__links-group h3, +.dropdown__links-group li a { + font-size: 19px; + line-height: 24px; + font-weight: 600; + margin-bottom: 0; +} + +.dropdown .dropdown__links-group ul { + list-style: none; + padding-top: 8px; + padding-left: 0; + margin-top: 0; +} + +.dropdown .dropdown__links-group li { + border-top: 1px solid #118acb; + padding-top: 10px; + padding-bottom: 8px; + width: 100% +} + +.dropdown .dropdown__links-group li:last-child { + border-bottom: 1px solid #118acb; +} + +.dropdown .dropdown__links-group li a { + color: #badef5; + width: 100%; + display: inline-block; +} + +.dropdown .dropdown__links-group li a:hover { + color: #6e6e78; +} + +.dropdown .dropdown__featured { + grid-column: span 1; + padding-bottom: 16px; + padding-top: 4px; +} + +.dropdown .dropdown__featured-title { + padding-bottom: 8px; +} + +.dropdown .dropdown__featured-title h3 { + font-size: 19px; + line-height: 24px; + font-weight: 600; + margin-bottom: 0; +} + +.dropdown .dropdown__featured-image { + position: relative; + padding-bottom: 8px; +} + +.dropdown .dropdown__featured-image:before { + display: block; + content: ""; + padding-top: calc((548 / 965) * 100%); + width: 100%; +} + +.dropdown .dropdown__featured-image > div { + position: absolute; + top: 0; + left: 0; + right: 0; + bottom: 0; + overflow: hidden; +} + +.dropdown .dropdown__featured-image img { + position: absolute; + top: 50%; + left: 50%; + transform: translate(-50%, -50%); + min-height: 100%; + min-width: 100%; + object-fit: cover; + width: 100%; +} + +.dropdown .dropdown__featured a { + color: #badef5; +} + +.dropdown .dropdown__featured a:hover { + color: #6e6e78; +} + + + + +/* mobile header */ + +.header--mobile { + background-color: var(--md-primary-fg-color); + color: #e7f6fd; + border-bottom: 1px solid #1d1651; + height: 66px; + font-size: 19px; + line-height: 24px; +} + +@media only screen and (min-width: 961px) { + .header--mobile { display: none; } +} + +.header--mobile .grid{ + position: relative; + display: grid; + grid-template-columns: repeat(6, 1fr); + grid-template-rows: auto; + width: calc(100% - 28px * 2); + margin: 0 auto; + grid-column-gap: 24px; + column-gap: 24px; /* repeated due to change in spec */ + grid-row-gap: 0; + row-gap: 0; /* repeated due to change in spec */ + max-width: 59.8rem; +} + +.header--mobile .header__nav-primary a { + color: #FFF; +} + +.header--mobile .header__nav-primary a span { + position: absolute; + right: 1px; +} + +.header--mobile .header__nav-secondary { + grid-column: span 6; + width: 100%; + display: grid; + grid-template-columns: repeat(6, 1fr); + grid-template-rows: auto; + margin: 0 auto; + grid-column-gap: 24px; + column-gap: 24px; /* repeated due to change in spec */ + grid-row-gap: 0; + row-gap: 0; /* repeated due to change in spec */ + max-width: 59.8rem; +} + +.header--mobile .header__nav-secondary .dropdown__links-1, +.header--mobile .header__nav-secondary .dropdown__links-2 { + grid-column: span 3; +} + +.header--mobile .header__nav-secondary a { + color: #badef5; +} + +.header--mobile .header__nav-secondary a:hover { + color: #118acb; +} + +.header--mobile a:hover { + color: #118acb; +} + +.header--mobile .header__site-title { + grid-column: span 4; + margin-top: 0; +} + +.header--mobile .header__site-title h1 { + color: #b8e2de; + font-size: 19px; + font-weight: 600; + line-height: 24px; + margin-top: 12px; +} + +@media only screen and (max-width: 960px) { + .header--mobile .header__site-title h1 { + 
font-size: 16px; + line-height: 18px; + margin-top: 14px; + } +} + +.header--mobile .header__mobIcons { + grid-column: span 2; +} + +.header--mobile .header__mobIcons { + grid-column: span 2; +} + +.header--mobile .header__mobIcons svg { + float: right; + height: 48px; + fill: #b8e2de; + margin-top: 8px; + margin-right: -6px; + transition: all .1s; +} + +.header--mobile .header__mobIcons svg:hover { + cursor: pointer; + fill: #118acb; +} + +.md-header__button { + margin-top: 0; + float: right; +} + + +.header--mobile .header__mobIcons .md-header__button svg { + margin-top: 0px; + width: 36px; +} + +.header--mobile .header__mobIcons .md-search__icon svg { + margin-top: -12px; + width: 36px; +} + + + + +.header--mobile .header__nav-primary a { + padding-right: 27px; + font-weight: 600; + display: inline-block; + margin-bottom: 8px; +} + +.header--mobile .header__nav-secondary ul { + padding-right: 16px; + display: inline-block; + position: relative; +} + +.header--mobile .header__nav-secondary li { + font-size: 14px; + line-height: 16px; + font-weight: 800; + padding-right: 16px; + padding-bottom: 16px; + letter-spacing: 1.4px; + text-transform: uppercase; + display: block; +} + +.header--mobile .dropdown { + grid-column: span 5; + width: calc(83.3333% + 24px); + position: absolute; + grid-row: 4; + z-index: 50; +} + +.header--mobile .dropdown nav { + display: grid; + grid-template-columns: repeat(5, 1fr); + /*grid-template-rows: repeat(0, 1fr);*/ + grid-column-gap: 24px; + column-gap: 24px; + grid-row-gap: 0; + row-gap: 0; + grid-column: auto/span 5; + /*grid-row: span $rows;*/ + background-color: #080812; + color: #fff; + @include inset($top: m); + margin-left: -28px; + padding-left: 28px; + padding-right: 24px; +} + +.header--mobile .dropdown .dropdown__links-1, +.header--mobile .dropdown .dropdown__links-2, +.header--mobile .dropdown .dropdown__links-21, +.header--mobile .dropdown .dropdown__links-22 { + grid-column: span 3; + padding-bottom: 8px; + padding-top: 4px; +} + +.header--mobile .dropdown .dropdown__links-21, +.header--mobile .dropdown .dropdown__links-22 { + grid-row: 2; +} + +.header--mobile .dropdown .dropdown__links-group { + padding-bottom: 16px; + padding-top: 1px; +} + +.header--mobile .dropdown .dropdown__links-group h3, +.header--mobile .dropdown__links-group li a { + font-size: 19px; + line-height: 24px; + font-weight: 600; + margin-bottom: 0; +} + +.header--mobile .dropdown .dropdown__links-group ul { + list-style: none; + padding-top: 8px; + padding-left: 0; + margin-top: 0; +} + +.header--mobile .dropdown .dropdown__links-group li { + border-top: 1px solid #118acb; + padding-top: 10px; + padding-bottom: 8px; + width: 100% +} + +.header--mobile .dropdown .dropdown__links-group li:last-child { + border-bottom: 1px solid #118acb; +} + +.header--mobile .dropdown .dropdown__links-group li a { + color: #badef5; + width: 100%; + display: inline-block; +} + +.header--mobile .dropdown .dropdown__links-group li a:hover { + color: #6e6e78; +} + + +.header--mobile .header__dropdownBG { + width: 100%; + top: 0; + bottom: 0; + z-index: 1000000; + position: fixed; + background-color: #080812; + overflow: auto; +} + + +.header--mobile .dropdown { + grid-column: span 6; + position: relative; + width: 100%; + display: grid; + grid-template-columns: repeat(6, 1fr); + grid-column-gap: 24px; + column-gap: 24px; + grid-row-gap: 0; + row-gap: 0; + background-color: #080812; + color: #fff; + border-top: none; +} + + +.header__nav-primary { + grid-column: span 6; + /*width: calc(83.3333% + 
24px);*/ + display: grid; + grid-template-columns: repeat(6, 1fr); + grid-column-gap: 24px; + column-gap: 24px; + grid-row-gap: 0; + row-gap: 0; +} + +.header--mobile ul, +.header--mobile ul li { + list-style: none; + padding-left: 0; + grid-column: span 6; +} + +.header--mobile a.drawer-head { + border-bottom: 1px solid #badef5; + width: 100%; + padding-bottom: 8px; +} + +.header--mobile .dropdown .dropdown__links-group h3 { + margin-top: 8px; +} + +.menu--open { + display: block; +} + +.menu--closed { + display: none !important; +} + + +/* footer */ +/* ----------------------------------------------------------------- */ + +footer { + background-color: #080812; + color: #fff; + padding-top: 32px; + min-height: 320px; + font-size: 19px; + line-height: 24px; +} + +footer .grid{ + position: relative; + display: grid; + grid-template-columns: repeat(4, 1fr); + grid-template-rows: 2; + width: calc(100% - 28px * 2); + margin: 0 auto; + grid-column-gap: 24px; + column-gap: 24px; /* repeated due to change in spec */ + grid-row-gap: 0; + row-gap: 0; /* repeated due to change in spec */ + max-width: 59.8rem; +} + +.footer--attr { + grid-column: 1 / span 2; + grid-row: 1; + min-height: 240px; +} + +@media only screen and (max-width: 960px) { + .footer--attr { + grid-column: 1 / span 6; + min-height: 180px; + } +} + +.footer--address { + grid-column: 3 / span 1; + grid-row: 1; +} + +@media only screen and (max-width: 960px) { + .footer--address { + grid-column: 1 / span 6; + grid-row: 2; + } +} + +.footer--admin { + grid-column: 1 / span 1; + grid-row: 2; +} + +@media only screen and (max-width: 960px) { + .footer--admin { + grid-column: 1 / span 6; + grid-row: 3; + } +} + +.footer--copyright { + grid-column: 3 / span 1; + grid-row: 2; +} + +@media only screen and (max-width: 960px) { + .footer--copyright { + grid-column: 1 / span 6; + grid-row: 4; + } +} + +.footer--logos { + grid-column: 4 / span 1; + grid-row: 2; +} + +@media only screen and (max-width: 960px) { + .footer--logos { + grid-column: 1 / span 6; + grid-row: 5; + } +} + +.footer--attr h2 { + font-size: 36px; + line-height: 36px; + margin-top: 0; +} + +.footer--attr h2 span { + display: block; + font-size: 19px; + line-height: 24px; +} + +footer p { + margin-top: 0; +} + +footer a { + color: #badef5; + transition: color 0.2s; +} + + +footer a:hover { + color: #6e6e78; +} + +.footer--logos img { + max-width: 104px; + display: inline; + margin-right: 18px; +} + + +/*remove edit buttton by titles*/ +.md-typeset .md-content__button { + display: none; +} + + +/*// For the js script*/ +.js-dropdown-visible { + display: block; +} + +.js-dropdown-hidden { + display: none; +} diff --git a/ai-testbed/stylesheets/extra.css b/ai-testbed/stylesheets/extra.css new file mode 100644 index 0000000000..dc002c8e76 --- /dev/null +++ b/ai-testbed/stylesheets/extra.css @@ -0,0 +1,54 @@ +/*body { overflow-x: hidden; font-family: 'Proxima Nova'; }*/ +/* Custom colors */ +/*$primary: {{ site.data.style.highlight | default: "#fed136" }} !default; +$white: {{ site.data.style.white | default: "#fff" }} !default; +$black: {{ site.data.style.black | default: "#000" }} !default;*/ +/*.text-primary { color: #009f90 !important; }*/ +header.md-header { background-color: #212529; color: #009f90;} +.md-header-nav__title {font-weight: 600;} +nav.md-tabs { background-color: #212529; color: #b8e2de;} +label.md-search__icon.md-icon {color: #b8e2de;} +a.md-nav__link.md-nav__link--active {color: #009f90;} +/*.md-nav__link.md-nav__link--hover {color: #006b61;}*/ +a {color: 
#009f90;} +.md-nav__link:focus { + color: #009f90; +} +.md-nav__link:hover {color: #006b61;} +.md-typeset a, a:focus { + color: #009f90; +} +.md-typeset a:hover {color: #006b61;} +html .md-footer-meta.md-typeset a:focus, html .md-footer-meta.md-typeset a {color: #b8e2de;} + +.md-announce, .md-footer, .md-footer-meta {background-color: #212529;} + +.md-header-nav__button.md-logo svg {display: none;} + +/*Hiding everything in left nav except for systems*/ +.md-nav__item:nth-child(1) {display: none;} +.md-nav__item:nth-child(2) {display: none;} +.md-nav__item:nth-child(3) {display: none;} +.md-nav__item:nth-child(7) {display: none;} +.md-nav__item:nth-child(8) {display: none;} + +/*Hide Graphcore in Header Nav*/ +.md-tabs__item:nth-child(4) {display: none;} + +/*Change Header Nav Tabs Color*/ +.md-tabs__link {color: white;} + +/*This is the command to display correct font*/ +body, input { +font-family: Proxima Nova,Montserrat,Arial,sans-serif; +} + +/*Hide Table of Contents*/ +.md-sidebar--secondary { + display: none; +} + +/*.md-header-nav__button.md-logo img { + width: 4.2rem; + height: 1.2rem; +}*/ diff --git a/assets/images/favicon.png b/assets/images/favicon.png new file mode 100644 index 0000000000..1cf13b9f9d Binary files /dev/null and b/assets/images/favicon.png differ diff --git a/assets/javascripts/bundle.7389ff0e.min.js b/assets/javascripts/bundle.7389ff0e.min.js new file mode 100644 index 0000000000..c7df7197e7 --- /dev/null +++ b/assets/javascripts/bundle.7389ff0e.min.js @@ -0,0 +1,29 @@ +"use strict";(()=>{var Mi=Object.create;var gr=Object.defineProperty;var Li=Object.getOwnPropertyDescriptor;var _i=Object.getOwnPropertyNames,Ft=Object.getOwnPropertySymbols,Ai=Object.getPrototypeOf,xr=Object.prototype.hasOwnProperty,ro=Object.prototype.propertyIsEnumerable;var to=(e,t,r)=>t in e?gr(e,t,{enumerable:!0,configurable:!0,writable:!0,value:r}):e[t]=r,P=(e,t)=>{for(var r in t||(t={}))xr.call(t,r)&&to(e,r,t[r]);if(Ft)for(var r of Ft(t))ro.call(t,r)&&to(e,r,t[r]);return e};var oo=(e,t)=>{var r={};for(var o in e)xr.call(e,o)&&t.indexOf(o)<0&&(r[o]=e[o]);if(e!=null&&Ft)for(var o of Ft(e))t.indexOf(o)<0&&ro.call(e,o)&&(r[o]=e[o]);return r};var yr=(e,t)=>()=>(t||e((t={exports:{}}).exports,t),t.exports);var Ci=(e,t,r,o)=>{if(t&&typeof t=="object"||typeof t=="function")for(let n of _i(t))!xr.call(e,n)&&n!==r&&gr(e,n,{get:()=>t[n],enumerable:!(o=Li(t,n))||o.enumerable});return e};var jt=(e,t,r)=>(r=e!=null?Mi(Ai(e)):{},Ci(t||!e||!e.__esModule?gr(r,"default",{value:e,enumerable:!0}):r,e));var no=(e,t,r)=>new Promise((o,n)=>{var i=c=>{try{a(r.next(c))}catch(p){n(p)}},s=c=>{try{a(r.throw(c))}catch(p){n(p)}},a=c=>c.done?o(c.value):Promise.resolve(c.value).then(i,s);a((r=r.apply(e,t)).next())});var ao=yr((Er,io)=>{(function(e,t){typeof Er=="object"&&typeof io!="undefined"?t():typeof define=="function"&&define.amd?define(t):t()})(Er,function(){"use strict";function e(r){var o=!0,n=!1,i=null,s={text:!0,search:!0,url:!0,tel:!0,email:!0,password:!0,number:!0,date:!0,month:!0,week:!0,time:!0,datetime:!0,"datetime-local":!0};function a(C){return!!(C&&C!==document&&C.nodeName!=="HTML"&&C.nodeName!=="BODY"&&"classList"in C&&"contains"in C.classList)}function c(C){var ct=C.type,Ve=C.tagName;return!!(Ve==="INPUT"&&s[ct]&&!C.readOnly||Ve==="TEXTAREA"&&!C.readOnly||C.isContentEditable)}function p(C){C.classList.contains("focus-visible")||(C.classList.add("focus-visible"),C.setAttribute("data-focus-visible-added",""))}function 
l(C){C.hasAttribute("data-focus-visible-added")&&(C.classList.remove("focus-visible"),C.removeAttribute("data-focus-visible-added"))}function f(C){C.metaKey||C.altKey||C.ctrlKey||(a(r.activeElement)&&p(r.activeElement),o=!0)}function u(C){o=!1}function d(C){a(C.target)&&(o||c(C.target))&&p(C.target)}function y(C){a(C.target)&&(C.target.classList.contains("focus-visible")||C.target.hasAttribute("data-focus-visible-added"))&&(n=!0,window.clearTimeout(i),i=window.setTimeout(function(){n=!1},100),l(C.target))}function b(C){document.visibilityState==="hidden"&&(n&&(o=!0),D())}function D(){document.addEventListener("mousemove",J),document.addEventListener("mousedown",J),document.addEventListener("mouseup",J),document.addEventListener("pointermove",J),document.addEventListener("pointerdown",J),document.addEventListener("pointerup",J),document.addEventListener("touchmove",J),document.addEventListener("touchstart",J),document.addEventListener("touchend",J)}function Q(){document.removeEventListener("mousemove",J),document.removeEventListener("mousedown",J),document.removeEventListener("mouseup",J),document.removeEventListener("pointermove",J),document.removeEventListener("pointerdown",J),document.removeEventListener("pointerup",J),document.removeEventListener("touchmove",J),document.removeEventListener("touchstart",J),document.removeEventListener("touchend",J)}function J(C){C.target.nodeName&&C.target.nodeName.toLowerCase()==="html"||(o=!1,Q())}document.addEventListener("keydown",f,!0),document.addEventListener("mousedown",u,!0),document.addEventListener("pointerdown",u,!0),document.addEventListener("touchstart",u,!0),document.addEventListener("visibilitychange",b,!0),D(),r.addEventListener("focus",d,!0),r.addEventListener("blur",y,!0),r.nodeType===Node.DOCUMENT_FRAGMENT_NODE&&r.host?r.host.setAttribute("data-js-focus-visible",""):r.nodeType===Node.DOCUMENT_NODE&&(document.documentElement.classList.add("js-focus-visible"),document.documentElement.setAttribute("data-js-focus-visible",""))}if(typeof window!="undefined"&&typeof document!="undefined"){window.applyFocusVisiblePolyfill=e;var t;try{t=new CustomEvent("focus-visible-polyfill-ready")}catch(r){t=document.createEvent("CustomEvent"),t.initCustomEvent("focus-visible-polyfill-ready",!1,!1,{})}window.dispatchEvent(t)}typeof document!="undefined"&&e(document)})});var Kr=yr((kt,qr)=>{/*! 
+ * clipboard.js v2.0.11 + * https://clipboardjs.com/ + * + * Licensed MIT © Zeno Rocha + */(function(t,r){typeof kt=="object"&&typeof qr=="object"?qr.exports=r():typeof define=="function"&&define.amd?define([],r):typeof kt=="object"?kt.ClipboardJS=r():t.ClipboardJS=r()})(kt,function(){return function(){var e={686:function(o,n,i){"use strict";i.d(n,{default:function(){return Oi}});var s=i(279),a=i.n(s),c=i(370),p=i.n(c),l=i(817),f=i.n(l);function u(V){try{return document.execCommand(V)}catch(_){return!1}}var d=function(_){var O=f()(_);return u("cut"),O},y=d;function b(V){var _=document.documentElement.getAttribute("dir")==="rtl",O=document.createElement("textarea");O.style.fontSize="12pt",O.style.border="0",O.style.padding="0",O.style.margin="0",O.style.position="absolute",O.style[_?"right":"left"]="-9999px";var $=window.pageYOffset||document.documentElement.scrollTop;return O.style.top="".concat($,"px"),O.setAttribute("readonly",""),O.value=V,O}var D=function(_,O){var $=b(_);O.container.appendChild($);var N=f()($);return u("copy"),$.remove(),N},Q=function(_){var O=arguments.length>1&&arguments[1]!==void 0?arguments[1]:{container:document.body},$="";return typeof _=="string"?$=D(_,O):_ instanceof HTMLInputElement&&!["text","search","url","tel","password"].includes(_==null?void 0:_.type)?$=D(_.value,O):($=f()(_),u("copy")),$},J=Q;function C(V){"@babel/helpers - typeof";return typeof Symbol=="function"&&typeof Symbol.iterator=="symbol"?C=function(O){return typeof O}:C=function(O){return O&&typeof Symbol=="function"&&O.constructor===Symbol&&O!==Symbol.prototype?"symbol":typeof O},C(V)}var ct=function(){var _=arguments.length>0&&arguments[0]!==void 0?arguments[0]:{},O=_.action,$=O===void 0?"copy":O,N=_.container,Y=_.target,ke=_.text;if($!=="copy"&&$!=="cut")throw new Error('Invalid "action" value, use either "copy" or "cut"');if(Y!==void 0)if(Y&&C(Y)==="object"&&Y.nodeType===1){if($==="copy"&&Y.hasAttribute("disabled"))throw new Error('Invalid "target" attribute. Please use "readonly" instead of "disabled" attribute');if($==="cut"&&(Y.hasAttribute("readonly")||Y.hasAttribute("disabled")))throw new Error(`Invalid "target" attribute. 
You can't cut text from elements with "readonly" or "disabled" attributes`)}else throw new Error('Invalid "target" value, use a valid Element');if(ke)return J(ke,{container:N});if(Y)return $==="cut"?y(Y):J(Y,{container:N})},Ve=ct;function Fe(V){"@babel/helpers - typeof";return typeof Symbol=="function"&&typeof Symbol.iterator=="symbol"?Fe=function(O){return typeof O}:Fe=function(O){return O&&typeof Symbol=="function"&&O.constructor===Symbol&&O!==Symbol.prototype?"symbol":typeof O},Fe(V)}function vi(V,_){if(!(V instanceof _))throw new TypeError("Cannot call a class as a function")}function eo(V,_){for(var O=0;O<_.length;O++){var $=_[O];$.enumerable=$.enumerable||!1,$.configurable=!0,"value"in $&&($.writable=!0),Object.defineProperty(V,$.key,$)}}function gi(V,_,O){return _&&eo(V.prototype,_),O&&eo(V,O),V}function xi(V,_){if(typeof _!="function"&&_!==null)throw new TypeError("Super expression must either be null or a function");V.prototype=Object.create(_&&_.prototype,{constructor:{value:V,writable:!0,configurable:!0}}),_&&br(V,_)}function br(V,_){return br=Object.setPrototypeOf||function($,N){return $.__proto__=N,$},br(V,_)}function yi(V){var _=Ti();return function(){var $=Rt(V),N;if(_){var Y=Rt(this).constructor;N=Reflect.construct($,arguments,Y)}else N=$.apply(this,arguments);return Ei(this,N)}}function Ei(V,_){return _&&(Fe(_)==="object"||typeof _=="function")?_:wi(V)}function wi(V){if(V===void 0)throw new ReferenceError("this hasn't been initialised - super() hasn't been called");return V}function Ti(){if(typeof Reflect=="undefined"||!Reflect.construct||Reflect.construct.sham)return!1;if(typeof Proxy=="function")return!0;try{return Date.prototype.toString.call(Reflect.construct(Date,[],function(){})),!0}catch(V){return!1}}function Rt(V){return Rt=Object.setPrototypeOf?Object.getPrototypeOf:function(O){return O.__proto__||Object.getPrototypeOf(O)},Rt(V)}function vr(V,_){var O="data-clipboard-".concat(V);if(_.hasAttribute(O))return _.getAttribute(O)}var Si=function(V){xi(O,V);var _=yi(O);function O($,N){var Y;return vi(this,O),Y=_.call(this),Y.resolveOptions(N),Y.listenClick($),Y}return gi(O,[{key:"resolveOptions",value:function(){var N=arguments.length>0&&arguments[0]!==void 0?arguments[0]:{};this.action=typeof N.action=="function"?N.action:this.defaultAction,this.target=typeof N.target=="function"?N.target:this.defaultTarget,this.text=typeof N.text=="function"?N.text:this.defaultText,this.container=Fe(N.container)==="object"?N.container:document.body}},{key:"listenClick",value:function(N){var Y=this;this.listener=p()(N,"click",function(ke){return Y.onClick(ke)})}},{key:"onClick",value:function(N){var Y=N.delegateTarget||N.currentTarget,ke=this.action(Y)||"copy",It=Ve({action:ke,container:this.container,target:this.target(Y),text:this.text(Y)});this.emit(It?"success":"error",{action:ke,text:It,trigger:Y,clearSelection:function(){Y&&Y.focus(),window.getSelection().removeAllRanges()}})}},{key:"defaultAction",value:function(N){return vr("action",N)}},{key:"defaultTarget",value:function(N){var Y=vr("target",N);if(Y)return document.querySelector(Y)}},{key:"defaultText",value:function(N){return vr("text",N)}},{key:"destroy",value:function(){this.listener.destroy()}}],[{key:"copy",value:function(N){var Y=arguments.length>1&&arguments[1]!==void 0?arguments[1]:{container:document.body};return J(N,Y)}},{key:"cut",value:function(N){return y(N)}},{key:"isSupported",value:function(){var N=arguments.length>0&&arguments[0]!==void 0?arguments[0]:["copy","cut"],Y=typeof 
N=="string"?[N]:N,ke=!!document.queryCommandSupported;return Y.forEach(function(It){ke=ke&&!!document.queryCommandSupported(It)}),ke}}]),O}(a()),Oi=Si},828:function(o){var n=9;if(typeof Element!="undefined"&&!Element.prototype.matches){var i=Element.prototype;i.matches=i.matchesSelector||i.mozMatchesSelector||i.msMatchesSelector||i.oMatchesSelector||i.webkitMatchesSelector}function s(a,c){for(;a&&a.nodeType!==n;){if(typeof a.matches=="function"&&a.matches(c))return a;a=a.parentNode}}o.exports=s},438:function(o,n,i){var s=i(828);function a(l,f,u,d,y){var b=p.apply(this,arguments);return l.addEventListener(u,b,y),{destroy:function(){l.removeEventListener(u,b,y)}}}function c(l,f,u,d,y){return typeof l.addEventListener=="function"?a.apply(null,arguments):typeof u=="function"?a.bind(null,document).apply(null,arguments):(typeof l=="string"&&(l=document.querySelectorAll(l)),Array.prototype.map.call(l,function(b){return a(b,f,u,d,y)}))}function p(l,f,u,d){return function(y){y.delegateTarget=s(y.target,f),y.delegateTarget&&d.call(l,y)}}o.exports=c},879:function(o,n){n.node=function(i){return i!==void 0&&i instanceof HTMLElement&&i.nodeType===1},n.nodeList=function(i){var s=Object.prototype.toString.call(i);return i!==void 0&&(s==="[object NodeList]"||s==="[object HTMLCollection]")&&"length"in i&&(i.length===0||n.node(i[0]))},n.string=function(i){return typeof i=="string"||i instanceof String},n.fn=function(i){var s=Object.prototype.toString.call(i);return s==="[object Function]"}},370:function(o,n,i){var s=i(879),a=i(438);function c(u,d,y){if(!u&&!d&&!y)throw new Error("Missing required arguments");if(!s.string(d))throw new TypeError("Second argument must be a String");if(!s.fn(y))throw new TypeError("Third argument must be a Function");if(s.node(u))return p(u,d,y);if(s.nodeList(u))return l(u,d,y);if(s.string(u))return f(u,d,y);throw new TypeError("First argument must be a String, HTMLElement, HTMLCollection, or NodeList")}function p(u,d,y){return u.addEventListener(d,y),{destroy:function(){u.removeEventListener(d,y)}}}function l(u,d,y){return Array.prototype.forEach.call(u,function(b){b.addEventListener(d,y)}),{destroy:function(){Array.prototype.forEach.call(u,function(b){b.removeEventListener(d,y)})}}}function f(u,d,y){return a(document.body,u,d,y)}o.exports=c},817:function(o){function n(i){var s;if(i.nodeName==="SELECT")i.focus(),s=i.value;else if(i.nodeName==="INPUT"||i.nodeName==="TEXTAREA"){var a=i.hasAttribute("readonly");a||i.setAttribute("readonly",""),i.select(),i.setSelectionRange(0,i.value.length),a||i.removeAttribute("readonly"),s=i.value}else{i.hasAttribute("contenteditable")&&i.focus();var c=window.getSelection(),p=document.createRange();p.selectNodeContents(i),c.removeAllRanges(),c.addRange(p),s=c.toString()}return s}o.exports=n},279:function(o){function n(){}n.prototype={on:function(i,s,a){var c=this.e||(this.e={});return(c[i]||(c[i]=[])).push({fn:s,ctx:a}),this},once:function(i,s,a){var c=this;function p(){c.off(i,p),s.apply(a,arguments)}return p._=s,this.on(i,p,a)},emit:function(i){var s=[].slice.call(arguments,1),a=((this.e||(this.e={}))[i]||[]).slice(),c=0,p=a.length;for(c;c{"use strict";/*! 
+ * escape-html + * Copyright(c) 2012-2013 TJ Holowaychuk + * Copyright(c) 2015 Andreas Lubbe + * Copyright(c) 2015 Tiancheng "Timothy" Gu + * MIT Licensed + */var Wa=/["'&<>]/;Vn.exports=Ua;function Ua(e){var t=""+e,r=Wa.exec(t);if(!r)return t;var o,n="",i=0,s=0;for(i=r.index;i0&&i[i.length-1])&&(p[0]===6||p[0]===2)){r=0;continue}if(p[0]===3&&(!i||p[1]>i[0]&&p[1]=e.length&&(e=void 0),{value:e&&e[o++],done:!e}}};throw new TypeError(t?"Object is not iterable.":"Symbol.iterator is not defined.")}function z(e,t){var r=typeof Symbol=="function"&&e[Symbol.iterator];if(!r)return e;var o=r.call(e),n,i=[],s;try{for(;(t===void 0||t-- >0)&&!(n=o.next()).done;)i.push(n.value)}catch(a){s={error:a}}finally{try{n&&!n.done&&(r=o.return)&&r.call(o)}finally{if(s)throw s.error}}return i}function K(e,t,r){if(r||arguments.length===2)for(var o=0,n=t.length,i;o1||a(u,d)})})}function a(u,d){try{c(o[u](d))}catch(y){f(i[0][3],y)}}function c(u){u.value instanceof ot?Promise.resolve(u.value.v).then(p,l):f(i[0][2],u)}function p(u){a("next",u)}function l(u){a("throw",u)}function f(u,d){u(d),i.shift(),i.length&&a(i[0][0],i[0][1])}}function po(e){if(!Symbol.asyncIterator)throw new TypeError("Symbol.asyncIterator is not defined.");var t=e[Symbol.asyncIterator],r;return t?t.call(e):(e=typeof be=="function"?be(e):e[Symbol.iterator](),r={},o("next"),o("throw"),o("return"),r[Symbol.asyncIterator]=function(){return this},r);function o(i){r[i]=e[i]&&function(s){return new Promise(function(a,c){s=e[i](s),n(a,c,s.done,s.value)})}}function n(i,s,a,c){Promise.resolve(c).then(function(p){i({value:p,done:a})},s)}}function k(e){return typeof e=="function"}function pt(e){var t=function(o){Error.call(o),o.stack=new Error().stack},r=e(t);return r.prototype=Object.create(Error.prototype),r.prototype.constructor=r,r}var Ut=pt(function(e){return function(r){e(this),this.message=r?r.length+` errors occurred during unsubscription: +`+r.map(function(o,n){return n+1+") "+o.toString()}).join(` + `):"",this.name="UnsubscriptionError",this.errors=r}});function ze(e,t){if(e){var r=e.indexOf(t);0<=r&&e.splice(r,1)}}var je=function(){function e(t){this.initialTeardown=t,this.closed=!1,this._parentage=null,this._finalizers=null}return e.prototype.unsubscribe=function(){var t,r,o,n,i;if(!this.closed){this.closed=!0;var s=this._parentage;if(s)if(this._parentage=null,Array.isArray(s))try{for(var a=be(s),c=a.next();!c.done;c=a.next()){var p=c.value;p.remove(this)}}catch(b){t={error:b}}finally{try{c&&!c.done&&(r=a.return)&&r.call(a)}finally{if(t)throw t.error}}else s.remove(this);var l=this.initialTeardown;if(k(l))try{l()}catch(b){i=b instanceof Ut?b.errors:[b]}var f=this._finalizers;if(f){this._finalizers=null;try{for(var u=be(f),d=u.next();!d.done;d=u.next()){var y=d.value;try{lo(y)}catch(b){i=i!=null?i:[],b instanceof Ut?i=K(K([],z(i)),z(b.errors)):i.push(b)}}}catch(b){o={error:b}}finally{try{d&&!d.done&&(n=u.return)&&n.call(u)}finally{if(o)throw o.error}}}if(i)throw new Ut(i)}},e.prototype.add=function(t){var r;if(t&&t!==this)if(this.closed)lo(t);else{if(t instanceof e){if(t.closed||t._hasParent(this))return;t._addParent(this)}(this._finalizers=(r=this._finalizers)!==null&&r!==void 0?r:[]).push(t)}},e.prototype._hasParent=function(t){var r=this._parentage;return r===t||Array.isArray(r)&&r.includes(t)},e.prototype._addParent=function(t){var r=this._parentage;this._parentage=Array.isArray(r)?(r.push(t),r):r?[r,t]:t},e.prototype._removeParent=function(t){var 
r=this._parentage;r===t?this._parentage=null:Array.isArray(r)&&ze(r,t)},e.prototype.remove=function(t){var r=this._finalizers;r&&ze(r,t),t instanceof e&&t._removeParent(this)},e.EMPTY=function(){var t=new e;return t.closed=!0,t}(),e}();var Tr=je.EMPTY;function Nt(e){return e instanceof je||e&&"closed"in e&&k(e.remove)&&k(e.add)&&k(e.unsubscribe)}function lo(e){k(e)?e():e.unsubscribe()}var He={onUnhandledError:null,onStoppedNotification:null,Promise:void 0,useDeprecatedSynchronousErrorHandling:!1,useDeprecatedNextContext:!1};var lt={setTimeout:function(e,t){for(var r=[],o=2;o0},enumerable:!1,configurable:!0}),t.prototype._trySubscribe=function(r){return this._throwIfClosed(),e.prototype._trySubscribe.call(this,r)},t.prototype._subscribe=function(r){return this._throwIfClosed(),this._checkFinalizedStatuses(r),this._innerSubscribe(r)},t.prototype._innerSubscribe=function(r){var o=this,n=this,i=n.hasError,s=n.isStopped,a=n.observers;return i||s?Tr:(this.currentObservers=null,a.push(r),new je(function(){o.currentObservers=null,ze(a,r)}))},t.prototype._checkFinalizedStatuses=function(r){var o=this,n=o.hasError,i=o.thrownError,s=o.isStopped;n?r.error(i):s&&r.complete()},t.prototype.asObservable=function(){var r=new I;return r.source=this,r},t.create=function(r,o){return new xo(r,o)},t}(I);var xo=function(e){se(t,e);function t(r,o){var n=e.call(this)||this;return n.destination=r,n.source=o,n}return t.prototype.next=function(r){var o,n;(n=(o=this.destination)===null||o===void 0?void 0:o.next)===null||n===void 0||n.call(o,r)},t.prototype.error=function(r){var o,n;(n=(o=this.destination)===null||o===void 0?void 0:o.error)===null||n===void 0||n.call(o,r)},t.prototype.complete=function(){var r,o;(o=(r=this.destination)===null||r===void 0?void 0:r.complete)===null||o===void 0||o.call(r)},t.prototype._subscribe=function(r){var o,n;return(n=(o=this.source)===null||o===void 0?void 0:o.subscribe(r))!==null&&n!==void 0?n:Tr},t}(x);var St={now:function(){return(St.delegate||Date).now()},delegate:void 0};var Ot=function(e){se(t,e);function t(r,o,n){r===void 0&&(r=1/0),o===void 0&&(o=1/0),n===void 0&&(n=St);var i=e.call(this)||this;return i._bufferSize=r,i._windowTime=o,i._timestampProvider=n,i._buffer=[],i._infiniteTimeWindow=!0,i._infiniteTimeWindow=o===1/0,i._bufferSize=Math.max(1,r),i._windowTime=Math.max(1,o),i}return t.prototype.next=function(r){var o=this,n=o.isStopped,i=o._buffer,s=o._infiniteTimeWindow,a=o._timestampProvider,c=o._windowTime;n||(i.push(r),!s&&i.push(a.now()+c)),this._trimBuffer(),e.prototype.next.call(this,r)},t.prototype._subscribe=function(r){this._throwIfClosed(),this._trimBuffer();for(var o=this._innerSubscribe(r),n=this,i=n._infiniteTimeWindow,s=n._buffer,a=s.slice(),c=0;c0?e.prototype.requestAsyncId.call(this,r,o,n):(r.actions.push(this),r._scheduled||(r._scheduled=ut.requestAnimationFrame(function(){return r.flush(void 0)})))},t.prototype.recycleAsyncId=function(r,o,n){var i;if(n===void 0&&(n=0),n!=null?n>0:this.delay>0)return e.prototype.recycleAsyncId.call(this,r,o,n);var s=r.actions;o!=null&&((i=s[s.length-1])===null||i===void 0?void 0:i.id)!==o&&(ut.cancelAnimationFrame(o),r._scheduled=void 0)},t}(zt);var wo=function(e){se(t,e);function t(){return e!==null&&e.apply(this,arguments)||this}return t.prototype.flush=function(r){this._active=!0;var o=this._scheduled;this._scheduled=void 0;var n=this.actions,i;r=r||n.shift();do if(i=r.execute(r.state,r.delay))break;while((r=n[0])&&r.id===o&&n.shift());if(this._active=!1,i){for(;(r=n[0])&&r.id===o&&n.shift();)r.unsubscribe();throw 
i}},t}(qt);var ge=new wo(Eo);var M=new I(function(e){return e.complete()});function Kt(e){return e&&k(e.schedule)}function Cr(e){return e[e.length-1]}function Ge(e){return k(Cr(e))?e.pop():void 0}function Ae(e){return Kt(Cr(e))?e.pop():void 0}function Qt(e,t){return typeof Cr(e)=="number"?e.pop():t}var dt=function(e){return e&&typeof e.length=="number"&&typeof e!="function"};function Yt(e){return k(e==null?void 0:e.then)}function Bt(e){return k(e[ft])}function Gt(e){return Symbol.asyncIterator&&k(e==null?void 0:e[Symbol.asyncIterator])}function Jt(e){return new TypeError("You provided "+(e!==null&&typeof e=="object"?"an invalid object":"'"+e+"'")+" where a stream was expected. You can provide an Observable, Promise, ReadableStream, Array, AsyncIterable, or Iterable.")}function Wi(){return typeof Symbol!="function"||!Symbol.iterator?"@@iterator":Symbol.iterator}var Xt=Wi();function Zt(e){return k(e==null?void 0:e[Xt])}function er(e){return co(this,arguments,function(){var r,o,n,i;return Wt(this,function(s){switch(s.label){case 0:r=e.getReader(),s.label=1;case 1:s.trys.push([1,,9,10]),s.label=2;case 2:return[4,ot(r.read())];case 3:return o=s.sent(),n=o.value,i=o.done,i?[4,ot(void 0)]:[3,5];case 4:return[2,s.sent()];case 5:return[4,ot(n)];case 6:return[4,s.sent()];case 7:return s.sent(),[3,2];case 8:return[3,10];case 9:return r.releaseLock(),[7];case 10:return[2]}})})}function tr(e){return k(e==null?void 0:e.getReader)}function F(e){if(e instanceof I)return e;if(e!=null){if(Bt(e))return Ui(e);if(dt(e))return Ni(e);if(Yt(e))return Di(e);if(Gt(e))return To(e);if(Zt(e))return Vi(e);if(tr(e))return zi(e)}throw Jt(e)}function Ui(e){return new I(function(t){var r=e[ft]();if(k(r.subscribe))return r.subscribe(t);throw new TypeError("Provided object does not correctly implement Symbol.observable")})}function Ni(e){return new I(function(t){for(var r=0;r=2;return function(o){return o.pipe(e?v(function(n,i){return e(n,i,o)}):pe,ue(1),r?$e(t):Uo(function(){return new or}))}}function Rr(e){return e<=0?function(){return M}:g(function(t,r){var o=[];t.subscribe(E(r,function(n){o.push(n),e=2,!0))}function de(e){e===void 0&&(e={});var t=e.connector,r=t===void 0?function(){return new x}:t,o=e.resetOnError,n=o===void 0?!0:o,i=e.resetOnComplete,s=i===void 0?!0:i,a=e.resetOnRefCountZero,c=a===void 0?!0:a;return function(p){var l,f,u,d=0,y=!1,b=!1,D=function(){f==null||f.unsubscribe(),f=void 0},Q=function(){D(),l=u=void 0,y=b=!1},J=function(){var C=l;Q(),C==null||C.unsubscribe()};return g(function(C,ct){d++,!b&&!y&&D();var Ve=u=u!=null?u:r();ct.add(function(){d--,d===0&&!b&&!y&&(f=jr(J,c))}),Ve.subscribe(ct),!l&&d>0&&(l=new it({next:function(Fe){return Ve.next(Fe)},error:function(Fe){b=!0,D(),f=jr(Q,n,Fe),Ve.error(Fe)},complete:function(){y=!0,D(),f=jr(Q,s),Ve.complete()}}),F(C).subscribe(l))})(p)}}function jr(e,t){for(var r=[],o=2;oe.next(document)),e}function W(e,t=document){return Array.from(t.querySelectorAll(e))}function U(e,t=document){let r=ce(e,t);if(typeof r=="undefined")throw new ReferenceError(`Missing element: expected "${e}" to be present`);return r}function ce(e,t=document){return t.querySelector(e)||void 0}function Ie(){return document.activeElement instanceof HTMLElement&&document.activeElement||void 0}var ca=L(h(document.body,"focusin"),h(document.body,"focusout")).pipe(ye(1),q(void 0),m(()=>Ie()||document.body),Z(1));function vt(e){return ca.pipe(m(t=>e.contains(t)),X())}function qo(e,t){return L(h(e,"mouseenter").pipe(m(()=>!0)),h(e,"mouseleave").pipe(m(()=>!1))).pipe(t?ye(t):pe,q(!1))}function 
Ue(e){return{x:e.offsetLeft,y:e.offsetTop}}function Ko(e){return L(h(window,"load"),h(window,"resize")).pipe(Le(0,ge),m(()=>Ue(e)),q(Ue(e)))}function ir(e){return{x:e.scrollLeft,y:e.scrollTop}}function et(e){return L(h(e,"scroll"),h(window,"resize")).pipe(Le(0,ge),m(()=>ir(e)),q(ir(e)))}function Qo(e,t){if(typeof t=="string"||typeof t=="number")e.innerHTML+=t.toString();else if(t instanceof Node)e.appendChild(t);else if(Array.isArray(t))for(let r of t)Qo(e,r)}function S(e,t,...r){let o=document.createElement(e);if(t)for(let n of Object.keys(t))typeof t[n]!="undefined"&&(typeof t[n]!="boolean"?o.setAttribute(n,t[n]):o.setAttribute(n,""));for(let n of r)Qo(o,n);return o}function ar(e){if(e>999){let t=+((e-950)%1e3>99);return`${((e+1e-6)/1e3).toFixed(t)}k`}else return e.toString()}function gt(e){let t=S("script",{src:e});return H(()=>(document.head.appendChild(t),L(h(t,"load"),h(t,"error").pipe(w(()=>kr(()=>new ReferenceError(`Invalid script: ${e}`))))).pipe(m(()=>{}),A(()=>document.head.removeChild(t)),ue(1))))}var Yo=new x,pa=H(()=>typeof ResizeObserver=="undefined"?gt("https://unpkg.com/resize-observer-polyfill"):R(void 0)).pipe(m(()=>new ResizeObserver(e=>{for(let t of e)Yo.next(t)})),w(e=>L(Ke,R(e)).pipe(A(()=>e.disconnect()))),Z(1));function le(e){return{width:e.offsetWidth,height:e.offsetHeight}}function Se(e){return pa.pipe(T(t=>t.observe(e)),w(t=>Yo.pipe(v(({target:r})=>r===e),A(()=>t.unobserve(e)),m(()=>le(e)))),q(le(e)))}function xt(e){return{width:e.scrollWidth,height:e.scrollHeight}}function sr(e){let t=e.parentElement;for(;t&&(e.scrollWidth<=t.scrollWidth&&e.scrollHeight<=t.scrollHeight);)t=(e=t).parentElement;return t?e:void 0}var Bo=new x,la=H(()=>R(new IntersectionObserver(e=>{for(let t of e)Bo.next(t)},{threshold:0}))).pipe(w(e=>L(Ke,R(e)).pipe(A(()=>e.disconnect()))),Z(1));function yt(e){return la.pipe(T(t=>t.observe(e)),w(t=>Bo.pipe(v(({target:r})=>r===e),A(()=>t.unobserve(e)),m(({isIntersecting:r})=>r))))}function Go(e,t=16){return et(e).pipe(m(({y:r})=>{let o=le(e),n=xt(e);return r>=n.height-o.height-t}),X())}var cr={drawer:U("[data-md-toggle=drawer]"),search:U("[data-md-toggle=search]")};function Jo(e){return cr[e].checked}function Ye(e,t){cr[e].checked!==t&&cr[e].click()}function Ne(e){let t=cr[e];return h(t,"change").pipe(m(()=>t.checked),q(t.checked))}function ma(e,t){switch(e.constructor){case HTMLInputElement:return e.type==="radio"?/^Arrow/.test(t):!0;case HTMLSelectElement:case HTMLTextAreaElement:return!0;default:return e.isContentEditable}}function fa(){return L(h(window,"compositionstart").pipe(m(()=>!0)),h(window,"compositionend").pipe(m(()=>!1))).pipe(q(!1))}function Xo(){let e=h(window,"keydown").pipe(v(t=>!(t.metaKey||t.ctrlKey)),m(t=>({mode:Jo("search")?"search":"global",type:t.key,claim(){t.preventDefault(),t.stopPropagation()}})),v(({mode:t,type:r})=>{if(t==="global"){let o=Ie();if(typeof o!="undefined")return!ma(o,r)}return!0}),de());return fa().pipe(w(t=>t?M:e))}function me(){return new URL(location.href)}function st(e,t=!1){if(G("navigation.instant")&&!t){let r=S("a",{href:e.href});document.body.appendChild(r),r.click(),r.remove()}else location.href=e.href}function Zo(){return new x}function en(){return location.hash.slice(1)}function pr(e){let t=S("a",{href:e});t.addEventListener("click",r=>r.stopPropagation()),t.click()}function ua(e){return L(h(window,"hashchange"),e).pipe(m(en),q(en()),v(t=>t.length>0),Z(1))}function tn(e){return ua(e).pipe(m(t=>ce(`[id="${t}"]`)),v(t=>typeof t!="undefined"))}function At(e){let t=matchMedia(e);return 
nr(r=>t.addListener(()=>r(t.matches))).pipe(q(t.matches))}function rn(){let e=matchMedia("print");return L(h(window,"beforeprint").pipe(m(()=>!0)),h(window,"afterprint").pipe(m(()=>!1))).pipe(q(e.matches))}function Dr(e,t){return e.pipe(w(r=>r?t():M))}function lr(e,t){return new I(r=>{let o=new XMLHttpRequest;o.open("GET",`${e}`),o.responseType="blob",o.addEventListener("load",()=>{o.status>=200&&o.status<300?(r.next(o.response),r.complete()):r.error(new Error(o.statusText))}),o.addEventListener("error",()=>{r.error(new Error("Network Error"))}),o.addEventListener("abort",()=>{r.error(new Error("Request aborted"))}),typeof(t==null?void 0:t.progress$)!="undefined"&&(o.addEventListener("progress",n=>{if(n.lengthComputable)t.progress$.next(n.loaded/n.total*100);else{let i=Number(o.getResponseHeader("Content-Length"))||0;t.progress$.next(n.loaded/i*100)}}),t.progress$.next(5)),o.send()})}function De(e,t){return lr(e,t).pipe(w(r=>r.text()),m(r=>JSON.parse(r)),Z(1))}function on(e,t){let r=new DOMParser;return lr(e,t).pipe(w(o=>o.text()),m(o=>r.parseFromString(o,"text/xml")),Z(1))}function nn(){return{x:Math.max(0,scrollX),y:Math.max(0,scrollY)}}function an(){return L(h(window,"scroll",{passive:!0}),h(window,"resize",{passive:!0})).pipe(m(nn),q(nn()))}function sn(){return{width:innerWidth,height:innerHeight}}function cn(){return h(window,"resize",{passive:!0}).pipe(m(sn),q(sn()))}function pn(){return B([an(),cn()]).pipe(m(([e,t])=>({offset:e,size:t})),Z(1))}function mr(e,{viewport$:t,header$:r}){let o=t.pipe(te("size")),n=B([o,r]).pipe(m(()=>Ue(e)));return B([r,t,n]).pipe(m(([{height:i},{offset:s,size:a},{x:c,y:p}])=>({offset:{x:s.x-c,y:s.y-p+i},size:a})))}function da(e){return h(e,"message",t=>t.data)}function ha(e){let t=new x;return t.subscribe(r=>e.postMessage(r)),t}function ln(e,t=new Worker(e)){let r=da(t),o=ha(t),n=new x;n.subscribe(o);let i=o.pipe(ee(),oe(!0));return n.pipe(ee(),Re(r.pipe(j(i))),de())}var ba=U("#__config"),Et=JSON.parse(ba.textContent);Et.base=`${new URL(Et.base,me())}`;function he(){return Et}function G(e){return Et.features.includes(e)}function we(e,t){return typeof t!="undefined"?Et.translations[e].replace("#",t.toString()):Et.translations[e]}function Oe(e,t=document){return U(`[data-md-component=${e}]`,t)}function ne(e,t=document){return W(`[data-md-component=${e}]`,t)}function va(e){let t=U(".md-typeset > :first-child",e);return h(t,"click",{once:!0}).pipe(m(()=>U(".md-typeset",e)),m(r=>({hash:__md_hash(r.innerHTML)})))}function mn(e){if(!G("announce.dismiss")||!e.childElementCount)return M;if(!e.hidden){let t=U(".md-typeset",e);__md_hash(t.innerHTML)===__md_get("__announce")&&(e.hidden=!0)}return H(()=>{let t=new x;return t.subscribe(({hash:r})=>{e.hidden=!0,__md_set("__announce",r)}),va(e).pipe(T(r=>t.next(r)),A(()=>t.complete()),m(r=>P({ref:e},r)))})}function ga(e,{target$:t}){return t.pipe(m(r=>({hidden:r!==e})))}function fn(e,t){let r=new x;return r.subscribe(({hidden:o})=>{e.hidden=o}),ga(e,t).pipe(T(o=>r.next(o)),A(()=>r.complete()),m(o=>P({ref:e},o)))}function Ct(e,t){return t==="inline"?S("div",{class:"md-tooltip md-tooltip--inline",id:e,role:"tooltip"},S("div",{class:"md-tooltip__inner md-typeset"})):S("div",{class:"md-tooltip",id:e,role:"tooltip"},S("div",{class:"md-tooltip__inner md-typeset"}))}function un(e,t){if(t=t?`${t}_annotation_${e}`:void 0,t){let r=t?`#${t}`:void 0;return S("aside",{class:"md-annotation",tabIndex:0},Ct(t),S("a",{href:r,class:"md-annotation__index",tabIndex:-1},S("span",{"data-md-annotation-id":e})))}else return 
S("aside",{class:"md-annotation",tabIndex:0},Ct(t),S("span",{class:"md-annotation__index",tabIndex:-1},S("span",{"data-md-annotation-id":e})))}function dn(e){return S("button",{class:"md-clipboard md-icon",title:we("clipboard.copy"),"data-clipboard-target":`#${e} > code`})}function Vr(e,t){let r=t&2,o=t&1,n=Object.keys(e.terms).filter(c=>!e.terms[c]).reduce((c,p)=>[...c,S("del",null,p)," "],[]).slice(0,-1),i=he(),s=new URL(e.location,i.base);G("search.highlight")&&s.searchParams.set("h",Object.entries(e.terms).filter(([,c])=>c).reduce((c,[p])=>`${c} ${p}`.trim(),""));let{tags:a}=he();return S("a",{href:`${s}`,class:"md-search-result__link",tabIndex:-1},S("article",{class:"md-search-result__article md-typeset","data-md-score":e.score.toFixed(2)},r>0&&S("div",{class:"md-search-result__icon md-icon"}),r>0&&S("h1",null,e.title),r<=0&&S("h2",null,e.title),o>0&&e.text.length>0&&e.text,e.tags&&e.tags.map(c=>{let p=a?c in a?`md-tag-icon md-tag--${a[c]}`:"md-tag-icon":"";return S("span",{class:`md-tag ${p}`},c)}),o>0&&n.length>0&&S("p",{class:"md-search-result__terms"},we("search.result.term.missing"),": ",...n)))}function hn(e){let t=e[0].score,r=[...e],o=he(),n=r.findIndex(l=>!`${new URL(l.location,o.base)}`.includes("#")),[i]=r.splice(n,1),s=r.findIndex(l=>l.scoreVr(l,1)),...c.length?[S("details",{class:"md-search-result__more"},S("summary",{tabIndex:-1},S("div",null,c.length>0&&c.length===1?we("search.result.more.one"):we("search.result.more.other",c.length))),...c.map(l=>Vr(l,1)))]:[]];return S("li",{class:"md-search-result__item"},p)}function bn(e){return S("ul",{class:"md-source__facts"},Object.entries(e).map(([t,r])=>S("li",{class:`md-source__fact md-source__fact--${t}`},typeof r=="number"?ar(r):r)))}function zr(e){let t=`tabbed-control tabbed-control--${e}`;return S("div",{class:t,hidden:!0},S("button",{class:"tabbed-button",tabIndex:-1,"aria-hidden":"true"}))}function vn(e){return S("div",{class:"md-typeset__scrollwrap"},S("div",{class:"md-typeset__table"},e))}function xa(e){let t=he(),r=new URL(`../${e.version}/`,t.base);return S("li",{class:"md-version__item"},S("a",{href:`${r}`,class:"md-version__link"},e.title))}function gn(e,t){return S("div",{class:"md-version"},S("button",{class:"md-version__current","aria-label":we("select.version")},t.title),S("ul",{class:"md-version__list"},e.map(xa)))}var ya=0;function Ea(e,t){document.body.append(e);let{width:r}=le(e);e.style.setProperty("--md-tooltip-width",`${r}px`),e.remove();let o=sr(t),n=typeof o!="undefined"?et(o):R({x:0,y:0}),i=L(vt(t),qo(t)).pipe(X());return B([i,n]).pipe(m(([s,a])=>{let{x:c,y:p}=Ue(t),l=le(t),f=t.closest("table");return f&&t.parentElement&&(c+=f.offsetLeft+t.parentElement.offsetLeft,p+=f.offsetTop+t.parentElement.offsetTop),{active:s,offset:{x:c-a.x+l.width/2-r/2,y:p-a.y+l.height+8}}}))}function Be(e){let t=e.title;if(!t.length)return M;let r=`__tooltip_${ya++}`,o=Ct(r,"inline"),n=U(".md-typeset",o);return n.innerHTML=t,H(()=>{let i=new x;return 
i.subscribe({next({offset:s}){o.style.setProperty("--md-tooltip-x",`${s.x}px`),o.style.setProperty("--md-tooltip-y",`${s.y}px`)},complete(){o.style.removeProperty("--md-tooltip-x"),o.style.removeProperty("--md-tooltip-y")}}),L(i.pipe(v(({active:s})=>s)),i.pipe(ye(250),v(({active:s})=>!s))).subscribe({next({active:s}){s?(e.insertAdjacentElement("afterend",o),e.setAttribute("aria-describedby",r),e.removeAttribute("title")):(o.remove(),e.removeAttribute("aria-describedby"),e.setAttribute("title",t))},complete(){o.remove(),e.removeAttribute("aria-describedby"),e.setAttribute("title",t)}}),i.pipe(Le(16,ge)).subscribe(({active:s})=>{o.classList.toggle("md-tooltip--active",s)}),i.pipe(_t(125,ge),v(()=>!!e.offsetParent),m(()=>e.offsetParent.getBoundingClientRect()),m(({x:s})=>s)).subscribe({next(s){s?o.style.setProperty("--md-tooltip-0",`${-s}px`):o.style.removeProperty("--md-tooltip-0")},complete(){o.style.removeProperty("--md-tooltip-0")}}),Ea(o,e).pipe(T(s=>i.next(s)),A(()=>i.complete()),m(s=>P({ref:e},s)))}).pipe(qe(ie))}function wa(e,t){let r=H(()=>B([Ko(e),et(t)])).pipe(m(([{x:o,y:n},i])=>{let{width:s,height:a}=le(e);return{x:o-i.x+s/2,y:n-i.y+a/2}}));return vt(e).pipe(w(o=>r.pipe(m(n=>({active:o,offset:n})),ue(+!o||1/0))))}function xn(e,t,{target$:r}){let[o,n]=Array.from(e.children);return H(()=>{let i=new x,s=i.pipe(ee(),oe(!0));return i.subscribe({next({offset:a}){e.style.setProperty("--md-tooltip-x",`${a.x}px`),e.style.setProperty("--md-tooltip-y",`${a.y}px`)},complete(){e.style.removeProperty("--md-tooltip-x"),e.style.removeProperty("--md-tooltip-y")}}),yt(e).pipe(j(s)).subscribe(a=>{e.toggleAttribute("data-md-visible",a)}),L(i.pipe(v(({active:a})=>a)),i.pipe(ye(250),v(({active:a})=>!a))).subscribe({next({active:a}){a?e.prepend(o):o.remove()},complete(){e.prepend(o)}}),i.pipe(Le(16,ge)).subscribe(({active:a})=>{o.classList.toggle("md-tooltip--active",a)}),i.pipe(_t(125,ge),v(()=>!!e.offsetParent),m(()=>e.offsetParent.getBoundingClientRect()),m(({x:a})=>a)).subscribe({next(a){a?e.style.setProperty("--md-tooltip-0",`${-a}px`):e.style.removeProperty("--md-tooltip-0")},complete(){e.style.removeProperty("--md-tooltip-0")}}),h(n,"click").pipe(j(s),v(a=>!(a.metaKey||a.ctrlKey))).subscribe(a=>{a.stopPropagation(),a.preventDefault()}),h(n,"mousedown").pipe(j(s),ae(i)).subscribe(([a,{active:c}])=>{var p;if(a.button!==0||a.metaKey||a.ctrlKey)a.preventDefault();else if(c){a.preventDefault();let l=e.parentElement.closest(".md-annotation");l instanceof HTMLElement?l.focus():(p=Ie())==null||p.blur()}}),r.pipe(j(s),v(a=>a===o),Qe(125)).subscribe(()=>e.focus()),wa(e,t).pipe(T(a=>i.next(a)),A(()=>i.complete()),m(a=>P({ref:e},a)))})}function Ta(e){return e.tagName==="CODE"?W(".c, .c1, .cm",e):[e]}function Sa(e){let t=[];for(let r of Ta(e)){let o=[],n=document.createNodeIterator(r,NodeFilter.SHOW_TEXT);for(let i=n.nextNode();i;i=n.nextNode())o.push(i);for(let i of o){let s;for(;s=/(\(\d+\))(!)?/.exec(i.textContent);){let[,a,c]=s;if(typeof c=="undefined"){let p=i.splitText(s.index);i=p.splitText(a.length),t.push(p)}else{i.textContent=a,t.push(i);break}}}}return t}function yn(e,t){t.append(...Array.from(e.childNodes))}function fr(e,t,{target$:r,print$:o}){let n=t.closest("[id]"),i=n==null?void 0:n.id,s=new Map;for(let a of Sa(t)){let[,c]=a.textContent.match(/\((\d+)\)/);ce(`:scope > li:nth-child(${c})`,e)&&(s.set(c,un(c,i)),a.replaceWith(s.get(c)))}return s.size===0?M:H(()=>{let a=new x,c=a.pipe(ee(),oe(!0)),p=[];for(let[l,f]of s)p.push([U(".md-typeset",f),U(`:scope > li:nth-child(${l})`,e)]);return 
o.pipe(j(c)).subscribe(l=>{e.hidden=!l,e.classList.toggle("md-annotation-list",l);for(let[f,u]of p)l?yn(f,u):yn(u,f)}),L(...[...s].map(([,l])=>xn(l,t,{target$:r}))).pipe(A(()=>a.complete()),de())})}function En(e){if(e.nextElementSibling){let t=e.nextElementSibling;if(t.tagName==="OL")return t;if(t.tagName==="P"&&!t.children.length)return En(t)}}function wn(e,t){return H(()=>{let r=En(e);return typeof r!="undefined"?fr(r,e,t):M})}var Tn=jt(Kr());var Oa=0;function Sn(e){if(e.nextElementSibling){let t=e.nextElementSibling;if(t.tagName==="OL")return t;if(t.tagName==="P"&&!t.children.length)return Sn(t)}}function Ma(e){return Se(e).pipe(m(({width:t})=>({scrollable:xt(e).width>t})),te("scrollable"))}function On(e,t){let{matches:r}=matchMedia("(hover)"),o=H(()=>{let n=new x,i=n.pipe(Rr(1));n.subscribe(({scrollable:c})=>{c&&r?e.setAttribute("tabindex","0"):e.removeAttribute("tabindex")});let s=[];if(Tn.default.isSupported()&&(e.closest(".copy")||G("content.code.copy")&&!e.closest(".no-copy"))){let c=e.closest("pre");c.id=`__code_${Oa++}`;let p=dn(c.id);c.insertBefore(p,e),G("content.tooltips")&&s.push(Be(p))}let a=e.closest(".highlight");if(a instanceof HTMLElement){let c=Sn(a);if(typeof c!="undefined"&&(a.classList.contains("annotate")||G("content.code.annotate"))){let p=fr(c,e,t);s.push(Se(a).pipe(j(i),m(({width:l,height:f})=>l&&f),X(),w(l=>l?p:M)))}}return Ma(e).pipe(T(c=>n.next(c)),A(()=>n.complete()),m(c=>P({ref:e},c)),Re(...s))});return G("content.lazy")?yt(e).pipe(v(n=>n),ue(1),w(()=>o)):o}function La(e,{target$:t,print$:r}){let o=!0;return L(t.pipe(m(n=>n.closest("details:not([open])")),v(n=>e===n),m(()=>({action:"open",reveal:!0}))),r.pipe(v(n=>n||!o),T(()=>o=e.open),m(n=>({action:n?"open":"close"}))))}function Mn(e,t){return H(()=>{let r=new x;return r.subscribe(({action:o,reveal:n})=>{e.toggleAttribute("open",o==="open"),n&&e.scrollIntoView()}),La(e,t).pipe(T(o=>r.next(o)),A(()=>r.complete()),m(o=>P({ref:e},o)))})}var Ln=".node circle,.node ellipse,.node path,.node polygon,.node rect{fill:var(--md-mermaid-node-bg-color);stroke:var(--md-mermaid-node-fg-color)}marker{fill:var(--md-mermaid-edge-color)!important}.edgeLabel .label rect{fill:#0000}.label{color:var(--md-mermaid-label-fg-color);font-family:var(--md-mermaid-font-family)}.label foreignObject{line-height:normal;overflow:visible}.label div .edgeLabel{color:var(--md-mermaid-label-fg-color)}.edgeLabel,.edgeLabel rect,.label div .edgeLabel{background-color:var(--md-mermaid-label-bg-color)}.edgeLabel,.edgeLabel rect{fill:var(--md-mermaid-label-bg-color);color:var(--md-mermaid-edge-color)}.edgePath .path,.flowchart-link{stroke:var(--md-mermaid-edge-color);stroke-width:.05rem}.edgePath .arrowheadPath{fill:var(--md-mermaid-edge-color);stroke:none}.cluster rect{fill:var(--md-default-fg-color--lightest);stroke:var(--md-default-fg-color--lighter)}.cluster span{color:var(--md-mermaid-label-fg-color);font-family:var(--md-mermaid-font-family)}g #flowchart-circleEnd,g #flowchart-circleStart,g #flowchart-crossEnd,g #flowchart-crossStart,g #flowchart-pointEnd,g #flowchart-pointStart{stroke:none}g.classGroup line,g.classGroup rect{fill:var(--md-mermaid-node-bg-color);stroke:var(--md-mermaid-node-fg-color)}g.classGroup text{fill:var(--md-mermaid-label-fg-color);font-family:var(--md-mermaid-font-family)}.classLabel .box{fill:var(--md-mermaid-label-bg-color);background-color:var(--md-mermaid-label-bg-color);opacity:1}.classLabel .label{fill:var(--md-mermaid-label-fg-color);font-family:var(--md-mermaid-font-family)}.node 
.divider{stroke:var(--md-mermaid-node-fg-color)}.relation{stroke:var(--md-mermaid-edge-color)}.cardinality{fill:var(--md-mermaid-label-fg-color);font-family:var(--md-mermaid-font-family)}.cardinality text{fill:inherit!important}defs #classDiagram-compositionEnd,defs #classDiagram-compositionStart,defs #classDiagram-dependencyEnd,defs #classDiagram-dependencyStart,defs #classDiagram-extensionEnd,defs #classDiagram-extensionStart{fill:var(--md-mermaid-edge-color)!important;stroke:var(--md-mermaid-edge-color)!important}defs #classDiagram-aggregationEnd,defs #classDiagram-aggregationStart{fill:var(--md-mermaid-label-bg-color)!important;stroke:var(--md-mermaid-edge-color)!important}g.stateGroup rect{fill:var(--md-mermaid-node-bg-color);stroke:var(--md-mermaid-node-fg-color)}g.stateGroup .state-title{fill:var(--md-mermaid-label-fg-color)!important;font-family:var(--md-mermaid-font-family)}g.stateGroup .composit{fill:var(--md-mermaid-label-bg-color)}.nodeLabel{color:var(--md-mermaid-label-fg-color);font-family:var(--md-mermaid-font-family)}.node circle.state-end,.node circle.state-start,.start-state{fill:var(--md-mermaid-edge-color);stroke:none}.end-state-inner,.end-state-outer{fill:var(--md-mermaid-edge-color)}.end-state-inner,.node circle.state-end{stroke:var(--md-mermaid-label-bg-color)}.transition{stroke:var(--md-mermaid-edge-color)}[id^=state-fork] rect,[id^=state-join] rect{fill:var(--md-mermaid-edge-color)!important;stroke:none!important}.statediagram-cluster.statediagram-cluster .inner{fill:var(--md-default-bg-color)}.statediagram-cluster rect{fill:var(--md-mermaid-node-bg-color);stroke:var(--md-mermaid-node-fg-color)}.statediagram-state rect.divider{fill:var(--md-default-fg-color--lightest);stroke:var(--md-default-fg-color--lighter)}defs #statediagram-barbEnd{stroke:var(--md-mermaid-edge-color)}.attributeBoxEven,.attributeBoxOdd{fill:var(--md-mermaid-node-bg-color);stroke:var(--md-mermaid-node-fg-color)}.entityBox{fill:var(--md-mermaid-label-bg-color);stroke:var(--md-mermaid-node-fg-color)}.entityLabel{fill:var(--md-mermaid-label-fg-color);font-family:var(--md-mermaid-font-family)}.relationshipLabelBox{fill:var(--md-mermaid-label-bg-color);fill-opacity:1;background-color:var(--md-mermaid-label-bg-color);opacity:1}.relationshipLabel{fill:var(--md-mermaid-label-fg-color)}.relationshipLine{stroke:var(--md-mermaid-edge-color)}defs #ONE_OR_MORE_END *,defs #ONE_OR_MORE_START *,defs #ONLY_ONE_END *,defs #ONLY_ONE_START *,defs #ZERO_OR_MORE_END *,defs #ZERO_OR_MORE_START *,defs #ZERO_OR_ONE_END *,defs #ZERO_OR_ONE_START *{stroke:var(--md-mermaid-edge-color)!important}defs #ZERO_OR_MORE_END circle,defs #ZERO_OR_MORE_START circle{fill:var(--md-mermaid-label-bg-color)}.actor{fill:var(--md-mermaid-sequence-actor-bg-color);stroke:var(--md-mermaid-sequence-actor-border-color)}text.actor>tspan{fill:var(--md-mermaid-sequence-actor-fg-color);font-family:var(--md-mermaid-font-family)}line{stroke:var(--md-mermaid-sequence-actor-line-color)}.actor-man circle,.actor-man 
line{fill:var(--md-mermaid-sequence-actorman-bg-color);stroke:var(--md-mermaid-sequence-actorman-line-color)}.messageLine0,.messageLine1{stroke:var(--md-mermaid-sequence-message-line-color)}.note{fill:var(--md-mermaid-sequence-note-bg-color);stroke:var(--md-mermaid-sequence-note-border-color)}.loopText,.loopText>tspan,.messageText,.noteText>tspan{stroke:none;font-family:var(--md-mermaid-font-family)!important}.messageText{fill:var(--md-mermaid-sequence-message-fg-color)}.loopText,.loopText>tspan{fill:var(--md-mermaid-sequence-loop-fg-color)}.noteText>tspan{fill:var(--md-mermaid-sequence-note-fg-color)}#arrowhead path{fill:var(--md-mermaid-sequence-message-line-color);stroke:none}.loopLine{fill:var(--md-mermaid-sequence-loop-bg-color);stroke:var(--md-mermaid-sequence-loop-border-color)}.labelBox{fill:var(--md-mermaid-sequence-label-bg-color);stroke:none}.labelText,.labelText>span{fill:var(--md-mermaid-sequence-label-fg-color);font-family:var(--md-mermaid-font-family)}.sequenceNumber{fill:var(--md-mermaid-sequence-number-fg-color)}rect.rect{fill:var(--md-mermaid-sequence-box-bg-color);stroke:none}rect.rect+text.text{fill:var(--md-mermaid-sequence-box-fg-color)}defs #sequencenumber{fill:var(--md-mermaid-sequence-number-bg-color)!important}";var Qr,Aa=0;function Ca(){return typeof mermaid=="undefined"||mermaid instanceof Element?gt("https://unpkg.com/mermaid@10.6.1/dist/mermaid.min.js"):R(void 0)}function _n(e){return e.classList.remove("mermaid"),Qr||(Qr=Ca().pipe(T(()=>mermaid.initialize({startOnLoad:!1,themeCSS:Ln,sequence:{actorFontSize:"16px",messageFontSize:"16px",noteFontSize:"16px"}})),m(()=>{}),Z(1))),Qr.subscribe(()=>no(this,null,function*(){e.classList.add("mermaid");let t=`__mermaid_${Aa++}`,r=S("div",{class:"mermaid"}),o=e.textContent,{svg:n,fn:i}=yield mermaid.render(t,o),s=r.attachShadow({mode:"closed"});s.innerHTML=n,e.replaceWith(r),i==null||i(s)})),Qr.pipe(m(()=>({ref:e})))}var An=S("table");function Cn(e){return e.replaceWith(An),An.replaceWith(vn(e)),R({ref:e})}function ka(e){let t=e.find(r=>r.checked)||e[0];return L(...e.map(r=>h(r,"change").pipe(m(()=>U(`label[for="${r.id}"]`))))).pipe(q(U(`label[for="${t.id}"]`)),m(r=>({active:r})))}function kn(e,{viewport$:t,target$:r}){let o=U(".tabbed-labels",e),n=W(":scope > input",e),i=zr("prev");e.append(i);let s=zr("next");return e.append(s),H(()=>{let a=new x,c=a.pipe(ee(),oe(!0));B([a,Se(e)]).pipe(j(c),Le(1,ge)).subscribe({next([{active:p},l]){let f=Ue(p),{width:u}=le(p);e.style.setProperty("--md-indicator-x",`${f.x}px`),e.style.setProperty("--md-indicator-width",`${u}px`);let d=ir(o);(f.xd.x+l.width)&&o.scrollTo({left:Math.max(0,f.x-16),behavior:"smooth"})},complete(){e.style.removeProperty("--md-indicator-x"),e.style.removeProperty("--md-indicator-width")}}),B([et(o),Se(o)]).pipe(j(c)).subscribe(([p,l])=>{let f=xt(o);i.hidden=p.x<16,s.hidden=p.x>f.width-l.width-16}),L(h(i,"click").pipe(m(()=>-1)),h(s,"click").pipe(m(()=>1))).pipe(j(c)).subscribe(p=>{let{width:l}=le(o);o.scrollBy({left:l*p,behavior:"smooth"})}),r.pipe(j(c),v(p=>n.includes(p))).subscribe(p=>p.click()),o.classList.add("tabbed-labels--linked");for(let p of n){let l=U(`label[for="${p.id}"]`);l.replaceChildren(S("a",{href:`#${l.htmlFor}`,tabIndex:-1},...Array.from(l.childNodes))),h(l.firstElementChild,"click").pipe(j(c),v(f=>!(f.metaKey||f.ctrlKey)),T(f=>{f.preventDefault(),f.stopPropagation()})).subscribe(()=>{history.replaceState({},"",`#${l.htmlFor}`),l.click()})}return G("content.tabs.link")&&a.pipe(Ee(1),ae(t)).subscribe(([{active:p},{offset:l}])=>{let 
f=p.innerText.trim();if(p.hasAttribute("data-md-switching"))p.removeAttribute("data-md-switching");else{let u=e.offsetTop-l.y;for(let y of W("[data-tabs]"))for(let b of W(":scope > input",y)){let D=U(`label[for="${b.id}"]`);if(D!==p&&D.innerText.trim()===f){D.setAttribute("data-md-switching",""),b.click();break}}window.scrollTo({top:e.offsetTop-u});let d=__md_get("__tabs")||[];__md_set("__tabs",[...new Set([f,...d])])}}),a.pipe(j(c)).subscribe(()=>{for(let p of W("audio, video",e))p.pause()}),ka(n).pipe(T(p=>a.next(p)),A(()=>a.complete()),m(p=>P({ref:e},p)))}).pipe(qe(ie))}function Hn(e,{viewport$:t,target$:r,print$:o}){return L(...W(".annotate:not(.highlight)",e).map(n=>wn(n,{target$:r,print$:o})),...W("pre:not(.mermaid) > code",e).map(n=>On(n,{target$:r,print$:o})),...W("pre.mermaid",e).map(n=>_n(n)),...W("table:not([class])",e).map(n=>Cn(n)),...W("details",e).map(n=>Mn(n,{target$:r,print$:o})),...W("[data-tabs]",e).map(n=>kn(n,{viewport$:t,target$:r})),...W("[title]",e).filter(()=>G("content.tooltips")).map(n=>Be(n)))}function Ha(e,{alert$:t}){return t.pipe(w(r=>L(R(!0),R(!1).pipe(Qe(2e3))).pipe(m(o=>({message:r,active:o})))))}function $n(e,t){let r=U(".md-typeset",e);return H(()=>{let o=new x;return o.subscribe(({message:n,active:i})=>{e.classList.toggle("md-dialog--active",i),r.textContent=n}),Ha(e,t).pipe(T(n=>o.next(n)),A(()=>o.complete()),m(n=>P({ref:e},n)))})}function $a({viewport$:e}){if(!G("header.autohide"))return R(!1);let t=e.pipe(m(({offset:{y:n}})=>n),Ce(2,1),m(([n,i])=>[nMath.abs(i-n.y)>100),m(([,[n]])=>n),X()),o=Ne("search");return B([e,o]).pipe(m(([{offset:n},i])=>n.y>400&&!i),X(),w(n=>n?r:R(!1)),q(!1))}function Pn(e,t){return H(()=>B([Se(e),$a(t)])).pipe(m(([{height:r},o])=>({height:r,hidden:o})),X((r,o)=>r.height===o.height&&r.hidden===o.hidden),Z(1))}function Rn(e,{header$:t,main$:r}){return H(()=>{let o=new x,n=o.pipe(ee(),oe(!0));o.pipe(te("active"),Ze(t)).subscribe(([{active:s},{hidden:a}])=>{e.classList.toggle("md-header--shadow",s&&!a),e.hidden=a});let i=fe(W("[title]",e)).pipe(v(()=>G("content.tooltips")),re(s=>Be(s)));return r.subscribe(o),t.pipe(j(n),m(s=>P({ref:e},s)),Re(i.pipe(j(n))))})}function Pa(e,{viewport$:t,header$:r}){return mr(e,{viewport$:t,header$:r}).pipe(m(({offset:{y:o}})=>{let{height:n}=le(e);return{active:o>=n}}),te("active"))}function In(e,t){return H(()=>{let r=new x;r.subscribe({next({active:n}){e.classList.toggle("md-header__title--active",n)},complete(){e.classList.remove("md-header__title--active")}});let o=ce(".md-content h1");return typeof o=="undefined"?M:Pa(o,t).pipe(T(n=>r.next(n)),A(()=>r.complete()),m(n=>P({ref:e},n)))})}function Fn(e,{viewport$:t,header$:r}){let o=r.pipe(m(({height:i})=>i),X()),n=o.pipe(w(()=>Se(e).pipe(m(({height:i})=>({top:e.offsetTop,bottom:e.offsetTop+i})),te("bottom"))));return B([o,n,t]).pipe(m(([i,{top:s,bottom:a},{offset:{y:c},size:{height:p}}])=>(p=Math.max(0,p-Math.max(0,s-c,i)-Math.max(0,p+c-a)),{offset:s-i,height:p,active:s-i<=c})),X((i,s)=>i.offset===s.offset&&i.height===s.height&&i.active===s.active))}function Ra(e){let t=__md_get("__palette")||{index:e.findIndex(o=>matchMedia(o.getAttribute("data-md-color-media")).matches)},r=Math.max(0,Math.min(t.index,e.length-1));return R(...e).pipe(re(o=>h(o,"change").pipe(m(()=>o))),q(e[r]),m(o=>({index:e.indexOf(o),color:{media:o.getAttribute("data-md-color-media"),scheme:o.getAttribute("data-md-color-scheme"),primary:o.getAttribute("data-md-color-primary"),accent:o.getAttribute("data-md-color-accent")}})),Z(1))}function jn(e){let 
t=W("input",e),r=S("meta",{name:"theme-color"});document.head.appendChild(r);let o=S("meta",{name:"color-scheme"});document.head.appendChild(o);let n=At("(prefers-color-scheme: light)");return H(()=>{let i=new x;return i.subscribe(s=>{if(document.body.setAttribute("data-md-color-switching",""),s.color.media==="(prefers-color-scheme)"){let a=matchMedia("(prefers-color-scheme: light)"),c=document.querySelector(a.matches?"[data-md-color-media='(prefers-color-scheme: light)']":"[data-md-color-media='(prefers-color-scheme: dark)']");s.color.scheme=c.getAttribute("data-md-color-scheme"),s.color.primary=c.getAttribute("data-md-color-primary"),s.color.accent=c.getAttribute("data-md-color-accent")}for(let[a,c]of Object.entries(s.color))document.body.setAttribute(`data-md-color-${a}`,c);for(let a=0;a{let s=Oe("header"),a=window.getComputedStyle(s);return o.content=a.colorScheme,a.backgroundColor.match(/\d+/g).map(c=>(+c).toString(16).padStart(2,"0")).join("")})).subscribe(s=>r.content=`#${s}`),i.pipe(Me(ie)).subscribe(()=>{document.body.removeAttribute("data-md-color-switching")}),Ra(t).pipe(j(n.pipe(Ee(1))),at(),T(s=>i.next(s)),A(()=>i.complete()),m(s=>P({ref:e},s)))})}function Wn(e,{progress$:t}){return H(()=>{let r=new x;return r.subscribe(({value:o})=>{e.style.setProperty("--md-progress-value",`${o}`)}),t.pipe(T(o=>r.next({value:o})),A(()=>r.complete()),m(o=>({ref:e,value:o})))})}var Yr=jt(Kr());function Ia(e){e.setAttribute("data-md-copying","");let t=e.closest("[data-copy]"),r=t?t.getAttribute("data-copy"):e.innerText;return e.removeAttribute("data-md-copying"),r.trimEnd()}function Un({alert$:e}){Yr.default.isSupported()&&new I(t=>{new Yr.default("[data-clipboard-target], [data-clipboard-text]",{text:r=>r.getAttribute("data-clipboard-text")||Ia(U(r.getAttribute("data-clipboard-target")))}).on("success",r=>t.next(r))}).pipe(T(t=>{t.trigger.focus()}),m(()=>we("clipboard.copied"))).subscribe(e)}function Fa(e){if(e.length<2)return[""];let[t,r]=[...e].sort((n,i)=>n.length-i.length).map(n=>n.replace(/[^/]+$/,"")),o=0;if(t===r)o=t.length;else for(;t.charCodeAt(o)===r.charCodeAt(o);)o++;return e.map(n=>n.replace(t.slice(0,o),""))}function ur(e){let t=__md_get("__sitemap",sessionStorage,e);if(t)return R(t);{let r=he();return on(new URL("sitemap.xml",e||r.base)).pipe(m(o=>Fa(W("loc",o).map(n=>n.textContent))),xe(()=>M),$e([]),T(o=>__md_set("__sitemap",o,sessionStorage,e)))}}function Nn(e){let t=ce("[rel=canonical]",e);typeof t!="undefined"&&(t.href=t.href.replace("//localhost:","//127.0.0.1:"));let r=new Map;for(let o of W(":scope > *",e)){let n=o.outerHTML;for(let i of["href","src"]){let s=o.getAttribute(i);if(s===null)continue;let a=new URL(s,t==null?void 0:t.href),c=o.cloneNode();c.setAttribute(i,`${a}`),n=c.outerHTML;break}r.set(n,o)}return r}function Dn({location$:e,viewport$:t,progress$:r}){let o=he();if(location.protocol==="file:")return M;let n=ur().pipe(m(l=>l.map(f=>`${new URL(f,o.base)}`))),i=h(document.body,"click").pipe(ae(n),w(([l,f])=>{if(!(l.target instanceof Element))return M;let u=l.target.closest("a");if(u===null)return M;if(u.target||l.metaKey||l.ctrlKey)return M;let d=new URL(u.href);return d.search=d.hash="",f.includes(`${d}`)?(l.preventDefault(),R(new URL(u.href))):M}),de());i.pipe(ue(1)).subscribe(()=>{let l=ce("link[rel=icon]");typeof 
l!="undefined"&&(l.href=l.href)}),h(window,"beforeunload").subscribe(()=>{history.scrollRestoration="auto"}),i.pipe(ae(t)).subscribe(([l,{offset:f}])=>{history.scrollRestoration="manual",history.replaceState(f,""),history.pushState(null,"",l)}),i.subscribe(e);let s=e.pipe(q(me()),te("pathname"),Ee(1),w(l=>lr(l,{progress$:r}).pipe(xe(()=>(st(l,!0),M))))),a=new DOMParser,c=s.pipe(w(l=>l.text()),w(l=>{let f=a.parseFromString(l,"text/html");for(let b of["[data-md-component=announce]","[data-md-component=container]","[data-md-component=header-topic]","[data-md-component=outdated]","[data-md-component=logo]","[data-md-component=skip]",...G("navigation.tabs.sticky")?["[data-md-component=tabs]"]:[]]){let D=ce(b),Q=ce(b,f);typeof D!="undefined"&&typeof Q!="undefined"&&D.replaceWith(Q)}let u=Nn(document.head),d=Nn(f.head);for(let[b,D]of d)D.getAttribute("rel")==="stylesheet"||D.hasAttribute("src")||(u.has(b)?u.delete(b):document.head.appendChild(D));for(let b of u.values())b.getAttribute("rel")==="stylesheet"||b.hasAttribute("src")||b.remove();let y=Oe("container");return We(W("script",y)).pipe(w(b=>{let D=f.createElement("script");if(b.src){for(let Q of b.getAttributeNames())D.setAttribute(Q,b.getAttribute(Q));return b.replaceWith(D),new I(Q=>{D.onload=()=>Q.complete()})}else return D.textContent=b.textContent,b.replaceWith(D),M}),ee(),oe(f))}),de());return h(window,"popstate").pipe(m(me)).subscribe(e),e.pipe(q(me()),Ce(2,1),v(([l,f])=>l.pathname===f.pathname&&l.hash!==f.hash),m(([,l])=>l)).subscribe(l=>{var f,u;history.state!==null||!l.hash?window.scrollTo(0,(u=(f=history.state)==null?void 0:f.y)!=null?u:0):(history.scrollRestoration="auto",pr(l.hash),history.scrollRestoration="manual")}),e.pipe(Ir(i),q(me()),Ce(2,1),v(([l,f])=>l.pathname===f.pathname&&l.hash===f.hash),m(([,l])=>l)).subscribe(l=>{history.scrollRestoration="auto",pr(l.hash),history.scrollRestoration="manual",history.back()}),c.pipe(ae(e)).subscribe(([,l])=>{var f,u;history.state!==null||!l.hash?window.scrollTo(0,(u=(f=history.state)==null?void 0:f.y)!=null?u:0):pr(l.hash)}),t.pipe(te("offset"),ye(100)).subscribe(({offset:l})=>{history.replaceState(l,"")}),c}var qn=jt(zn());function Kn(e){let t=e.separator.split("|").map(n=>n.replace(/(\(\?[!=<][^)]+\))/g,"").length===0?"\uFFFD":n).join("|"),r=new RegExp(t,"img"),o=(n,i,s)=>`${i}${s}`;return n=>{n=n.replace(/[\s*+\-:~^]+/g," ").trim();let i=new RegExp(`(^|${e.separator}|)(${n.replace(/[|\\{}()[\]^$+*?.-]/g,"\\$&").replace(r,"|")})`,"img");return s=>(0,qn.default)(s).replace(i,o).replace(/<\/mark>(\s+)]*>/img,"$1")}}function Ht(e){return e.type===1}function dr(e){return e.type===3}function Qn(e,t){let r=ln(e);return L(R(location.protocol!=="file:"),Ne("search")).pipe(Pe(o=>o),w(()=>t)).subscribe(({config:o,docs:n})=>r.next({type:0,data:{config:o,docs:n,options:{suggest:G("search.suggest")}}})),r}function Yn({document$:e}){let t=he(),r=De(new URL("../versions.json",t.base)).pipe(xe(()=>M)),o=r.pipe(m(n=>{let[,i]=t.base.match(/([^/]+)\/?$/);return n.find(({version:s,aliases:a})=>s===i||a.includes(i))||n[0]}));r.pipe(m(n=>new Map(n.map(i=>[`${new URL(`../${i.version}/`,t.base)}`,i]))),w(n=>h(document.body,"click").pipe(v(i=>!i.metaKey&&!i.ctrlKey),ae(o),w(([i,s])=>{if(i.target instanceof Element){let a=i.target.closest("a");if(a&&!a.target&&n.has(a.href)){let c=a.href;return!i.target.closest(".md-version")&&n.get(c)===s?M:(i.preventDefault(),R(c))}}return M}),w(i=>{let{version:s}=n.get(i);return ur(new URL(i)).pipe(m(a=>{let p=me().href.replace(t.base,"");return 
a.includes(p.split("#")[0])?new URL(`../${s}/${p}`,t.base):new URL(i)}))})))).subscribe(n=>st(n,!0)),B([r,o]).subscribe(([n,i])=>{U(".md-header__topic").appendChild(gn(n,i))}),e.pipe(w(()=>o)).subscribe(n=>{var s;let i=__md_get("__outdated",sessionStorage);if(i===null){i=!0;let a=((s=t.version)==null?void 0:s.default)||"latest";Array.isArray(a)||(a=[a]);e:for(let c of a)for(let p of n.aliases.concat(n.version))if(new RegExp(c,"i").test(p)){i=!1;break e}__md_set("__outdated",i,sessionStorage)}if(i)for(let a of ne("outdated"))a.hidden=!1})}function Da(e,{worker$:t}){let{searchParams:r}=me();r.has("q")&&(Ye("search",!0),e.value=r.get("q"),e.focus(),Ne("search").pipe(Pe(i=>!i)).subscribe(()=>{let i=me();i.searchParams.delete("q"),history.replaceState({},"",`${i}`)}));let o=vt(e),n=L(t.pipe(Pe(Ht)),h(e,"keyup"),o).pipe(m(()=>e.value),X());return B([n,o]).pipe(m(([i,s])=>({value:i,focus:s})),Z(1))}function Bn(e,{worker$:t}){let r=new x,o=r.pipe(ee(),oe(!0));B([t.pipe(Pe(Ht)),r],(i,s)=>s).pipe(te("value")).subscribe(({value:i})=>t.next({type:2,data:i})),r.pipe(te("focus")).subscribe(({focus:i})=>{i&&Ye("search",i)}),h(e.form,"reset").pipe(j(o)).subscribe(()=>e.focus());let n=U("header [for=__search]");return h(n,"click").subscribe(()=>e.focus()),Da(e,{worker$:t}).pipe(T(i=>r.next(i)),A(()=>r.complete()),m(i=>P({ref:e},i)),Z(1))}function Gn(e,{worker$:t,query$:r}){let o=new x,n=Go(e.parentElement).pipe(v(Boolean)),i=e.parentElement,s=U(":scope > :first-child",e),a=U(":scope > :last-child",e);Ne("search").subscribe(l=>a.setAttribute("role",l?"list":"presentation")),o.pipe(ae(r),Wr(t.pipe(Pe(Ht)))).subscribe(([{items:l},{value:f}])=>{switch(l.length){case 0:s.textContent=f.length?we("search.result.none"):we("search.result.placeholder");break;case 1:s.textContent=we("search.result.one");break;default:let u=ar(l.length);s.textContent=we("search.result.other",u)}});let c=o.pipe(T(()=>a.innerHTML=""),w(({items:l})=>L(R(...l.slice(0,10)),R(...l.slice(10)).pipe(Ce(4),Nr(n),w(([f])=>f)))),m(hn),de());return c.subscribe(l=>a.appendChild(l)),c.pipe(re(l=>{let f=ce("details",l);return typeof f=="undefined"?M:h(f,"toggle").pipe(j(o),m(()=>f))})).subscribe(l=>{l.open===!1&&l.offsetTop<=i.scrollTop&&i.scrollTo({top:l.offsetTop})}),t.pipe(v(dr),m(({data:l})=>l)).pipe(T(l=>o.next(l)),A(()=>o.complete()),m(l=>P({ref:e},l)))}function Va(e,{query$:t}){return t.pipe(m(({value:r})=>{let o=me();return o.hash="",r=r.replace(/\s+/g,"+").replace(/&/g,"%26").replace(/=/g,"%3D"),o.search=`q=${r}`,{url:o}}))}function Jn(e,t){let r=new x,o=r.pipe(ee(),oe(!0));return r.subscribe(({url:n})=>{e.setAttribute("data-clipboard-text",e.href),e.href=`${n}`}),h(e,"click").pipe(j(o)).subscribe(n=>n.preventDefault()),Va(e,t).pipe(T(n=>r.next(n)),A(()=>r.complete()),m(n=>P({ref:e},n)))}function Xn(e,{worker$:t,keyboard$:r}){let o=new x,n=Oe("search-query"),i=L(h(n,"keydown"),h(n,"focus")).pipe(Me(ie),m(()=>n.value),X());return o.pipe(Ze(i),m(([{suggest:a},c])=>{let p=c.split(/([\s-]+)/);if(a!=null&&a.length&&p[p.length-1]){let l=a[a.length-1];l.startsWith(p[p.length-1])&&(p[p.length-1]=l)}else p.length=0;return p})).subscribe(a=>e.innerHTML=a.join("").replace(/\s/g," ")),r.pipe(v(({mode:a})=>a==="search")).subscribe(a=>{switch(a.type){case"ArrowRight":e.innerText.length&&n.selectionStart===n.value.length&&(n.value=e.innerText);break}}),t.pipe(v(dr),m(({data:a})=>a)).pipe(T(a=>o.next(a)),A(()=>o.complete()),m(()=>({ref:e})))}function Zn(e,{index$:t,keyboard$:r}){let o=he();try{let 
n=Qn(o.search,t),i=Oe("search-query",e),s=Oe("search-result",e);h(e,"click").pipe(v(({target:c})=>c instanceof Element&&!!c.closest("a"))).subscribe(()=>Ye("search",!1)),r.pipe(v(({mode:c})=>c==="search")).subscribe(c=>{let p=Ie();switch(c.type){case"Enter":if(p===i){let l=new Map;for(let f of W(":first-child [href]",s)){let u=f.firstElementChild;l.set(f,parseFloat(u.getAttribute("data-md-score")))}if(l.size){let[[f]]=[...l].sort(([,u],[,d])=>d-u);f.click()}c.claim()}break;case"Escape":case"Tab":Ye("search",!1),i.blur();break;case"ArrowUp":case"ArrowDown":if(typeof p=="undefined")i.focus();else{let l=[i,...W(":not(details) > [href], summary, details[open] [href]",s)],f=Math.max(0,(Math.max(0,l.indexOf(p))+l.length+(c.type==="ArrowUp"?-1:1))%l.length);l[f].focus()}c.claim();break;default:i!==Ie()&&i.focus()}}),r.pipe(v(({mode:c})=>c==="global")).subscribe(c=>{switch(c.type){case"f":case"s":case"/":i.focus(),i.select(),c.claim();break}});let a=Bn(i,{worker$:n});return L(a,Gn(s,{worker$:n,query$:a})).pipe(Re(...ne("search-share",e).map(c=>Jn(c,{query$:a})),...ne("search-suggest",e).map(c=>Xn(c,{worker$:n,keyboard$:r}))))}catch(n){return e.hidden=!0,Ke}}function ei(e,{index$:t,location$:r}){return B([t,r.pipe(q(me()),v(o=>!!o.searchParams.get("h")))]).pipe(m(([o,n])=>Kn(o.config)(n.searchParams.get("h"))),m(o=>{var s;let n=new Map,i=document.createNodeIterator(e,NodeFilter.SHOW_TEXT);for(let a=i.nextNode();a;a=i.nextNode())if((s=a.parentElement)!=null&&s.offsetHeight){let c=a.textContent,p=o(c);p.length>c.length&&n.set(a,p)}for(let[a,c]of n){let{childNodes:p}=S("span",null,c);a.replaceWith(...Array.from(p))}return{ref:e,nodes:n}}))}function za(e,{viewport$:t,main$:r}){let o=e.closest(".md-grid"),n=o.offsetTop-o.parentElement.offsetTop;return B([r,t]).pipe(m(([{offset:i,height:s},{offset:{y:a}}])=>(s=s+Math.min(n,Math.max(0,a-i))-n,{height:s,locked:a>=i+n})),X((i,s)=>i.height===s.height&&i.locked===s.locked))}function Br(e,o){var n=o,{header$:t}=n,r=oo(n,["header$"]);let i=U(".md-sidebar__scrollwrap",e),{y:s}=Ue(i);return H(()=>{let a=new x,c=a.pipe(ee(),oe(!0)),p=a.pipe(Le(0,ge));return p.pipe(ae(t)).subscribe({next([{height:l},{height:f}]){i.style.height=`${l-2*s}px`,e.style.top=`${f}px`},complete(){i.style.height="",e.style.top=""}}),p.pipe(Pe()).subscribe(()=>{for(let l of W(".md-nav__link--active[href]",e)){if(!l.clientHeight)continue;let f=l.closest(".md-sidebar__scrollwrap");if(typeof f!="undefined"){let u=l.offsetTop-f.offsetTop,{height:d}=le(f);f.scrollTo({top:u-d/2})}}}),fe(W("label[tabindex]",e)).pipe(re(l=>h(l,"click").pipe(Me(ie),m(()=>l),j(c)))).subscribe(l=>{let f=U(`[id="${l.htmlFor}"]`);U(`[aria-labelledby="${l.id}"]`).setAttribute("aria-expanded",`${f.checked}`)}),za(e,r).pipe(T(l=>a.next(l)),A(()=>a.complete()),m(l=>P({ref:e},l)))})}function ti(e,t){if(typeof t!="undefined"){let r=`https://api.github.com/repos/${e}/${t}`;return Lt(De(`${r}/releases/latest`).pipe(xe(()=>M),m(o=>({version:o.tag_name})),$e({})),De(r).pipe(xe(()=>M),m(o=>({stars:o.stargazers_count,forks:o.forks_count})),$e({}))).pipe(m(([o,n])=>P(P({},o),n)))}else{let r=`https://api.github.com/users/${e}`;return De(r).pipe(m(o=>({repositories:o.public_repos})),$e({}))}}function ri(e,t){let r=`https://${e}/api/v4/projects/${encodeURIComponent(t)}`;return De(r).pipe(xe(()=>M),m(({star_count:o,forks_count:n})=>({stars:o,forks:n})),$e({}))}function oi(e){let t=e.match(/^.+github\.com\/([^/]+)\/?([^/]+)?/i);if(t){let[,r,o]=t;return ti(r,o)}if(t=e.match(/^.+?([^/]*gitlab[^/]+)\/(.+?)\/?$/i),t){let[,r,o]=t;return 
ri(r,o)}return M}var qa;function Ka(e){return qa||(qa=H(()=>{let t=__md_get("__source",sessionStorage);if(t)return R(t);if(ne("consent").length){let o=__md_get("__consent");if(!(o&&o.github))return M}return oi(e.href).pipe(T(o=>__md_set("__source",o,sessionStorage)))}).pipe(xe(()=>M),v(t=>Object.keys(t).length>0),m(t=>({facts:t})),Z(1)))}function ni(e){let t=U(":scope > :last-child",e);return H(()=>{let r=new x;return r.subscribe(({facts:o})=>{t.appendChild(bn(o)),t.classList.add("md-source__repository--active")}),Ka(e).pipe(T(o=>r.next(o)),A(()=>r.complete()),m(o=>P({ref:e},o)))})}function Qa(e,{viewport$:t,header$:r}){return Se(document.body).pipe(w(()=>mr(e,{header$:r,viewport$:t})),m(({offset:{y:o}})=>({hidden:o>=10})),te("hidden"))}function ii(e,t){return H(()=>{let r=new x;return r.subscribe({next({hidden:o}){e.hidden=o},complete(){e.hidden=!1}}),(G("navigation.tabs.sticky")?R({hidden:!1}):Qa(e,t)).pipe(T(o=>r.next(o)),A(()=>r.complete()),m(o=>P({ref:e},o)))})}function Ya(e,{viewport$:t,header$:r}){let o=new Map,n=W("[href^=\\#]",e);for(let a of n){let c=decodeURIComponent(a.hash.substring(1)),p=ce(`[id="${c}"]`);typeof p!="undefined"&&o.set(a,p)}let i=r.pipe(te("height"),m(({height:a})=>{let c=Oe("main"),p=U(":scope > :first-child",c);return a+.8*(p.offsetTop-c.offsetTop)}),de());return Se(document.body).pipe(te("height"),w(a=>H(()=>{let c=[];return R([...o].reduce((p,[l,f])=>{for(;c.length&&o.get(c[c.length-1]).tagName>=f.tagName;)c.pop();let u=f.offsetTop;for(;!u&&f.parentElement;)f=f.parentElement,u=f.offsetTop;let d=f.offsetParent;for(;d;d=d.offsetParent)u+=d.offsetTop;return p.set([...c=[...c,l]].reverse(),u)},new Map))}).pipe(m(c=>new Map([...c].sort(([,p],[,l])=>p-l))),Ze(i),w(([c,p])=>t.pipe(Fr(([l,f],{offset:{y:u},size:d})=>{let y=u+d.height>=Math.floor(a.height);for(;f.length;){let[,b]=f[0];if(b-p=u&&!y)f=[l.pop(),...f];else break}return[l,f]},[[],[...c]]),X((l,f)=>l[0]===f[0]&&l[1]===f[1])))))).pipe(m(([a,c])=>({prev:a.map(([p])=>p),next:c.map(([p])=>p)})),q({prev:[],next:[]}),Ce(2,1),m(([a,c])=>a.prev.length{let i=new x,s=i.pipe(ee(),oe(!0));if(i.subscribe(({prev:a,next:c})=>{for(let[p]of c)p.classList.remove("md-nav__link--passed"),p.classList.remove("md-nav__link--active");for(let[p,[l]]of a.entries())l.classList.add("md-nav__link--passed"),l.classList.toggle("md-nav__link--active",p===a.length-1)}),G("toc.follow")){let a=L(t.pipe(ye(1),m(()=>{})),t.pipe(ye(250),m(()=>"smooth")));i.pipe(v(({prev:c})=>c.length>0),Ze(o.pipe(Me(ie))),ae(a)).subscribe(([[{prev:c}],p])=>{let[l]=c[c.length-1];if(l.offsetHeight){let f=sr(l);if(typeof f!="undefined"){let u=l.offsetTop-f.offsetTop,{height:d}=le(f);f.scrollTo({top:u-d/2,behavior:p})}}})}return G("navigation.tracking")&&t.pipe(j(s),te("offset"),ye(250),Ee(1),j(n.pipe(Ee(1))),at({delay:250}),ae(i)).subscribe(([,{prev:a}])=>{let c=me(),p=a[a.length-1];if(p&&p.length){let[l]=p,{hash:f}=new URL(l.href);c.hash!==f&&(c.hash=f,history.replaceState({},"",`${c}`))}else c.hash="",history.replaceState({},"",`${c}`)}),Ya(e,{viewport$:t,header$:r}).pipe(T(a=>i.next(a)),A(()=>i.complete()),m(a=>P({ref:e},a)))})}function Ba(e,{viewport$:t,main$:r,target$:o}){let n=t.pipe(m(({offset:{y:s}})=>s),Ce(2,1),m(([s,a])=>s>a&&a>0),X()),i=r.pipe(m(({active:s})=>s));return B([i,n]).pipe(m(([s,a])=>!(s&&a)),X(),j(o.pipe(Ee(1))),oe(!0),at({delay:250}),m(s=>({hidden:s})))}function si(e,{viewport$:t,header$:r,main$:o,target$:n}){let i=new x,s=i.pipe(ee(),oe(!0));return 
i.subscribe({next({hidden:a}){e.hidden=a,a?(e.setAttribute("tabindex","-1"),e.blur()):e.removeAttribute("tabindex")},complete(){e.style.top="",e.hidden=!0,e.removeAttribute("tabindex")}}),r.pipe(j(s),te("height")).subscribe(({height:a})=>{e.style.top=`${a+16}px`}),h(e,"click").subscribe(a=>{a.preventDefault(),window.scrollTo({top:0})}),Ba(e,{viewport$:t,main$:o,target$:n}).pipe(T(a=>i.next(a)),A(()=>i.complete()),m(a=>P({ref:e},a)))}function ci({document$:e}){e.pipe(w(()=>W(".md-ellipsis")),re(t=>yt(t).pipe(j(e.pipe(Ee(1))),v(r=>r),m(()=>t),ue(1))),v(t=>t.offsetWidth{let r=t.innerText,o=t.closest("a")||t;return o.title=r,Be(o).pipe(j(e.pipe(Ee(1))),A(()=>o.removeAttribute("title")))})).subscribe(),e.pipe(w(()=>W(".md-status")),re(t=>Be(t))).subscribe()}function pi({document$:e,tablet$:t}){e.pipe(w(()=>W(".md-toggle--indeterminate")),T(r=>{r.indeterminate=!0,r.checked=!1}),re(r=>h(r,"change").pipe(Ur(()=>r.classList.contains("md-toggle--indeterminate")),m(()=>r))),ae(t)).subscribe(([r,o])=>{r.classList.remove("md-toggle--indeterminate"),o&&(r.checked=!1)})}function Ga(){return/(iPad|iPhone|iPod)/.test(navigator.userAgent)}function li({document$:e}){e.pipe(w(()=>W("[data-md-scrollfix]")),T(t=>t.removeAttribute("data-md-scrollfix")),v(Ga),re(t=>h(t,"touchstart").pipe(m(()=>t)))).subscribe(t=>{let r=t.scrollTop;r===0?t.scrollTop=1:r+t.offsetHeight===t.scrollHeight&&(t.scrollTop=r-1)})}function mi({viewport$:e,tablet$:t}){B([Ne("search"),t]).pipe(m(([r,o])=>r&&!o),w(r=>R(r).pipe(Qe(r?400:100))),ae(e)).subscribe(([r,{offset:{y:o}}])=>{if(r)document.body.setAttribute("data-md-scrolllock",""),document.body.style.top=`-${o}px`;else{let n=-1*parseInt(document.body.style.top,10);document.body.removeAttribute("data-md-scrolllock"),document.body.style.top="",n&&window.scrollTo(0,n)}})}Object.entries||(Object.entries=function(e){let t=[];for(let r of Object.keys(e))t.push([r,e[r]]);return t});Object.values||(Object.values=function(e){let t=[];for(let r of Object.keys(e))t.push(e[r]);return t});typeof Element!="undefined"&&(Element.prototype.scrollTo||(Element.prototype.scrollTo=function(e,t){typeof e=="object"?(this.scrollLeft=e.left,this.scrollTop=e.top):(this.scrollLeft=e,this.scrollTop=t)}),Element.prototype.replaceWith||(Element.prototype.replaceWith=function(...e){let t=this.parentNode;if(t){e.length===0&&t.removeChild(this);for(let r=e.length-1;r>=0;r--){let o=e[r];typeof o=="string"?o=document.createTextNode(o):o.parentNode&&o.parentNode.removeChild(o),r?t.insertBefore(this.previousSibling,o):t.replaceChild(o,this)}}}));function Ja(){return location.protocol==="file:"?gt(`${new URL("search/search_index.js",Gr.base)}`).pipe(m(()=>__index),Z(1)):De(new URL("search/search_index.json",Gr.base))}document.documentElement.classList.remove("no-js");document.documentElement.classList.add("js");var rt=zo(),Pt=Zo(),wt=tn(Pt),Jr=Xo(),_e=pn(),hr=At("(min-width: 960px)"),ui=At("(min-width: 1220px)"),di=rn(),Gr=he(),hi=document.forms.namedItem("search")?Ja():Ke,Xr=new x;Un({alert$:Xr});var Zr=new x;G("navigation.instant")&&Dn({location$:Pt,viewport$:_e,progress$:Zr}).subscribe(rt);var fi;((fi=Gr.version)==null?void 0:fi.provider)==="mike"&&Yn({document$:rt});L(Pt,wt).pipe(Qe(125)).subscribe(()=>{Ye("drawer",!1),Ye("search",!1)});Jr.pipe(v(({mode:e})=>e==="global")).subscribe(e=>{switch(e.type){case"p":case",":let t=ce("link[rel=prev]");typeof t!="undefined"&&st(t);break;case"n":case".":let r=ce("link[rel=next]");typeof r!="undefined"&&st(r);break;case"Enter":let o=Ie();o instanceof 
HTMLLabelElement&&o.click()}});ci({document$:rt});pi({document$:rt,tablet$:hr});li({document$:rt});mi({viewport$:_e,tablet$:hr});var tt=Pn(Oe("header"),{viewport$:_e}),$t=rt.pipe(m(()=>Oe("main")),w(e=>Fn(e,{viewport$:_e,header$:tt})),Z(1)),Xa=L(...ne("consent").map(e=>fn(e,{target$:wt})),...ne("dialog").map(e=>$n(e,{alert$:Xr})),...ne("header").map(e=>Rn(e,{viewport$:_e,header$:tt,main$:$t})),...ne("palette").map(e=>jn(e)),...ne("progress").map(e=>Wn(e,{progress$:Zr})),...ne("search").map(e=>Zn(e,{index$:hi,keyboard$:Jr})),...ne("source").map(e=>ni(e))),Za=H(()=>L(...ne("announce").map(e=>mn(e)),...ne("content").map(e=>Hn(e,{viewport$:_e,target$:wt,print$:di})),...ne("content").map(e=>G("search.highlight")?ei(e,{index$:hi,location$:Pt}):M),...ne("header-title").map(e=>In(e,{viewport$:_e,header$:tt})),...ne("sidebar").map(e=>e.getAttribute("data-md-type")==="navigation"?Dr(ui,()=>Br(e,{viewport$:_e,header$:tt,main$:$t})):Dr(hr,()=>Br(e,{viewport$:_e,header$:tt,main$:$t}))),...ne("tabs").map(e=>ii(e,{viewport$:_e,header$:tt})),...ne("toc").map(e=>ai(e,{viewport$:_e,header$:tt,main$:$t,target$:wt})),...ne("top").map(e=>si(e,{viewport$:_e,header$:tt,main$:$t,target$:wt})))),bi=rt.pipe(w(()=>Za),Re(Xa),Z(1));bi.subscribe();window.document$=rt;window.location$=Pt;window.target$=wt;window.keyboard$=Jr;window.viewport$=_e;window.tablet$=hr;window.screen$=ui;window.print$=di;window.alert$=Xr;window.progress$=Zr;window.component$=bi;})(); +//# sourceMappingURL=bundle.7389ff0e.min.js.map + diff --git a/assets/javascripts/bundle.7389ff0e.min.js.map b/assets/javascripts/bundle.7389ff0e.min.js.map new file mode 100644 index 0000000000..dbee324c23 --- /dev/null +++ b/assets/javascripts/bundle.7389ff0e.min.js.map @@ -0,0 +1,7 @@ +{ + "version": 3, + "sources": ["node_modules/focus-visible/dist/focus-visible.js", "node_modules/clipboard/dist/clipboard.js", "node_modules/escape-html/index.js", "src/templates/assets/javascripts/bundle.ts", "node_modules/rxjs/node_modules/tslib/tslib.es6.js", "node_modules/rxjs/src/internal/util/isFunction.ts", "node_modules/rxjs/src/internal/util/createErrorClass.ts", "node_modules/rxjs/src/internal/util/UnsubscriptionError.ts", "node_modules/rxjs/src/internal/util/arrRemove.ts", "node_modules/rxjs/src/internal/Subscription.ts", "node_modules/rxjs/src/internal/config.ts", "node_modules/rxjs/src/internal/scheduler/timeoutProvider.ts", "node_modules/rxjs/src/internal/util/reportUnhandledError.ts", "node_modules/rxjs/src/internal/util/noop.ts", "node_modules/rxjs/src/internal/NotificationFactories.ts", "node_modules/rxjs/src/internal/util/errorContext.ts", "node_modules/rxjs/src/internal/Subscriber.ts", "node_modules/rxjs/src/internal/symbol/observable.ts", "node_modules/rxjs/src/internal/util/identity.ts", "node_modules/rxjs/src/internal/util/pipe.ts", "node_modules/rxjs/src/internal/Observable.ts", "node_modules/rxjs/src/internal/util/lift.ts", "node_modules/rxjs/src/internal/operators/OperatorSubscriber.ts", "node_modules/rxjs/src/internal/scheduler/animationFrameProvider.ts", "node_modules/rxjs/src/internal/util/ObjectUnsubscribedError.ts", "node_modules/rxjs/src/internal/Subject.ts", "node_modules/rxjs/src/internal/scheduler/dateTimestampProvider.ts", "node_modules/rxjs/src/internal/ReplaySubject.ts", "node_modules/rxjs/src/internal/scheduler/Action.ts", "node_modules/rxjs/src/internal/scheduler/intervalProvider.ts", "node_modules/rxjs/src/internal/scheduler/AsyncAction.ts", "node_modules/rxjs/src/internal/Scheduler.ts", 
"node_modules/rxjs/src/internal/scheduler/AsyncScheduler.ts", "node_modules/rxjs/src/internal/scheduler/async.ts", "node_modules/rxjs/src/internal/scheduler/AnimationFrameAction.ts", "node_modules/rxjs/src/internal/scheduler/AnimationFrameScheduler.ts", "node_modules/rxjs/src/internal/scheduler/animationFrame.ts", "node_modules/rxjs/src/internal/observable/empty.ts", "node_modules/rxjs/src/internal/util/isScheduler.ts", "node_modules/rxjs/src/internal/util/args.ts", "node_modules/rxjs/src/internal/util/isArrayLike.ts", "node_modules/rxjs/src/internal/util/isPromise.ts", "node_modules/rxjs/src/internal/util/isInteropObservable.ts", "node_modules/rxjs/src/internal/util/isAsyncIterable.ts", "node_modules/rxjs/src/internal/util/throwUnobservableError.ts", "node_modules/rxjs/src/internal/symbol/iterator.ts", "node_modules/rxjs/src/internal/util/isIterable.ts", "node_modules/rxjs/src/internal/util/isReadableStreamLike.ts", "node_modules/rxjs/src/internal/observable/innerFrom.ts", "node_modules/rxjs/src/internal/util/executeSchedule.ts", "node_modules/rxjs/src/internal/operators/observeOn.ts", "node_modules/rxjs/src/internal/operators/subscribeOn.ts", "node_modules/rxjs/src/internal/scheduled/scheduleObservable.ts", "node_modules/rxjs/src/internal/scheduled/schedulePromise.ts", "node_modules/rxjs/src/internal/scheduled/scheduleArray.ts", "node_modules/rxjs/src/internal/scheduled/scheduleIterable.ts", "node_modules/rxjs/src/internal/scheduled/scheduleAsyncIterable.ts", "node_modules/rxjs/src/internal/scheduled/scheduleReadableStreamLike.ts", "node_modules/rxjs/src/internal/scheduled/scheduled.ts", "node_modules/rxjs/src/internal/observable/from.ts", "node_modules/rxjs/src/internal/observable/of.ts", "node_modules/rxjs/src/internal/observable/throwError.ts", "node_modules/rxjs/src/internal/util/EmptyError.ts", "node_modules/rxjs/src/internal/util/isDate.ts", "node_modules/rxjs/src/internal/operators/map.ts", "node_modules/rxjs/src/internal/util/mapOneOrManyArgs.ts", "node_modules/rxjs/src/internal/util/argsArgArrayOrObject.ts", "node_modules/rxjs/src/internal/util/createObject.ts", "node_modules/rxjs/src/internal/observable/combineLatest.ts", "node_modules/rxjs/src/internal/operators/mergeInternals.ts", "node_modules/rxjs/src/internal/operators/mergeMap.ts", "node_modules/rxjs/src/internal/operators/mergeAll.ts", "node_modules/rxjs/src/internal/operators/concatAll.ts", "node_modules/rxjs/src/internal/observable/concat.ts", "node_modules/rxjs/src/internal/observable/defer.ts", "node_modules/rxjs/src/internal/observable/fromEvent.ts", "node_modules/rxjs/src/internal/observable/fromEventPattern.ts", "node_modules/rxjs/src/internal/observable/timer.ts", "node_modules/rxjs/src/internal/observable/merge.ts", "node_modules/rxjs/src/internal/observable/never.ts", "node_modules/rxjs/src/internal/util/argsOrArgArray.ts", "node_modules/rxjs/src/internal/operators/filter.ts", "node_modules/rxjs/src/internal/observable/zip.ts", "node_modules/rxjs/src/internal/operators/audit.ts", "node_modules/rxjs/src/internal/operators/auditTime.ts", "node_modules/rxjs/src/internal/operators/bufferCount.ts", "node_modules/rxjs/src/internal/operators/catchError.ts", "node_modules/rxjs/src/internal/operators/scanInternals.ts", "node_modules/rxjs/src/internal/operators/combineLatest.ts", "node_modules/rxjs/src/internal/operators/combineLatestWith.ts", "node_modules/rxjs/src/internal/operators/debounceTime.ts", "node_modules/rxjs/src/internal/operators/defaultIfEmpty.ts", "node_modules/rxjs/src/internal/operators/take.ts", 
"node_modules/rxjs/src/internal/operators/ignoreElements.ts", "node_modules/rxjs/src/internal/operators/mapTo.ts", "node_modules/rxjs/src/internal/operators/delayWhen.ts", "node_modules/rxjs/src/internal/operators/delay.ts", "node_modules/rxjs/src/internal/operators/distinctUntilChanged.ts", "node_modules/rxjs/src/internal/operators/distinctUntilKeyChanged.ts", "node_modules/rxjs/src/internal/operators/throwIfEmpty.ts", "node_modules/rxjs/src/internal/operators/endWith.ts", "node_modules/rxjs/src/internal/operators/finalize.ts", "node_modules/rxjs/src/internal/operators/first.ts", "node_modules/rxjs/src/internal/operators/takeLast.ts", "node_modules/rxjs/src/internal/operators/merge.ts", "node_modules/rxjs/src/internal/operators/mergeWith.ts", "node_modules/rxjs/src/internal/operators/repeat.ts", "node_modules/rxjs/src/internal/operators/sample.ts", "node_modules/rxjs/src/internal/operators/scan.ts", "node_modules/rxjs/src/internal/operators/share.ts", "node_modules/rxjs/src/internal/operators/shareReplay.ts", "node_modules/rxjs/src/internal/operators/skip.ts", "node_modules/rxjs/src/internal/operators/skipUntil.ts", "node_modules/rxjs/src/internal/operators/startWith.ts", "node_modules/rxjs/src/internal/operators/switchMap.ts", "node_modules/rxjs/src/internal/operators/takeUntil.ts", "node_modules/rxjs/src/internal/operators/takeWhile.ts", "node_modules/rxjs/src/internal/operators/tap.ts", "node_modules/rxjs/src/internal/operators/throttle.ts", "node_modules/rxjs/src/internal/operators/throttleTime.ts", "node_modules/rxjs/src/internal/operators/withLatestFrom.ts", "node_modules/rxjs/src/internal/operators/zip.ts", "node_modules/rxjs/src/internal/operators/zipWith.ts", "src/templates/assets/javascripts/browser/document/index.ts", "src/templates/assets/javascripts/browser/element/_/index.ts", "src/templates/assets/javascripts/browser/element/focus/index.ts", "src/templates/assets/javascripts/browser/element/hover/index.ts", "src/templates/assets/javascripts/browser/element/offset/_/index.ts", "src/templates/assets/javascripts/browser/element/offset/content/index.ts", "src/templates/assets/javascripts/utilities/h/index.ts", "src/templates/assets/javascripts/utilities/round/index.ts", "src/templates/assets/javascripts/browser/script/index.ts", "src/templates/assets/javascripts/browser/element/size/_/index.ts", "src/templates/assets/javascripts/browser/element/size/content/index.ts", "src/templates/assets/javascripts/browser/element/visibility/index.ts", "src/templates/assets/javascripts/browser/toggle/index.ts", "src/templates/assets/javascripts/browser/keyboard/index.ts", "src/templates/assets/javascripts/browser/location/_/index.ts", "src/templates/assets/javascripts/browser/location/hash/index.ts", "src/templates/assets/javascripts/browser/media/index.ts", "src/templates/assets/javascripts/browser/request/index.ts", "src/templates/assets/javascripts/browser/viewport/offset/index.ts", "src/templates/assets/javascripts/browser/viewport/size/index.ts", "src/templates/assets/javascripts/browser/viewport/_/index.ts", "src/templates/assets/javascripts/browser/viewport/at/index.ts", "src/templates/assets/javascripts/browser/worker/index.ts", "src/templates/assets/javascripts/_/index.ts", "src/templates/assets/javascripts/components/_/index.ts", "src/templates/assets/javascripts/components/announce/index.ts", "src/templates/assets/javascripts/components/consent/index.ts", "src/templates/assets/javascripts/templates/tooltip/index.tsx", 
"src/templates/assets/javascripts/templates/annotation/index.tsx", "src/templates/assets/javascripts/templates/clipboard/index.tsx", "src/templates/assets/javascripts/templates/search/index.tsx", "src/templates/assets/javascripts/templates/source/index.tsx", "src/templates/assets/javascripts/templates/tabbed/index.tsx", "src/templates/assets/javascripts/templates/table/index.tsx", "src/templates/assets/javascripts/templates/version/index.tsx", "src/templates/assets/javascripts/components/tooltip/index.ts", "src/templates/assets/javascripts/components/content/annotation/_/index.ts", "src/templates/assets/javascripts/components/content/annotation/list/index.ts", "src/templates/assets/javascripts/components/content/annotation/block/index.ts", "src/templates/assets/javascripts/components/content/code/_/index.ts", "src/templates/assets/javascripts/components/content/details/index.ts", "src/templates/assets/javascripts/components/content/mermaid/index.css", "src/templates/assets/javascripts/components/content/mermaid/index.ts", "src/templates/assets/javascripts/components/content/table/index.ts", "src/templates/assets/javascripts/components/content/tabs/index.ts", "src/templates/assets/javascripts/components/content/_/index.ts", "src/templates/assets/javascripts/components/dialog/index.ts", "src/templates/assets/javascripts/components/header/_/index.ts", "src/templates/assets/javascripts/components/header/title/index.ts", "src/templates/assets/javascripts/components/main/index.ts", "src/templates/assets/javascripts/components/palette/index.ts", "src/templates/assets/javascripts/components/progress/index.ts", "src/templates/assets/javascripts/integrations/clipboard/index.ts", "src/templates/assets/javascripts/integrations/sitemap/index.ts", "src/templates/assets/javascripts/integrations/instant/index.ts", "src/templates/assets/javascripts/integrations/search/highlighter/index.ts", "src/templates/assets/javascripts/integrations/search/worker/message/index.ts", "src/templates/assets/javascripts/integrations/search/worker/_/index.ts", "src/templates/assets/javascripts/integrations/version/index.ts", "src/templates/assets/javascripts/components/search/query/index.ts", "src/templates/assets/javascripts/components/search/result/index.ts", "src/templates/assets/javascripts/components/search/share/index.ts", "src/templates/assets/javascripts/components/search/suggest/index.ts", "src/templates/assets/javascripts/components/search/_/index.ts", "src/templates/assets/javascripts/components/search/highlight/index.ts", "src/templates/assets/javascripts/components/sidebar/index.ts", "src/templates/assets/javascripts/components/source/facts/github/index.ts", "src/templates/assets/javascripts/components/source/facts/gitlab/index.ts", "src/templates/assets/javascripts/components/source/facts/_/index.ts", "src/templates/assets/javascripts/components/source/_/index.ts", "src/templates/assets/javascripts/components/tabs/index.ts", "src/templates/assets/javascripts/components/toc/index.ts", "src/templates/assets/javascripts/components/top/index.ts", "src/templates/assets/javascripts/patches/ellipsis/index.ts", "src/templates/assets/javascripts/patches/indeterminate/index.ts", "src/templates/assets/javascripts/patches/scrollfix/index.ts", "src/templates/assets/javascripts/patches/scrolllock/index.ts", "src/templates/assets/javascripts/polyfills/index.ts"], + "sourcesContent": ["(function (global, factory) {\n typeof exports === 'object' && typeof module !== 'undefined' ? 
factory() :\n typeof define === 'function' && define.amd ? define(factory) :\n (factory());\n}(this, (function () { 'use strict';\n\n /**\n * Applies the :focus-visible polyfill at the given scope.\n * A scope in this case is either the top-level Document or a Shadow Root.\n *\n * @param {(Document|ShadowRoot)} scope\n * @see https://github.com/WICG/focus-visible\n */\n function applyFocusVisiblePolyfill(scope) {\n var hadKeyboardEvent = true;\n var hadFocusVisibleRecently = false;\n var hadFocusVisibleRecentlyTimeout = null;\n\n var inputTypesAllowlist = {\n text: true,\n search: true,\n url: true,\n tel: true,\n email: true,\n password: true,\n number: true,\n date: true,\n month: true,\n week: true,\n time: true,\n datetime: true,\n 'datetime-local': true\n };\n\n /**\n * Helper function for legacy browsers and iframes which sometimes focus\n * elements like document, body, and non-interactive SVG.\n * @param {Element} el\n */\n function isValidFocusTarget(el) {\n if (\n el &&\n el !== document &&\n el.nodeName !== 'HTML' &&\n el.nodeName !== 'BODY' &&\n 'classList' in el &&\n 'contains' in el.classList\n ) {\n return true;\n }\n return false;\n }\n\n /**\n * Computes whether the given element should automatically trigger the\n * `focus-visible` class being added, i.e. whether it should always match\n * `:focus-visible` when focused.\n * @param {Element} el\n * @return {boolean}\n */\n function focusTriggersKeyboardModality(el) {\n var type = el.type;\n var tagName = el.tagName;\n\n if (tagName === 'INPUT' && inputTypesAllowlist[type] && !el.readOnly) {\n return true;\n }\n\n if (tagName === 'TEXTAREA' && !el.readOnly) {\n return true;\n }\n\n if (el.isContentEditable) {\n return true;\n }\n\n return false;\n }\n\n /**\n * Add the `focus-visible` class to the given element if it was not added by\n * the author.\n * @param {Element} el\n */\n function addFocusVisibleClass(el) {\n if (el.classList.contains('focus-visible')) {\n return;\n }\n el.classList.add('focus-visible');\n el.setAttribute('data-focus-visible-added', '');\n }\n\n /**\n * Remove the `focus-visible` class from the given element if it was not\n * originally added by the author.\n * @param {Element} el\n */\n function removeFocusVisibleClass(el) {\n if (!el.hasAttribute('data-focus-visible-added')) {\n return;\n }\n el.classList.remove('focus-visible');\n el.removeAttribute('data-focus-visible-added');\n }\n\n /**\n * If the most recent user interaction was via the keyboard;\n * and the key press did not include a meta, alt/option, or control key;\n * then the modality is keyboard. 
Otherwise, the modality is not keyboard.\n * Apply `focus-visible` to any current active element and keep track\n * of our keyboard modality state with `hadKeyboardEvent`.\n * @param {KeyboardEvent} e\n */\n function onKeyDown(e) {\n if (e.metaKey || e.altKey || e.ctrlKey) {\n return;\n }\n\n if (isValidFocusTarget(scope.activeElement)) {\n addFocusVisibleClass(scope.activeElement);\n }\n\n hadKeyboardEvent = true;\n }\n\n /**\n * If at any point a user clicks with a pointing device, ensure that we change\n * the modality away from keyboard.\n * This avoids the situation where a user presses a key on an already focused\n * element, and then clicks on a different element, focusing it with a\n * pointing device, while we still think we're in keyboard modality.\n * @param {Event} e\n */\n function onPointerDown(e) {\n hadKeyboardEvent = false;\n }\n\n /**\n * On `focus`, add the `focus-visible` class to the target if:\n * - the target received focus as a result of keyboard navigation, or\n * - the event target is an element that will likely require interaction\n * via the keyboard (e.g. a text box)\n * @param {Event} e\n */\n function onFocus(e) {\n // Prevent IE from focusing the document or HTML element.\n if (!isValidFocusTarget(e.target)) {\n return;\n }\n\n if (hadKeyboardEvent || focusTriggersKeyboardModality(e.target)) {\n addFocusVisibleClass(e.target);\n }\n }\n\n /**\n * On `blur`, remove the `focus-visible` class from the target.\n * @param {Event} e\n */\n function onBlur(e) {\n if (!isValidFocusTarget(e.target)) {\n return;\n }\n\n if (\n e.target.classList.contains('focus-visible') ||\n e.target.hasAttribute('data-focus-visible-added')\n ) {\n // To detect a tab/window switch, we look for a blur event followed\n // rapidly by a visibility change.\n // If we don't see a visibility change within 100ms, it's probably a\n // regular focus change.\n hadFocusVisibleRecently = true;\n window.clearTimeout(hadFocusVisibleRecentlyTimeout);\n hadFocusVisibleRecentlyTimeout = window.setTimeout(function() {\n hadFocusVisibleRecently = false;\n }, 100);\n removeFocusVisibleClass(e.target);\n }\n }\n\n /**\n * If the user changes tabs, keep track of whether or not the previously\n * focused element had .focus-visible.\n * @param {Event} e\n */\n function onVisibilityChange(e) {\n if (document.visibilityState === 'hidden') {\n // If the tab becomes active again, the browser will handle calling focus\n // on the element (Safari actually calls it twice).\n // If this tab change caused a blur on an element with focus-visible,\n // re-apply the class when the user switches back to the tab.\n if (hadFocusVisibleRecently) {\n hadKeyboardEvent = true;\n }\n addInitialPointerMoveListeners();\n }\n }\n\n /**\n * Add a group of listeners to detect usage of any pointing devices.\n * These listeners will be added when the polyfill first loads, and anytime\n * the window is blurred, so that they are active when the window regains\n * focus.\n */\n function addInitialPointerMoveListeners() {\n document.addEventListener('mousemove', onInitialPointerMove);\n document.addEventListener('mousedown', onInitialPointerMove);\n document.addEventListener('mouseup', onInitialPointerMove);\n document.addEventListener('pointermove', onInitialPointerMove);\n document.addEventListener('pointerdown', onInitialPointerMove);\n document.addEventListener('pointerup', onInitialPointerMove);\n document.addEventListener('touchmove', onInitialPointerMove);\n document.addEventListener('touchstart', onInitialPointerMove);\n 
document.addEventListener('touchend', onInitialPointerMove);\n }\n\n function removeInitialPointerMoveListeners() {\n document.removeEventListener('mousemove', onInitialPointerMove);\n document.removeEventListener('mousedown', onInitialPointerMove);\n document.removeEventListener('mouseup', onInitialPointerMove);\n document.removeEventListener('pointermove', onInitialPointerMove);\n document.removeEventListener('pointerdown', onInitialPointerMove);\n document.removeEventListener('pointerup', onInitialPointerMove);\n document.removeEventListener('touchmove', onInitialPointerMove);\n document.removeEventListener('touchstart', onInitialPointerMove);\n document.removeEventListener('touchend', onInitialPointerMove);\n }\n\n /**\n * When the polfyill first loads, assume the user is in keyboard modality.\n * If any event is received from a pointing device (e.g. mouse, pointer,\n * touch), turn off keyboard modality.\n * This accounts for situations where focus enters the page from the URL bar.\n * @param {Event} e\n */\n function onInitialPointerMove(e) {\n // Work around a Safari quirk that fires a mousemove on whenever the\n // window blurs, even if you're tabbing out of the page. \u00AF\\_(\u30C4)_/\u00AF\n if (e.target.nodeName && e.target.nodeName.toLowerCase() === 'html') {\n return;\n }\n\n hadKeyboardEvent = false;\n removeInitialPointerMoveListeners();\n }\n\n // For some kinds of state, we are interested in changes at the global scope\n // only. For example, global pointer input, global key presses and global\n // visibility change should affect the state at every scope:\n document.addEventListener('keydown', onKeyDown, true);\n document.addEventListener('mousedown', onPointerDown, true);\n document.addEventListener('pointerdown', onPointerDown, true);\n document.addEventListener('touchstart', onPointerDown, true);\n document.addEventListener('visibilitychange', onVisibilityChange, true);\n\n addInitialPointerMoveListeners();\n\n // For focus and blur, we specifically care about state changes in the local\n // scope. This is because focus / blur events that originate from within a\n // shadow root are not re-dispatched from the host element if it was already\n // the active element in its own scope:\n scope.addEventListener('focus', onFocus, true);\n scope.addEventListener('blur', onBlur, true);\n\n // We detect that a node is a ShadowRoot by ensuring that it is a\n // DocumentFragment and also has a host property. This check covers native\n // implementation and polyfill implementation transparently. If we only cared\n // about the native implementation, we could just check if the scope was\n // an instance of a ShadowRoot.\n if (scope.nodeType === Node.DOCUMENT_FRAGMENT_NODE && scope.host) {\n // Since a ShadowRoot is a special kind of DocumentFragment, it does not\n // have a root element to add a class to. So, we add this attribute to the\n // host element instead:\n scope.host.setAttribute('data-js-focus-visible', '');\n } else if (scope.nodeType === Node.DOCUMENT_NODE) {\n document.documentElement.classList.add('js-focus-visible');\n document.documentElement.setAttribute('data-js-focus-visible', '');\n }\n }\n\n // It is important to wrap all references to global window and document in\n // these checks to support server-side rendering use cases\n // @see https://github.com/WICG/focus-visible/issues/199\n if (typeof window !== 'undefined' && typeof document !== 'undefined') {\n // Make the polyfill helper globally available. 
This can be used as a signal\n // to interested libraries that wish to coordinate with the polyfill for e.g.,\n // applying the polyfill to a shadow root:\n window.applyFocusVisiblePolyfill = applyFocusVisiblePolyfill;\n\n // Notify interested libraries of the polyfill's presence, in case the\n // polyfill was loaded lazily:\n var event;\n\n try {\n event = new CustomEvent('focus-visible-polyfill-ready');\n } catch (error) {\n // IE11 does not support using CustomEvent as a constructor directly:\n event = document.createEvent('CustomEvent');\n event.initCustomEvent('focus-visible-polyfill-ready', false, false, {});\n }\n\n window.dispatchEvent(event);\n }\n\n if (typeof document !== 'undefined') {\n // Apply the polyfill to the global document, so that no JavaScript\n // coordination is required to use the polyfill in the top-level document:\n applyFocusVisiblePolyfill(document);\n }\n\n})));\n", "/*!\n * clipboard.js v2.0.11\n * https://clipboardjs.com/\n *\n * Licensed MIT \u00A9 Zeno Rocha\n */\n(function webpackUniversalModuleDefinition(root, factory) {\n\tif(typeof exports === 'object' && typeof module === 'object')\n\t\tmodule.exports = factory();\n\telse if(typeof define === 'function' && define.amd)\n\t\tdefine([], factory);\n\telse if(typeof exports === 'object')\n\t\texports[\"ClipboardJS\"] = factory();\n\telse\n\t\troot[\"ClipboardJS\"] = factory();\n})(this, function() {\nreturn /******/ (function() { // webpackBootstrap\n/******/ \tvar __webpack_modules__ = ({\n\n/***/ 686:\n/***/ (function(__unused_webpack_module, __webpack_exports__, __webpack_require__) {\n\n\"use strict\";\n\n// EXPORTS\n__webpack_require__.d(__webpack_exports__, {\n \"default\": function() { return /* binding */ clipboard; }\n});\n\n// EXTERNAL MODULE: ./node_modules/tiny-emitter/index.js\nvar tiny_emitter = __webpack_require__(279);\nvar tiny_emitter_default = /*#__PURE__*/__webpack_require__.n(tiny_emitter);\n// EXTERNAL MODULE: ./node_modules/good-listener/src/listen.js\nvar listen = __webpack_require__(370);\nvar listen_default = /*#__PURE__*/__webpack_require__.n(listen);\n// EXTERNAL MODULE: ./node_modules/select/src/select.js\nvar src_select = __webpack_require__(817);\nvar select_default = /*#__PURE__*/__webpack_require__.n(src_select);\n;// CONCATENATED MODULE: ./src/common/command.js\n/**\n * Executes a given operation type.\n * @param {String} type\n * @return {Boolean}\n */\nfunction command(type) {\n try {\n return document.execCommand(type);\n } catch (err) {\n return false;\n }\n}\n;// CONCATENATED MODULE: ./src/actions/cut.js\n\n\n/**\n * Cut action wrapper.\n * @param {String|HTMLElement} target\n * @return {String}\n */\n\nvar ClipboardActionCut = function ClipboardActionCut(target) {\n var selectedText = select_default()(target);\n command('cut');\n return selectedText;\n};\n\n/* harmony default export */ var actions_cut = (ClipboardActionCut);\n;// CONCATENATED MODULE: ./src/common/create-fake-element.js\n/**\n * Creates a fake textarea element with a value.\n * @param {String} value\n * @return {HTMLElement}\n */\nfunction createFakeElement(value) {\n var isRTL = document.documentElement.getAttribute('dir') === 'rtl';\n var fakeElement = document.createElement('textarea'); // Prevent zooming on iOS\n\n fakeElement.style.fontSize = '12pt'; // Reset box model\n\n fakeElement.style.border = '0';\n fakeElement.style.padding = '0';\n fakeElement.style.margin = '0'; // Move element out of screen horizontally\n\n fakeElement.style.position = 'absolute';\n fakeElement.style[isRTL ? 
'right' : 'left'] = '-9999px'; // Move element to the same position vertically\n\n var yPosition = window.pageYOffset || document.documentElement.scrollTop;\n fakeElement.style.top = \"\".concat(yPosition, \"px\");\n fakeElement.setAttribute('readonly', '');\n fakeElement.value = value;\n return fakeElement;\n}\n;// CONCATENATED MODULE: ./src/actions/copy.js\n\n\n\n/**\n * Create fake copy action wrapper using a fake element.\n * @param {String} target\n * @param {Object} options\n * @return {String}\n */\n\nvar fakeCopyAction = function fakeCopyAction(value, options) {\n var fakeElement = createFakeElement(value);\n options.container.appendChild(fakeElement);\n var selectedText = select_default()(fakeElement);\n command('copy');\n fakeElement.remove();\n return selectedText;\n};\n/**\n * Copy action wrapper.\n * @param {String|HTMLElement} target\n * @param {Object} options\n * @return {String}\n */\n\n\nvar ClipboardActionCopy = function ClipboardActionCopy(target) {\n var options = arguments.length > 1 && arguments[1] !== undefined ? arguments[1] : {\n container: document.body\n };\n var selectedText = '';\n\n if (typeof target === 'string') {\n selectedText = fakeCopyAction(target, options);\n } else if (target instanceof HTMLInputElement && !['text', 'search', 'url', 'tel', 'password'].includes(target === null || target === void 0 ? void 0 : target.type)) {\n // If input type doesn't support `setSelectionRange`. Simulate it. https://developer.mozilla.org/en-US/docs/Web/API/HTMLInputElement/setSelectionRange\n selectedText = fakeCopyAction(target.value, options);\n } else {\n selectedText = select_default()(target);\n command('copy');\n }\n\n return selectedText;\n};\n\n/* harmony default export */ var actions_copy = (ClipboardActionCopy);\n;// CONCATENATED MODULE: ./src/actions/default.js\nfunction _typeof(obj) { \"@babel/helpers - typeof\"; if (typeof Symbol === \"function\" && typeof Symbol.iterator === \"symbol\") { _typeof = function _typeof(obj) { return typeof obj; }; } else { _typeof = function _typeof(obj) { return obj && typeof Symbol === \"function\" && obj.constructor === Symbol && obj !== Symbol.prototype ? \"symbol\" : typeof obj; }; } return _typeof(obj); }\n\n\n\n/**\n * Inner function which performs selection from either `text` or `target`\n * properties and then executes copy or cut operations.\n * @param {Object} options\n */\n\nvar ClipboardActionDefault = function ClipboardActionDefault() {\n var options = arguments.length > 0 && arguments[0] !== undefined ? arguments[0] : {};\n // Defines base properties passed from constructor.\n var _options$action = options.action,\n action = _options$action === void 0 ? 'copy' : _options$action,\n container = options.container,\n target = options.target,\n text = options.text; // Sets the `action` to be performed which can be either 'copy' or 'cut'.\n\n if (action !== 'copy' && action !== 'cut') {\n throw new Error('Invalid \"action\" value, use either \"copy\" or \"cut\"');\n } // Sets the `target` property using an element that will be have its content copied.\n\n\n if (target !== undefined) {\n if (target && _typeof(target) === 'object' && target.nodeType === 1) {\n if (action === 'copy' && target.hasAttribute('disabled')) {\n throw new Error('Invalid \"target\" attribute. Please use \"readonly\" instead of \"disabled\" attribute');\n }\n\n if (action === 'cut' && (target.hasAttribute('readonly') || target.hasAttribute('disabled'))) {\n throw new Error('Invalid \"target\" attribute. 
You can\\'t cut text from elements with \"readonly\" or \"disabled\" attributes');\n }\n } else {\n throw new Error('Invalid \"target\" value, use a valid Element');\n }\n } // Define selection strategy based on `text` property.\n\n\n if (text) {\n return actions_copy(text, {\n container: container\n });\n } // Defines which selection strategy based on `target` property.\n\n\n if (target) {\n return action === 'cut' ? actions_cut(target) : actions_copy(target, {\n container: container\n });\n }\n};\n\n/* harmony default export */ var actions_default = (ClipboardActionDefault);\n;// CONCATENATED MODULE: ./src/clipboard.js\nfunction clipboard_typeof(obj) { \"@babel/helpers - typeof\"; if (typeof Symbol === \"function\" && typeof Symbol.iterator === \"symbol\") { clipboard_typeof = function _typeof(obj) { return typeof obj; }; } else { clipboard_typeof = function _typeof(obj) { return obj && typeof Symbol === \"function\" && obj.constructor === Symbol && obj !== Symbol.prototype ? \"symbol\" : typeof obj; }; } return clipboard_typeof(obj); }\n\nfunction _classCallCheck(instance, Constructor) { if (!(instance instanceof Constructor)) { throw new TypeError(\"Cannot call a class as a function\"); } }\n\nfunction _defineProperties(target, props) { for (var i = 0; i < props.length; i++) { var descriptor = props[i]; descriptor.enumerable = descriptor.enumerable || false; descriptor.configurable = true; if (\"value\" in descriptor) descriptor.writable = true; Object.defineProperty(target, descriptor.key, descriptor); } }\n\nfunction _createClass(Constructor, protoProps, staticProps) { if (protoProps) _defineProperties(Constructor.prototype, protoProps); if (staticProps) _defineProperties(Constructor, staticProps); return Constructor; }\n\nfunction _inherits(subClass, superClass) { if (typeof superClass !== \"function\" && superClass !== null) { throw new TypeError(\"Super expression must either be null or a function\"); } subClass.prototype = Object.create(superClass && superClass.prototype, { constructor: { value: subClass, writable: true, configurable: true } }); if (superClass) _setPrototypeOf(subClass, superClass); }\n\nfunction _setPrototypeOf(o, p) { _setPrototypeOf = Object.setPrototypeOf || function _setPrototypeOf(o, p) { o.__proto__ = p; return o; }; return _setPrototypeOf(o, p); }\n\nfunction _createSuper(Derived) { var hasNativeReflectConstruct = _isNativeReflectConstruct(); return function _createSuperInternal() { var Super = _getPrototypeOf(Derived), result; if (hasNativeReflectConstruct) { var NewTarget = _getPrototypeOf(this).constructor; result = Reflect.construct(Super, arguments, NewTarget); } else { result = Super.apply(this, arguments); } return _possibleConstructorReturn(this, result); }; }\n\nfunction _possibleConstructorReturn(self, call) { if (call && (clipboard_typeof(call) === \"object\" || typeof call === \"function\")) { return call; } return _assertThisInitialized(self); }\n\nfunction _assertThisInitialized(self) { if (self === void 0) { throw new ReferenceError(\"this hasn't been initialised - super() hasn't been called\"); } return self; }\n\nfunction _isNativeReflectConstruct() { if (typeof Reflect === \"undefined\" || !Reflect.construct) return false; if (Reflect.construct.sham) return false; if (typeof Proxy === \"function\") return true; try { Date.prototype.toString.call(Reflect.construct(Date, [], function () {})); return true; } catch (e) { return false; } }\n\nfunction _getPrototypeOf(o) { _getPrototypeOf = Object.setPrototypeOf ? 
Object.getPrototypeOf : function _getPrototypeOf(o) { return o.__proto__ || Object.getPrototypeOf(o); }; return _getPrototypeOf(o); }\n\n\n\n\n\n\n/**\n * Helper function to retrieve attribute value.\n * @param {String} suffix\n * @param {Element} element\n */\n\nfunction getAttributeValue(suffix, element) {\n var attribute = \"data-clipboard-\".concat(suffix);\n\n if (!element.hasAttribute(attribute)) {\n return;\n }\n\n return element.getAttribute(attribute);\n}\n/**\n * Base class which takes one or more elements, adds event listeners to them,\n * and instantiates a new `ClipboardAction` on each click.\n */\n\n\nvar Clipboard = /*#__PURE__*/function (_Emitter) {\n _inherits(Clipboard, _Emitter);\n\n var _super = _createSuper(Clipboard);\n\n /**\n * @param {String|HTMLElement|HTMLCollection|NodeList} trigger\n * @param {Object} options\n */\n function Clipboard(trigger, options) {\n var _this;\n\n _classCallCheck(this, Clipboard);\n\n _this = _super.call(this);\n\n _this.resolveOptions(options);\n\n _this.listenClick(trigger);\n\n return _this;\n }\n /**\n * Defines if attributes would be resolved using internal setter functions\n * or custom functions that were passed in the constructor.\n * @param {Object} options\n */\n\n\n _createClass(Clipboard, [{\n key: \"resolveOptions\",\n value: function resolveOptions() {\n var options = arguments.length > 0 && arguments[0] !== undefined ? arguments[0] : {};\n this.action = typeof options.action === 'function' ? options.action : this.defaultAction;\n this.target = typeof options.target === 'function' ? options.target : this.defaultTarget;\n this.text = typeof options.text === 'function' ? options.text : this.defaultText;\n this.container = clipboard_typeof(options.container) === 'object' ? options.container : document.body;\n }\n /**\n * Adds a click event listener to the passed trigger.\n * @param {String|HTMLElement|HTMLCollection|NodeList} trigger\n */\n\n }, {\n key: \"listenClick\",\n value: function listenClick(trigger) {\n var _this2 = this;\n\n this.listener = listen_default()(trigger, 'click', function (e) {\n return _this2.onClick(e);\n });\n }\n /**\n * Defines a new `ClipboardAction` on each click event.\n * @param {Event} e\n */\n\n }, {\n key: \"onClick\",\n value: function onClick(e) {\n var trigger = e.delegateTarget || e.currentTarget;\n var action = this.action(trigger) || 'copy';\n var text = actions_default({\n action: action,\n container: this.container,\n target: this.target(trigger),\n text: this.text(trigger)\n }); // Fires an event based on the copy operation result.\n\n this.emit(text ? 
'success' : 'error', {\n action: action,\n text: text,\n trigger: trigger,\n clearSelection: function clearSelection() {\n if (trigger) {\n trigger.focus();\n }\n\n window.getSelection().removeAllRanges();\n }\n });\n }\n /**\n * Default `action` lookup function.\n * @param {Element} trigger\n */\n\n }, {\n key: \"defaultAction\",\n value: function defaultAction(trigger) {\n return getAttributeValue('action', trigger);\n }\n /**\n * Default `target` lookup function.\n * @param {Element} trigger\n */\n\n }, {\n key: \"defaultTarget\",\n value: function defaultTarget(trigger) {\n var selector = getAttributeValue('target', trigger);\n\n if (selector) {\n return document.querySelector(selector);\n }\n }\n /**\n * Allow fire programmatically a copy action\n * @param {String|HTMLElement} target\n * @param {Object} options\n * @returns Text copied.\n */\n\n }, {\n key: \"defaultText\",\n\n /**\n * Default `text` lookup function.\n * @param {Element} trigger\n */\n value: function defaultText(trigger) {\n return getAttributeValue('text', trigger);\n }\n /**\n * Destroy lifecycle.\n */\n\n }, {\n key: \"destroy\",\n value: function destroy() {\n this.listener.destroy();\n }\n }], [{\n key: \"copy\",\n value: function copy(target) {\n var options = arguments.length > 1 && arguments[1] !== undefined ? arguments[1] : {\n container: document.body\n };\n return actions_copy(target, options);\n }\n /**\n * Allow fire programmatically a cut action\n * @param {String|HTMLElement} target\n * @returns Text cutted.\n */\n\n }, {\n key: \"cut\",\n value: function cut(target) {\n return actions_cut(target);\n }\n /**\n * Returns the support of the given action, or all actions if no action is\n * given.\n * @param {String} [action]\n */\n\n }, {\n key: \"isSupported\",\n value: function isSupported() {\n var action = arguments.length > 0 && arguments[0] !== undefined ? arguments[0] : ['copy', 'cut'];\n var actions = typeof action === 'string' ? 
[action] : action;\n var support = !!document.queryCommandSupported;\n actions.forEach(function (action) {\n support = support && !!document.queryCommandSupported(action);\n });\n return support;\n }\n }]);\n\n return Clipboard;\n}((tiny_emitter_default()));\n\n/* harmony default export */ var clipboard = (Clipboard);\n\n/***/ }),\n\n/***/ 828:\n/***/ (function(module) {\n\nvar DOCUMENT_NODE_TYPE = 9;\n\n/**\n * A polyfill for Element.matches()\n */\nif (typeof Element !== 'undefined' && !Element.prototype.matches) {\n var proto = Element.prototype;\n\n proto.matches = proto.matchesSelector ||\n proto.mozMatchesSelector ||\n proto.msMatchesSelector ||\n proto.oMatchesSelector ||\n proto.webkitMatchesSelector;\n}\n\n/**\n * Finds the closest parent that matches a selector.\n *\n * @param {Element} element\n * @param {String} selector\n * @return {Function}\n */\nfunction closest (element, selector) {\n while (element && element.nodeType !== DOCUMENT_NODE_TYPE) {\n if (typeof element.matches === 'function' &&\n element.matches(selector)) {\n return element;\n }\n element = element.parentNode;\n }\n}\n\nmodule.exports = closest;\n\n\n/***/ }),\n\n/***/ 438:\n/***/ (function(module, __unused_webpack_exports, __webpack_require__) {\n\nvar closest = __webpack_require__(828);\n\n/**\n * Delegates event to a selector.\n *\n * @param {Element} element\n * @param {String} selector\n * @param {String} type\n * @param {Function} callback\n * @param {Boolean} useCapture\n * @return {Object}\n */\nfunction _delegate(element, selector, type, callback, useCapture) {\n var listenerFn = listener.apply(this, arguments);\n\n element.addEventListener(type, listenerFn, useCapture);\n\n return {\n destroy: function() {\n element.removeEventListener(type, listenerFn, useCapture);\n }\n }\n}\n\n/**\n * Delegates event to a selector.\n *\n * @param {Element|String|Array} [elements]\n * @param {String} selector\n * @param {String} type\n * @param {Function} callback\n * @param {Boolean} useCapture\n * @return {Object}\n */\nfunction delegate(elements, selector, type, callback, useCapture) {\n // Handle the regular Element usage\n if (typeof elements.addEventListener === 'function') {\n return _delegate.apply(null, arguments);\n }\n\n // Handle Element-less usage, it defaults to global delegation\n if (typeof type === 'function') {\n // Use `document` as the first parameter, then apply arguments\n // This is a short way to .unshift `arguments` without running into deoptimizations\n return _delegate.bind(null, document).apply(null, arguments);\n }\n\n // Handle Selector-based usage\n if (typeof elements === 'string') {\n elements = document.querySelectorAll(elements);\n }\n\n // Handle Array-like based usage\n return Array.prototype.map.call(elements, function (element) {\n return _delegate(element, selector, type, callback, useCapture);\n });\n}\n\n/**\n * Finds closest match and invokes callback.\n *\n * @param {Element} element\n * @param {String} selector\n * @param {String} type\n * @param {Function} callback\n * @return {Function}\n */\nfunction listener(element, selector, type, callback) {\n return function(e) {\n e.delegateTarget = closest(e.target, selector);\n\n if (e.delegateTarget) {\n callback.call(element, e);\n }\n }\n}\n\nmodule.exports = delegate;\n\n\n/***/ }),\n\n/***/ 879:\n/***/ (function(__unused_webpack_module, exports) {\n\n/**\n * Check if argument is a HTML element.\n *\n * @param {Object} value\n * @return {Boolean}\n */\nexports.node = function(value) {\n return value !== undefined\n && 
value instanceof HTMLElement\n && value.nodeType === 1;\n};\n\n/**\n * Check if argument is a list of HTML elements.\n *\n * @param {Object} value\n * @return {Boolean}\n */\nexports.nodeList = function(value) {\n var type = Object.prototype.toString.call(value);\n\n return value !== undefined\n && (type === '[object NodeList]' || type === '[object HTMLCollection]')\n && ('length' in value)\n && (value.length === 0 || exports.node(value[0]));\n};\n\n/**\n * Check if argument is a string.\n *\n * @param {Object} value\n * @return {Boolean}\n */\nexports.string = function(value) {\n return typeof value === 'string'\n || value instanceof String;\n};\n\n/**\n * Check if argument is a function.\n *\n * @param {Object} value\n * @return {Boolean}\n */\nexports.fn = function(value) {\n var type = Object.prototype.toString.call(value);\n\n return type === '[object Function]';\n};\n\n\n/***/ }),\n\n/***/ 370:\n/***/ (function(module, __unused_webpack_exports, __webpack_require__) {\n\nvar is = __webpack_require__(879);\nvar delegate = __webpack_require__(438);\n\n/**\n * Validates all params and calls the right\n * listener function based on its target type.\n *\n * @param {String|HTMLElement|HTMLCollection|NodeList} target\n * @param {String} type\n * @param {Function} callback\n * @return {Object}\n */\nfunction listen(target, type, callback) {\n if (!target && !type && !callback) {\n throw new Error('Missing required arguments');\n }\n\n if (!is.string(type)) {\n throw new TypeError('Second argument must be a String');\n }\n\n if (!is.fn(callback)) {\n throw new TypeError('Third argument must be a Function');\n }\n\n if (is.node(target)) {\n return listenNode(target, type, callback);\n }\n else if (is.nodeList(target)) {\n return listenNodeList(target, type, callback);\n }\n else if (is.string(target)) {\n return listenSelector(target, type, callback);\n }\n else {\n throw new TypeError('First argument must be a String, HTMLElement, HTMLCollection, or NodeList');\n }\n}\n\n/**\n * Adds an event listener to a HTML element\n * and returns a remove listener function.\n *\n * @param {HTMLElement} node\n * @param {String} type\n * @param {Function} callback\n * @return {Object}\n */\nfunction listenNode(node, type, callback) {\n node.addEventListener(type, callback);\n\n return {\n destroy: function() {\n node.removeEventListener(type, callback);\n }\n }\n}\n\n/**\n * Add an event listener to a list of HTML elements\n * and returns a remove listener function.\n *\n * @param {NodeList|HTMLCollection} nodeList\n * @param {String} type\n * @param {Function} callback\n * @return {Object}\n */\nfunction listenNodeList(nodeList, type, callback) {\n Array.prototype.forEach.call(nodeList, function(node) {\n node.addEventListener(type, callback);\n });\n\n return {\n destroy: function() {\n Array.prototype.forEach.call(nodeList, function(node) {\n node.removeEventListener(type, callback);\n });\n }\n }\n}\n\n/**\n * Add an event listener to a selector\n * and returns a remove listener function.\n *\n * @param {String} selector\n * @param {String} type\n * @param {Function} callback\n * @return {Object}\n */\nfunction listenSelector(selector, type, callback) {\n return delegate(document.body, selector, type, callback);\n}\n\nmodule.exports = listen;\n\n\n/***/ }),\n\n/***/ 817:\n/***/ (function(module) {\n\nfunction select(element) {\n var selectedText;\n\n if (element.nodeName === 'SELECT') {\n element.focus();\n\n selectedText = element.value;\n }\n else if (element.nodeName === 'INPUT' || element.nodeName 
=== 'TEXTAREA') {\n var isReadOnly = element.hasAttribute('readonly');\n\n if (!isReadOnly) {\n element.setAttribute('readonly', '');\n }\n\n element.select();\n element.setSelectionRange(0, element.value.length);\n\n if (!isReadOnly) {\n element.removeAttribute('readonly');\n }\n\n selectedText = element.value;\n }\n else {\n if (element.hasAttribute('contenteditable')) {\n element.focus();\n }\n\n var selection = window.getSelection();\n var range = document.createRange();\n\n range.selectNodeContents(element);\n selection.removeAllRanges();\n selection.addRange(range);\n\n selectedText = selection.toString();\n }\n\n return selectedText;\n}\n\nmodule.exports = select;\n\n\n/***/ }),\n\n/***/ 279:\n/***/ (function(module) {\n\nfunction E () {\n // Keep this empty so it's easier to inherit from\n // (via https://github.com/lipsmack from https://github.com/scottcorgan/tiny-emitter/issues/3)\n}\n\nE.prototype = {\n on: function (name, callback, ctx) {\n var e = this.e || (this.e = {});\n\n (e[name] || (e[name] = [])).push({\n fn: callback,\n ctx: ctx\n });\n\n return this;\n },\n\n once: function (name, callback, ctx) {\n var self = this;\n function listener () {\n self.off(name, listener);\n callback.apply(ctx, arguments);\n };\n\n listener._ = callback\n return this.on(name, listener, ctx);\n },\n\n emit: function (name) {\n var data = [].slice.call(arguments, 1);\n var evtArr = ((this.e || (this.e = {}))[name] || []).slice();\n var i = 0;\n var len = evtArr.length;\n\n for (i; i < len; i++) {\n evtArr[i].fn.apply(evtArr[i].ctx, data);\n }\n\n return this;\n },\n\n off: function (name, callback) {\n var e = this.e || (this.e = {});\n var evts = e[name];\n var liveEvents = [];\n\n if (evts && callback) {\n for (var i = 0, len = evts.length; i < len; i++) {\n if (evts[i].fn !== callback && evts[i].fn._ !== callback)\n liveEvents.push(evts[i]);\n }\n }\n\n // Remove event from queue to prevent memory leak\n // Suggested by https://github.com/lazd\n // Ref: https://github.com/scottcorgan/tiny-emitter/commit/c6ebfaa9bc973b33d110a84a307742b7cf94c953#commitcomment-5024910\n\n (liveEvents.length)\n ? 
e[name] = liveEvents\n : delete e[name];\n\n return this;\n }\n};\n\nmodule.exports = E;\nmodule.exports.TinyEmitter = E;\n\n\n/***/ })\n\n/******/ \t});\n/************************************************************************/\n/******/ \t// The module cache\n/******/ \tvar __webpack_module_cache__ = {};\n/******/ \t\n/******/ \t// The require function\n/******/ \tfunction __webpack_require__(moduleId) {\n/******/ \t\t// Check if module is in cache\n/******/ \t\tif(__webpack_module_cache__[moduleId]) {\n/******/ \t\t\treturn __webpack_module_cache__[moduleId].exports;\n/******/ \t\t}\n/******/ \t\t// Create a new module (and put it into the cache)\n/******/ \t\tvar module = __webpack_module_cache__[moduleId] = {\n/******/ \t\t\t// no module.id needed\n/******/ \t\t\t// no module.loaded needed\n/******/ \t\t\texports: {}\n/******/ \t\t};\n/******/ \t\n/******/ \t\t// Execute the module function\n/******/ \t\t__webpack_modules__[moduleId](module, module.exports, __webpack_require__);\n/******/ \t\n/******/ \t\t// Return the exports of the module\n/******/ \t\treturn module.exports;\n/******/ \t}\n/******/ \t\n/************************************************************************/\n/******/ \t/* webpack/runtime/compat get default export */\n/******/ \t!function() {\n/******/ \t\t// getDefaultExport function for compatibility with non-harmony modules\n/******/ \t\t__webpack_require__.n = function(module) {\n/******/ \t\t\tvar getter = module && module.__esModule ?\n/******/ \t\t\t\tfunction() { return module['default']; } :\n/******/ \t\t\t\tfunction() { return module; };\n/******/ \t\t\t__webpack_require__.d(getter, { a: getter });\n/******/ \t\t\treturn getter;\n/******/ \t\t};\n/******/ \t}();\n/******/ \t\n/******/ \t/* webpack/runtime/define property getters */\n/******/ \t!function() {\n/******/ \t\t// define getter functions for harmony exports\n/******/ \t\t__webpack_require__.d = function(exports, definition) {\n/******/ \t\t\tfor(var key in definition) {\n/******/ \t\t\t\tif(__webpack_require__.o(definition, key) && !__webpack_require__.o(exports, key)) {\n/******/ \t\t\t\t\tObject.defineProperty(exports, key, { enumerable: true, get: definition[key] });\n/******/ \t\t\t\t}\n/******/ \t\t\t}\n/******/ \t\t};\n/******/ \t}();\n/******/ \t\n/******/ \t/* webpack/runtime/hasOwnProperty shorthand */\n/******/ \t!function() {\n/******/ \t\t__webpack_require__.o = function(obj, prop) { return Object.prototype.hasOwnProperty.call(obj, prop); }\n/******/ \t}();\n/******/ \t\n/************************************************************************/\n/******/ \t// module exports must be returned from runtime so entry inlining is disabled\n/******/ \t// startup\n/******/ \t// Load entry module and return exports\n/******/ \treturn __webpack_require__(686);\n/******/ })()\n.default;\n});", "/*!\n * escape-html\n * Copyright(c) 2012-2013 TJ Holowaychuk\n * Copyright(c) 2015 Andreas Lubbe\n * Copyright(c) 2015 Tiancheng \"Timothy\" Gu\n * MIT Licensed\n */\n\n'use strict';\n\n/**\n * Module variables.\n * @private\n */\n\nvar matchHtmlRegExp = /[\"'&<>]/;\n\n/**\n * Module exports.\n * @public\n */\n\nmodule.exports = escapeHtml;\n\n/**\n * Escape special characters in the given string of html.\n *\n * @param {string} string The string to escape for inserting into HTML\n * @return {string}\n * @public\n */\n\nfunction escapeHtml(string) {\n var str = '' + string;\n var match = matchHtmlRegExp.exec(str);\n\n if (!match) {\n return str;\n }\n\n var escape;\n var html = '';\n var index = 0;\n 
var lastIndex = 0;\n\n for (index = match.index; index < str.length; index++) {\n switch (str.charCodeAt(index)) {\n case 34: // \"\n escape = '"';\n break;\n case 38: // &\n escape = '&';\n break;\n case 39: // '\n escape = ''';\n break;\n case 60: // <\n escape = '<';\n break;\n case 62: // >\n escape = '>';\n break;\n default:\n continue;\n }\n\n if (lastIndex !== index) {\n html += str.substring(lastIndex, index);\n }\n\n lastIndex = index + 1;\n html += escape;\n }\n\n return lastIndex !== index\n ? html + str.substring(lastIndex, index)\n : html;\n}\n", "/*\n * Copyright (c) 2016-2024 Martin Donath \n *\n * Permission is hereby granted, free of charge, to any person obtaining a copy\n * of this software and associated documentation files (the \"Software\"), to\n * deal in the Software without restriction, including without limitation the\n * rights to use, copy, modify, merge, publish, distribute, sublicense, and/or\n * sell copies of the Software, and to permit persons to whom the Software is\n * furnished to do so, subject to the following conditions:\n *\n * The above copyright notice and this permission notice shall be included in\n * all copies or substantial portions of the Software.\n *\n * THE SOFTWARE IS PROVIDED \"AS IS\", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR\n * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,\n * FITNESS FOR A PARTICULAR PURPOSE AND NON-INFRINGEMENT. IN NO EVENT SHALL THE\n * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER\n * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING\n * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS\n * IN THE SOFTWARE.\n */\n\nimport \"focus-visible\"\n\nimport {\n EMPTY,\n NEVER,\n Observable,\n Subject,\n defer,\n delay,\n filter,\n map,\n merge,\n mergeWith,\n shareReplay,\n switchMap\n} from \"rxjs\"\n\nimport { configuration, feature } from \"./_\"\nimport {\n at,\n getActiveElement,\n getOptionalElement,\n requestJSON,\n setLocation,\n setToggle,\n watchDocument,\n watchKeyboard,\n watchLocation,\n watchLocationTarget,\n watchMedia,\n watchPrint,\n watchScript,\n watchViewport\n} from \"./browser\"\nimport {\n getComponentElement,\n getComponentElements,\n mountAnnounce,\n mountBackToTop,\n mountConsent,\n mountContent,\n mountDialog,\n mountHeader,\n mountHeaderTitle,\n mountPalette,\n mountProgress,\n mountSearch,\n mountSearchHiglight,\n mountSidebar,\n mountSource,\n mountTableOfContents,\n mountTabs,\n watchHeader,\n watchMain\n} from \"./components\"\nimport {\n SearchIndex,\n setupClipboardJS,\n setupInstantNavigation,\n setupVersionSelector\n} from \"./integrations\"\nimport {\n patchEllipsis,\n patchIndeterminate,\n patchScrollfix,\n patchScrolllock\n} from \"./patches\"\nimport \"./polyfills\"\n\n/* ----------------------------------------------------------------------------\n * Functions - @todo refactor\n * ------------------------------------------------------------------------- */\n\n/**\n * Fetch search index\n *\n * @returns Search index observable\n */\nfunction fetchSearchIndex(): Observable {\n if (location.protocol === \"file:\") {\n return watchScript(\n `${new URL(\"search/search_index.js\", config.base)}`\n )\n .pipe(\n // @ts-ignore - @todo fix typings\n map(() => __index),\n shareReplay(1)\n )\n } else {\n return requestJSON(\n new URL(\"search/search_index.json\", config.base)\n )\n }\n}\n\n/* ----------------------------------------------------------------------------\n * Application\n * 
------------------------------------------------------------------------- */\n\n/* Yay, JavaScript is available */\ndocument.documentElement.classList.remove(\"no-js\")\ndocument.documentElement.classList.add(\"js\")\n\n/* Set up navigation observables and subjects */\nconst document$ = watchDocument()\nconst location$ = watchLocation()\nconst target$ = watchLocationTarget(location$)\nconst keyboard$ = watchKeyboard()\n\n/* Set up media observables */\nconst viewport$ = watchViewport()\nconst tablet$ = watchMedia(\"(min-width: 960px)\")\nconst screen$ = watchMedia(\"(min-width: 1220px)\")\nconst print$ = watchPrint()\n\n/* Retrieve search index, if search is enabled */\nconst config = configuration()\nconst index$ = document.forms.namedItem(\"search\")\n ? fetchSearchIndex()\n : NEVER\n\n/* Set up Clipboard.js integration */\nconst alert$ = new Subject()\nsetupClipboardJS({ alert$ })\n\n/* Set up progress indicator */\nconst progress$ = new Subject()\n\n/* Set up instant navigation, if enabled */\nif (feature(\"navigation.instant\"))\n setupInstantNavigation({ location$, viewport$, progress$ })\n .subscribe(document$)\n\n/* Set up version selector */\nif (config.version?.provider === \"mike\")\n setupVersionSelector({ document$ })\n\n/* Always close drawer and search on navigation */\nmerge(location$, target$)\n .pipe(\n delay(125)\n )\n .subscribe(() => {\n setToggle(\"drawer\", false)\n setToggle(\"search\", false)\n })\n\n/* Set up global keyboard handlers */\nkeyboard$\n .pipe(\n filter(({ mode }) => mode === \"global\")\n )\n .subscribe(key => {\n switch (key.type) {\n\n /* Go to previous page */\n case \"p\":\n case \",\":\n const prev = getOptionalElement(\"link[rel=prev]\")\n if (typeof prev !== \"undefined\")\n setLocation(prev)\n break\n\n /* Go to next page */\n case \"n\":\n case \".\":\n const next = getOptionalElement(\"link[rel=next]\")\n if (typeof next !== \"undefined\")\n setLocation(next)\n break\n\n /* Expand navigation, see https://bit.ly/3ZjG5io */\n case \"Enter\":\n const active = getActiveElement()\n if (active instanceof HTMLLabelElement)\n active.click()\n }\n })\n\n/* Set up patches */\npatchEllipsis({ document$ })\npatchIndeterminate({ document$, tablet$ })\npatchScrollfix({ document$ })\npatchScrolllock({ viewport$, tablet$ })\n\n/* Set up header and main area observable */\nconst header$ = watchHeader(getComponentElement(\"header\"), { viewport$ })\nconst main$ = document$\n .pipe(\n map(() => getComponentElement(\"main\")),\n switchMap(el => watchMain(el, { viewport$, header$ })),\n shareReplay(1)\n )\n\n/* Set up control component observables */\nconst control$ = merge(\n\n /* Consent */\n ...getComponentElements(\"consent\")\n .map(el => mountConsent(el, { target$ })),\n\n /* Dialog */\n ...getComponentElements(\"dialog\")\n .map(el => mountDialog(el, { alert$ })),\n\n /* Header */\n ...getComponentElements(\"header\")\n .map(el => mountHeader(el, { viewport$, header$, main$ })),\n\n /* Color palette */\n ...getComponentElements(\"palette\")\n .map(el => mountPalette(el)),\n\n /* Progress bar */\n ...getComponentElements(\"progress\")\n .map(el => mountProgress(el, { progress$ })),\n\n /* Search */\n ...getComponentElements(\"search\")\n .map(el => mountSearch(el, { index$, keyboard$ })),\n\n /* Repository information */\n ...getComponentElements(\"source\")\n .map(el => mountSource(el))\n)\n\n/* Set up content component observables */\nconst content$ = defer(() => merge(\n\n /* Announcement bar */\n ...getComponentElements(\"announce\")\n .map(el => 
mountAnnounce(el)),\n\n /* Content */\n ...getComponentElements(\"content\")\n .map(el => mountContent(el, { viewport$, target$, print$ })),\n\n /* Search highlighting */\n ...getComponentElements(\"content\")\n .map(el => feature(\"search.highlight\")\n ? mountSearchHiglight(el, { index$, location$ })\n : EMPTY\n ),\n\n /* Header title */\n ...getComponentElements(\"header-title\")\n .map(el => mountHeaderTitle(el, { viewport$, header$ })),\n\n /* Sidebar */\n ...getComponentElements(\"sidebar\")\n .map(el => el.getAttribute(\"data-md-type\") === \"navigation\"\n ? at(screen$, () => mountSidebar(el, { viewport$, header$, main$ }))\n : at(tablet$, () => mountSidebar(el, { viewport$, header$, main$ }))\n ),\n\n /* Navigation tabs */\n ...getComponentElements(\"tabs\")\n .map(el => mountTabs(el, { viewport$, header$ })),\n\n /* Table of contents */\n ...getComponentElements(\"toc\")\n .map(el => mountTableOfContents(el, {\n viewport$, header$, main$, target$\n })),\n\n /* Back-to-top button */\n ...getComponentElements(\"top\")\n .map(el => mountBackToTop(el, { viewport$, header$, main$, target$ }))\n))\n\n/* Set up component observables */\nconst component$ = document$\n .pipe(\n switchMap(() => content$),\n mergeWith(control$),\n shareReplay(1)\n )\n\n/* Subscribe to all components */\ncomponent$.subscribe()\n\n/* ----------------------------------------------------------------------------\n * Exports\n * ------------------------------------------------------------------------- */\n\nwindow.document$ = document$ /* Document observable */\nwindow.location$ = location$ /* Location subject */\nwindow.target$ = target$ /* Location target observable */\nwindow.keyboard$ = keyboard$ /* Keyboard observable */\nwindow.viewport$ = viewport$ /* Viewport observable */\nwindow.tablet$ = tablet$ /* Media tablet observable */\nwindow.screen$ = screen$ /* Media screen observable */\nwindow.print$ = print$ /* Media print observable */\nwindow.alert$ = alert$ /* Alert subject */\nwindow.progress$ = progress$ /* Progress indicator subject */\nwindow.component$ = component$ /* Component observable */\n", "/*! *****************************************************************************\r\nCopyright (c) Microsoft Corporation.\r\n\r\nPermission to use, copy, modify, and/or distribute this software for any\r\npurpose with or without fee is hereby granted.\r\n\r\nTHE SOFTWARE IS PROVIDED \"AS IS\" AND THE AUTHOR DISCLAIMS ALL WARRANTIES WITH\r\nREGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED WARRANTIES OF MERCHANTABILITY\r\nAND FITNESS. 
IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY SPECIAL, DIRECT,\r\nINDIRECT, OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES WHATSOEVER RESULTING FROM\r\nLOSS OF USE, DATA OR PROFITS, WHETHER IN AN ACTION OF CONTRACT, NEGLIGENCE OR\r\nOTHER TORTIOUS ACTION, ARISING OUT OF OR IN CONNECTION WITH THE USE OR\r\nPERFORMANCE OF THIS SOFTWARE.\r\n***************************************************************************** */\r\n/* global Reflect, Promise */\r\n\r\nvar extendStatics = function(d, b) {\r\n extendStatics = Object.setPrototypeOf ||\r\n ({ __proto__: [] } instanceof Array && function (d, b) { d.__proto__ = b; }) ||\r\n function (d, b) { for (var p in b) if (Object.prototype.hasOwnProperty.call(b, p)) d[p] = b[p]; };\r\n return extendStatics(d, b);\r\n};\r\n\r\nexport function __extends(d, b) {\r\n if (typeof b !== \"function\" && b !== null)\r\n throw new TypeError(\"Class extends value \" + String(b) + \" is not a constructor or null\");\r\n extendStatics(d, b);\r\n function __() { this.constructor = d; }\r\n d.prototype = b === null ? Object.create(b) : (__.prototype = b.prototype, new __());\r\n}\r\n\r\nexport var __assign = function() {\r\n __assign = Object.assign || function __assign(t) {\r\n for (var s, i = 1, n = arguments.length; i < n; i++) {\r\n s = arguments[i];\r\n for (var p in s) if (Object.prototype.hasOwnProperty.call(s, p)) t[p] = s[p];\r\n }\r\n return t;\r\n }\r\n return __assign.apply(this, arguments);\r\n}\r\n\r\nexport function __rest(s, e) {\r\n var t = {};\r\n for (var p in s) if (Object.prototype.hasOwnProperty.call(s, p) && e.indexOf(p) < 0)\r\n t[p] = s[p];\r\n if (s != null && typeof Object.getOwnPropertySymbols === \"function\")\r\n for (var i = 0, p = Object.getOwnPropertySymbols(s); i < p.length; i++) {\r\n if (e.indexOf(p[i]) < 0 && Object.prototype.propertyIsEnumerable.call(s, p[i]))\r\n t[p[i]] = s[p[i]];\r\n }\r\n return t;\r\n}\r\n\r\nexport function __decorate(decorators, target, key, desc) {\r\n var c = arguments.length, r = c < 3 ? target : desc === null ? desc = Object.getOwnPropertyDescriptor(target, key) : desc, d;\r\n if (typeof Reflect === \"object\" && typeof Reflect.decorate === \"function\") r = Reflect.decorate(decorators, target, key, desc);\r\n else for (var i = decorators.length - 1; i >= 0; i--) if (d = decorators[i]) r = (c < 3 ? d(r) : c > 3 ? d(target, key, r) : d(target, key)) || r;\r\n return c > 3 && r && Object.defineProperty(target, key, r), r;\r\n}\r\n\r\nexport function __param(paramIndex, decorator) {\r\n return function (target, key) { decorator(target, key, paramIndex); }\r\n}\r\n\r\nexport function __metadata(metadataKey, metadataValue) {\r\n if (typeof Reflect === \"object\" && typeof Reflect.metadata === \"function\") return Reflect.metadata(metadataKey, metadataValue);\r\n}\r\n\r\nexport function __awaiter(thisArg, _arguments, P, generator) {\r\n function adopt(value) { return value instanceof P ? value : new P(function (resolve) { resolve(value); }); }\r\n return new (P || (P = Promise))(function (resolve, reject) {\r\n function fulfilled(value) { try { step(generator.next(value)); } catch (e) { reject(e); } }\r\n function rejected(value) { try { step(generator[\"throw\"](value)); } catch (e) { reject(e); } }\r\n function step(result) { result.done ? 
EMPTY_SUBSCRIPTION;\n }\n}\n", "import { TimestampProvider } from '../types';\n\ninterface DateTimestampProvider extends TimestampProvider {\n delegate: TimestampProvider | undefined;\n}\n\nexport const dateTimestampProvider: DateTimestampProvider = {\n now() {\n // Use the variable rather than `this` so that the function can be called\n // without being bound to the provider.\n return (dateTimestampProvider.delegate || Date).now();\n },\n delegate: undefined,\n};\n", "import { Subject } from './Subject';\nimport { TimestampProvider } from './types';\nimport { Subscriber } from './Subscriber';\nimport { Subscription } from './Subscription';\nimport { dateTimestampProvider } from './scheduler/dateTimestampProvider';\n\n/**\n * A variant of {@link Subject} that \"replays\" old values to new subscribers by emitting them when they first subscribe.\n *\n * `ReplaySubject` has an internal buffer that will store a specified number of values that it has observed. Like `Subject`,\n * `ReplaySubject` \"observes\" values by having them passed to its `next` method. When it observes a value, it will store that\n * value for a time determined by the configuration of the `ReplaySubject`, as passed to its constructor.\n *\n * When a new subscriber subscribes to the `ReplaySubject` instance, it will synchronously emit all values in its buffer in\n * a First-In-First-Out (FIFO) manner. The `ReplaySubject` will also complete, if it has observed completion; and it will\n * error if it has observed an error.\n *\n * There are two main configuration items to be concerned with:\n *\n * 1. `bufferSize` - This will determine how many items are stored in the buffer, defaults to infinite.\n * 2. `windowTime` - The amount of time to hold a value in the buffer before removing it from the buffer.\n *\n * Both configurations may exist simultaneously. So if you would like to buffer a maximum of 3 values, as long as the values\n * are less than 2 seconds old, you could do so with a `new ReplaySubject(3, 2000)`.\n *\n * ### Differences with BehaviorSubject\n *\n * `BehaviorSubject` is similar to `new ReplaySubject(1)`, with a couple of exceptions:\n *\n * 1. `BehaviorSubject` comes \"primed\" with a single value upon construction.\n * 2. `ReplaySubject` will replay values, even after observing an error, where `BehaviorSubject` will not.\n *\n * @see {@link Subject}\n * @see {@link BehaviorSubject}\n * @see {@link shareReplay}\n */\nexport class ReplaySubject extends Subject {\n private _buffer: (T | number)[] = [];\n private _infiniteTimeWindow = true;\n\n /**\n * @param bufferSize The size of the buffer to replay on subscription\n * @param windowTime The amount of time the buffered items will stay buffered\n * @param timestampProvider An object with a `now()` method that provides the current timestamp. 
This is used to\n * calculate the amount of time something has been buffered.\n */\n constructor(\n private _bufferSize = Infinity,\n private _windowTime = Infinity,\n private _timestampProvider: TimestampProvider = dateTimestampProvider\n ) {\n super();\n this._infiniteTimeWindow = _windowTime === Infinity;\n this._bufferSize = Math.max(1, _bufferSize);\n this._windowTime = Math.max(1, _windowTime);\n }\n\n next(value: T): void {\n const { isStopped, _buffer, _infiniteTimeWindow, _timestampProvider, _windowTime } = this;\n if (!isStopped) {\n _buffer.push(value);\n !_infiniteTimeWindow && _buffer.push(_timestampProvider.now() + _windowTime);\n }\n this._trimBuffer();\n super.next(value);\n }\n\n /** @internal */\n protected _subscribe(subscriber: Subscriber): Subscription {\n this._throwIfClosed();\n this._trimBuffer();\n\n const subscription = this._innerSubscribe(subscriber);\n\n const { _infiniteTimeWindow, _buffer } = this;\n // We use a copy here, so reentrant code does not mutate our array while we're\n // emitting it to a new subscriber.\n const copy = _buffer.slice();\n for (let i = 0; i < copy.length && !subscriber.closed; i += _infiniteTimeWindow ? 1 : 2) {\n subscriber.next(copy[i] as T);\n }\n\n this._checkFinalizedStatuses(subscriber);\n\n return subscription;\n }\n\n private _trimBuffer() {\n const { _bufferSize, _timestampProvider, _buffer, _infiniteTimeWindow } = this;\n // If we don't have an infinite buffer size, and we're over the length,\n // use splice to truncate the old buffer values off. Note that we have to\n // double the size for instances where we're not using an infinite time window\n // because we're storing the values and the timestamps in the same array.\n const adjustedBufferSize = (_infiniteTimeWindow ? 1 : 2) * _bufferSize;\n _bufferSize < Infinity && adjustedBufferSize < _buffer.length && _buffer.splice(0, _buffer.length - adjustedBufferSize);\n\n // Now, if we're not in an infinite time window, remove all values where the time is\n // older than what is allowed.\n if (!_infiniteTimeWindow) {\n const now = _timestampProvider.now();\n let last = 0;\n // Search the array for the first timestamp that isn't expired and\n // truncate the buffer up to that point.\n for (let i = 1; i < _buffer.length && (_buffer[i] as number) <= now; i += 2) {\n last = i;\n }\n last && _buffer.splice(0, last + 1);\n }\n }\n}\n", "import { Scheduler } from '../Scheduler';\nimport { Subscription } from '../Subscription';\nimport { SchedulerAction } from '../types';\n\n/**\n * A unit of work to be executed in a `scheduler`. An action is typically\n * created from within a {@link SchedulerLike} and an RxJS user does not need to concern\n * themselves about creating and manipulating an Action.\n *\n * ```ts\n * class Action extends Subscription {\n * new (scheduler: Scheduler, work: (state?: T) => void);\n * schedule(state?: T, delay: number = 0): Subscription;\n * }\n * ```\n *\n * @class Action\n */\nexport class Action extends Subscription {\n constructor(scheduler: Scheduler, work: (this: SchedulerAction, state?: T) => void) {\n super();\n }\n /**\n * Schedules this action on its parent {@link SchedulerLike} for execution. May be passed\n * some context object, `state`. 
May happen at some point in the future,\n * according to the `delay` parameter, if specified.\n * @param {T} [state] Some contextual data that the `work` function uses when\n * called by the Scheduler.\n * @param {number} [delay] Time to wait before executing the work, where the\n * time unit is implicit and defined by the Scheduler.\n * @return {void}\n */\n public schedule(state?: T, delay: number = 0): Subscription {\n return this;\n }\n}\n", "import type { TimerHandle } from './timerHandle';\ntype SetIntervalFunction = (handler: () => void, timeout?: number, ...args: any[]) => TimerHandle;\ntype ClearIntervalFunction = (handle: TimerHandle) => void;\n\ninterface IntervalProvider {\n setInterval: SetIntervalFunction;\n clearInterval: ClearIntervalFunction;\n delegate:\n | {\n setInterval: SetIntervalFunction;\n clearInterval: ClearIntervalFunction;\n }\n | undefined;\n}\n\nexport const intervalProvider: IntervalProvider = {\n // When accessing the delegate, use the variable rather than `this` so that\n // the functions can be called without being bound to the provider.\n setInterval(handler: () => void, timeout?: number, ...args) {\n const { delegate } = intervalProvider;\n if (delegate?.setInterval) {\n return delegate.setInterval(handler, timeout, ...args);\n }\n return setInterval(handler, timeout, ...args);\n },\n clearInterval(handle) {\n const { delegate } = intervalProvider;\n return (delegate?.clearInterval || clearInterval)(handle as any);\n },\n delegate: undefined,\n};\n", "import { Action } from './Action';\nimport { SchedulerAction } from '../types';\nimport { Subscription } from '../Subscription';\nimport { AsyncScheduler } from './AsyncScheduler';\nimport { intervalProvider } from './intervalProvider';\nimport { arrRemove } from '../util/arrRemove';\nimport { TimerHandle } from './timerHandle';\n\nexport class AsyncAction extends Action {\n public id: TimerHandle | undefined;\n public state?: T;\n // @ts-ignore: Property has no initializer and is not definitely assigned\n public delay: number;\n protected pending: boolean = false;\n\n constructor(protected scheduler: AsyncScheduler, protected work: (this: SchedulerAction, state?: T) => void) {\n super(scheduler, work);\n }\n\n public schedule(state?: T, delay: number = 0): Subscription {\n if (this.closed) {\n return this;\n }\n\n // Always replace the current state with the new state.\n this.state = state;\n\n const id = this.id;\n const scheduler = this.scheduler;\n\n //\n // Important implementation note:\n //\n // Actions only execute once by default, unless rescheduled from within the\n // scheduled callback. This allows us to implement single and repeat\n // actions via the same code path, without adding API surface area, as well\n // as mimic traditional recursion but across asynchronous boundaries.\n //\n // However, JS runtimes and timers distinguish between intervals achieved by\n // serial `setTimeout` calls vs. a single `setInterval` call. An interval of\n // serial `setTimeout` calls can be individually delayed, which delays\n // scheduling the next `setTimeout`, and so on. `setInterval` attempts to\n // guarantee the interval callback will be invoked more precisely to the\n // interval period, regardless of load.\n //\n // Therefore, we use `setInterval` to schedule single and repeat actions.\n // If the action reschedules itself with the same delay, the interval is not\n // canceled. 
If the action doesn't reschedule, or reschedules with a\n // different delay, the interval will be canceled after scheduled callback\n // execution.\n //\n if (id != null) {\n this.id = this.recycleAsyncId(scheduler, id, delay);\n }\n\n // Set the pending flag indicating that this action has been scheduled, or\n // has recursively rescheduled itself.\n this.pending = true;\n\n this.delay = delay;\n // If this action has already an async Id, don't request a new one.\n this.id = this.id ?? this.requestAsyncId(scheduler, this.id, delay);\n\n return this;\n }\n\n protected requestAsyncId(scheduler: AsyncScheduler, _id?: TimerHandle, delay: number = 0): TimerHandle {\n return intervalProvider.setInterval(scheduler.flush.bind(scheduler, this), delay);\n }\n\n protected recycleAsyncId(_scheduler: AsyncScheduler, id?: TimerHandle, delay: number | null = 0): TimerHandle | undefined {\n // If this action is rescheduled with the same delay time, don't clear the interval id.\n if (delay != null && this.delay === delay && this.pending === false) {\n return id;\n }\n // Otherwise, if the action's delay time is different from the current delay,\n // or the action has been rescheduled before it's executed, clear the interval id\n if (id != null) {\n intervalProvider.clearInterval(id);\n }\n\n return undefined;\n }\n\n /**\n * Immediately executes this action and the `work` it contains.\n * @return {any}\n */\n public execute(state: T, delay: number): any {\n if (this.closed) {\n return new Error('executing a cancelled action');\n }\n\n this.pending = false;\n const error = this._execute(state, delay);\n if (error) {\n return error;\n } else if (this.pending === false && this.id != null) {\n // Dequeue if the action didn't reschedule itself. Don't call\n // unsubscribe(), because the action could reschedule later.\n // For example:\n // ```\n // scheduler.schedule(function doWork(counter) {\n // /* ... I'm a busy worker bee ... */\n // var originalAction = this;\n // /* wait 100ms before rescheduling the action */\n // setTimeout(function () {\n // originalAction.schedule(counter + 1);\n // }, 100);\n // }, 1000);\n // ```\n this.id = this.recycleAsyncId(this.scheduler, this.id, null);\n }\n }\n\n protected _execute(state: T, _delay: number): any {\n let errored: boolean = false;\n let errorValue: any;\n try {\n this.work(state);\n } catch (e) {\n errored = true;\n // HACK: Since code elsewhere is relying on the \"truthiness\" of the\n // return here, we can't have it return \"\" or 0 or false.\n // TODO: Clean this up when we refactor schedulers mid-version-8 or so.\n errorValue = e ? e : new Error('Scheduled action threw falsy error');\n }\n if (errored) {\n this.unsubscribe();\n return errorValue;\n }\n }\n\n unsubscribe() {\n if (!this.closed) {\n const { id, scheduler } = this;\n const { actions } = scheduler;\n\n this.work = this.state = this.scheduler = null!;\n this.pending = false;\n\n arrRemove(actions, this);\n if (id != null) {\n this.id = this.recycleAsyncId(scheduler, id, null);\n }\n\n this.delay = null!;\n super.unsubscribe();\n }\n }\n}\n", "import { Action } from './scheduler/Action';\nimport { Subscription } from './Subscription';\nimport { SchedulerLike, SchedulerAction } from './types';\nimport { dateTimestampProvider } from './scheduler/dateTimestampProvider';\n\n/**\n * An execution context and a data structure to order tasks and schedule their\n * execution. 
Provides a notion of (potentially virtual) time, through the\n * `now()` getter method.\n *\n * Each unit of work in a Scheduler is called an `Action`.\n *\n * ```ts\n * class Scheduler {\n * now(): number;\n * schedule(work, delay?, state?): Subscription;\n * }\n * ```\n *\n * @class Scheduler\n * @deprecated Scheduler is an internal implementation detail of RxJS, and\n * should not be used directly. Rather, create your own class and implement\n * {@link SchedulerLike}. Will be made internal in v8.\n */\nexport class Scheduler implements SchedulerLike {\n public static now: () => number = dateTimestampProvider.now;\n\n constructor(private schedulerActionCtor: typeof Action, now: () => number = Scheduler.now) {\n this.now = now;\n }\n\n /**\n * A getter method that returns a number representing the current time\n * (at the time this function was called) according to the scheduler's own\n * internal clock.\n * @return {number} A number that represents the current time. May or may not\n * have a relation to wall-clock time. May or may not refer to a time unit\n * (e.g. milliseconds).\n */\n public now: () => number;\n\n /**\n * Schedules a function, `work`, for execution. May happen at some point in\n * the future, according to the `delay` parameter, if specified. May be passed\n * some context object, `state`, which will be passed to the `work` function.\n *\n * The given arguments will be processed an stored as an Action object in a\n * queue of actions.\n *\n * @param {function(state: ?T): ?Subscription} work A function representing a\n * task, or some unit of work to be executed by the Scheduler.\n * @param {number} [delay] Time to wait before executing the work, where the\n * time unit is implicit and defined by the Scheduler itself.\n * @param {T} [state] Some contextual data that the `work` function uses when\n * called by the Scheduler.\n * @return {Subscription} A subscription in order to be able to unsubscribe\n * the scheduled work.\n */\n public schedule(work: (this: SchedulerAction, state?: T) => void, delay: number = 0, state?: T): Subscription {\n return new this.schedulerActionCtor(this, work).schedule(state, delay);\n }\n}\n", "import { Scheduler } from '../Scheduler';\nimport { Action } from './Action';\nimport { AsyncAction } from './AsyncAction';\nimport { TimerHandle } from './timerHandle';\n\nexport class AsyncScheduler extends Scheduler {\n public actions: Array> = [];\n /**\n * A flag to indicate whether the Scheduler is currently executing a batch of\n * queued actions.\n * @type {boolean}\n * @internal\n */\n public _active: boolean = false;\n /**\n * An internal ID used to track the latest asynchronous task such as those\n * coming from `setTimeout`, `setInterval`, `requestAnimationFrame`, and\n * others.\n * @type {any}\n * @internal\n */\n public _scheduled: TimerHandle | undefined;\n\n constructor(SchedulerAction: typeof Action, now: () => number = Scheduler.now) {\n super(SchedulerAction, now);\n }\n\n public flush(action: AsyncAction): void {\n const { actions } = this;\n\n if (this._active) {\n actions.push(action);\n return;\n }\n\n let error: any;\n this._active = true;\n\n do {\n if ((error = action.execute(action.state, action.delay))) {\n break;\n }\n } while ((action = actions.shift()!)); // exhaust the scheduler queue\n\n this._active = false;\n\n if (error) {\n while ((action = actions.shift()!)) {\n action.unsubscribe();\n }\n throw error;\n }\n }\n}\n", "import { AsyncAction } from './AsyncAction';\nimport { AsyncScheduler } from 
'./AsyncScheduler';\n\n/**\n *\n * Async Scheduler\n *\n * Schedule task as if you used setTimeout(task, duration)\n *\n * `async` scheduler schedules tasks asynchronously, by putting them on the JavaScript\n * event loop queue. It is best used to delay tasks in time or to schedule tasks repeating\n * in intervals.\n *\n * If you just want to \"defer\" task, that is to perform it right after currently\n * executing synchronous code ends (commonly achieved by `setTimeout(deferredTask, 0)`),\n * better choice will be the {@link asapScheduler} scheduler.\n *\n * ## Examples\n * Use async scheduler to delay task\n * ```ts\n * import { asyncScheduler } from 'rxjs';\n *\n * const task = () => console.log('it works!');\n *\n * asyncScheduler.schedule(task, 2000);\n *\n * // After 2 seconds logs:\n * // \"it works!\"\n * ```\n *\n * Use async scheduler to repeat task in intervals\n * ```ts\n * import { asyncScheduler } from 'rxjs';\n *\n * function task(state) {\n * console.log(state);\n * this.schedule(state + 1, 1000); // `this` references currently executing Action,\n * // which we reschedule with new state and delay\n * }\n *\n * asyncScheduler.schedule(task, 3000, 0);\n *\n * // Logs:\n * // 0 after 3s\n * // 1 after 4s\n * // 2 after 5s\n * // 3 after 6s\n * ```\n */\n\nexport const asyncScheduler = new AsyncScheduler(AsyncAction);\n\n/**\n * @deprecated Renamed to {@link asyncScheduler}. Will be removed in v8.\n */\nexport const async = asyncScheduler;\n", "import { AsyncAction } from './AsyncAction';\nimport { AnimationFrameScheduler } from './AnimationFrameScheduler';\nimport { SchedulerAction } from '../types';\nimport { animationFrameProvider } from './animationFrameProvider';\nimport { TimerHandle } from './timerHandle';\n\nexport class AnimationFrameAction extends AsyncAction {\n constructor(protected scheduler: AnimationFrameScheduler, protected work: (this: SchedulerAction, state?: T) => void) {\n super(scheduler, work);\n }\n\n protected requestAsyncId(scheduler: AnimationFrameScheduler, id?: TimerHandle, delay: number = 0): TimerHandle {\n // If delay is greater than 0, request as an async action.\n if (delay !== null && delay > 0) {\n return super.requestAsyncId(scheduler, id, delay);\n }\n // Push the action to the end of the scheduler queue.\n scheduler.actions.push(this);\n // If an animation frame has already been requested, don't request another\n // one. If an animation frame hasn't been requested yet, request one. Return\n // the current animation frame request id.\n return scheduler._scheduled || (scheduler._scheduled = animationFrameProvider.requestAnimationFrame(() => scheduler.flush(undefined)));\n }\n\n protected recycleAsyncId(scheduler: AnimationFrameScheduler, id?: TimerHandle, delay: number = 0): TimerHandle | undefined {\n // If delay exists and is greater than 0, or if the delay is null (the\n // action wasn't rescheduled) but was originally scheduled as an async\n // action, then recycle as an async action.\n if (delay != null ? 
delay > 0 : this.delay > 0) {\n return super.recycleAsyncId(scheduler, id, delay);\n }\n // If the scheduler queue has no remaining actions with the same async id,\n // cancel the requested animation frame and set the scheduled flag to\n // undefined so the next AnimationFrameAction will request its own.\n const { actions } = scheduler;\n if (id != null && actions[actions.length - 1]?.id !== id) {\n animationFrameProvider.cancelAnimationFrame(id as number);\n scheduler._scheduled = undefined;\n }\n // Return undefined so the action knows to request a new async id if it's rescheduled.\n return undefined;\n }\n}\n", "import { AsyncAction } from './AsyncAction';\nimport { AsyncScheduler } from './AsyncScheduler';\n\nexport class AnimationFrameScheduler extends AsyncScheduler {\n public flush(action?: AsyncAction): void {\n this._active = true;\n // The async id that effects a call to flush is stored in _scheduled.\n // Before executing an action, it's necessary to check the action's async\n // id to determine whether it's supposed to be executed in the current\n // flush.\n // Previous implementations of this method used a count to determine this,\n // but that was unsound, as actions that are unsubscribed - i.e. cancelled -\n // are removed from the actions array and that can shift actions that are\n // scheduled to be executed in a subsequent flush into positions at which\n // they are executed within the current flush.\n const flushId = this._scheduled;\n this._scheduled = undefined;\n\n const { actions } = this;\n let error: any;\n action = action || actions.shift()!;\n\n do {\n if ((error = action.execute(action.state, action.delay))) {\n break;\n }\n } while ((action = actions[0]) && action.id === flushId && actions.shift());\n\n this._active = false;\n\n if (error) {\n while ((action = actions[0]) && action.id === flushId && actions.shift()) {\n action.unsubscribe();\n }\n throw error;\n }\n }\n}\n", "import { AnimationFrameAction } from './AnimationFrameAction';\nimport { AnimationFrameScheduler } from './AnimationFrameScheduler';\n\n/**\n *\n * Animation Frame Scheduler\n *\n * Perform task when `window.requestAnimationFrame` would fire\n *\n * When `animationFrame` scheduler is used with delay, it will fall back to {@link asyncScheduler} scheduler\n * behaviour.\n *\n * Without delay, `animationFrame` scheduler can be used to create smooth browser animations.\n * It makes sure scheduled task will happen just before next browser content repaint,\n * thus performing animations as efficiently as possible.\n *\n * ## Example\n * Schedule div height animation\n * ```ts\n * // html:
\n * import { animationFrameScheduler } from 'rxjs';\n *\n * const div = document.querySelector('div');\n *\n * animationFrameScheduler.schedule(function(height) {\n * div.style.height = height + \"px\";\n *\n * this.schedule(height + 1); // `this` references currently executing Action,\n * // which we reschedule with new state\n * }, 0, 0);\n *\n * // You will see a div element growing in height\n * ```\n */\n\nexport const animationFrameScheduler = new AnimationFrameScheduler(AnimationFrameAction);\n\n/**\n * @deprecated Renamed to {@link animationFrameScheduler}. Will be removed in v8.\n */\nexport const animationFrame = animationFrameScheduler;\n", "import { Observable } from '../Observable';\nimport { SchedulerLike } from '../types';\n\n/**\n * A simple Observable that emits no items to the Observer and immediately\n * emits a complete notification.\n *\n * Just emits 'complete', and nothing else.\n *\n * ![](empty.png)\n *\n * A simple Observable that only emits the complete notification. It can be used\n * for composing with other Observables, such as in a {@link mergeMap}.\n *\n * ## Examples\n *\n * Log complete notification\n *\n * ```ts\n * import { EMPTY } from 'rxjs';\n *\n * EMPTY.subscribe({\n * next: () => console.log('Next'),\n * complete: () => console.log('Complete!')\n * });\n *\n * // Outputs\n * // Complete!\n * ```\n *\n * Emit the number 7, then complete\n *\n * ```ts\n * import { EMPTY, startWith } from 'rxjs';\n *\n * const result = EMPTY.pipe(startWith(7));\n * result.subscribe(x => console.log(x));\n *\n * // Outputs\n * // 7\n * ```\n *\n * Map and flatten only odd numbers to the sequence `'a'`, `'b'`, `'c'`\n *\n * ```ts\n * import { interval, mergeMap, of, EMPTY } from 'rxjs';\n *\n * const interval$ = interval(1000);\n * const result = interval$.pipe(\n * mergeMap(x => x % 2 === 1 ? of('a', 'b', 'c') : EMPTY),\n * );\n * result.subscribe(x => console.log(x));\n *\n * // Results in the following to the console:\n * // x is equal to the count on the interval, e.g. (0, 1, 2, 3, ...)\n * // x will occur every 1000ms\n * // if x % 2 is equal to 1, print a, b, c (each on its own)\n * // if x % 2 is not equal to 1, nothing will be output\n * ```\n *\n * @see {@link Observable}\n * @see {@link NEVER}\n * @see {@link of}\n * @see {@link throwError}\n */\nexport const EMPTY = new Observable((subscriber) => subscriber.complete());\n\n/**\n * @param scheduler A {@link SchedulerLike} to use for scheduling\n * the emission of the complete notification.\n * @deprecated Replaced with the {@link EMPTY} constant or {@link scheduled} (e.g. `scheduled([], scheduler)`). Will be removed in v8.\n */\nexport function empty(scheduler?: SchedulerLike) {\n return scheduler ? emptyScheduled(scheduler) : EMPTY;\n}\n\nfunction emptyScheduled(scheduler: SchedulerLike) {\n return new Observable((subscriber) => scheduler.schedule(() => subscriber.complete()));\n}\n", "import { SchedulerLike } from '../types';\nimport { isFunction } from './isFunction';\n\nexport function isScheduler(value: any): value is SchedulerLike {\n return value && isFunction(value.schedule);\n}\n", "import { SchedulerLike } from '../types';\nimport { isFunction } from './isFunction';\nimport { isScheduler } from './isScheduler';\n\nfunction last(arr: T[]): T | undefined {\n return arr[arr.length - 1];\n}\n\nexport function popResultSelector(args: any[]): ((...args: unknown[]) => unknown) | undefined {\n return isFunction(last(args)) ? 
args.pop() : undefined;\n}\n\nexport function popScheduler(args: any[]): SchedulerLike | undefined {\n return isScheduler(last(args)) ? args.pop() : undefined;\n}\n\nexport function popNumber(args: any[], defaultValue: number): number {\n return typeof last(args) === 'number' ? args.pop()! : defaultValue;\n}\n", "export const isArrayLike = ((x: any): x is ArrayLike => x && typeof x.length === 'number' && typeof x !== 'function');", "import { isFunction } from \"./isFunction\";\n\n/**\n * Tests to see if the object is \"thennable\".\n * @param value the object to test\n */\nexport function isPromise(value: any): value is PromiseLike {\n return isFunction(value?.then);\n}\n", "import { InteropObservable } from '../types';\nimport { observable as Symbol_observable } from '../symbol/observable';\nimport { isFunction } from './isFunction';\n\n/** Identifies an input as being Observable (but not necessary an Rx Observable) */\nexport function isInteropObservable(input: any): input is InteropObservable {\n return isFunction(input[Symbol_observable]);\n}\n", "import { isFunction } from './isFunction';\n\nexport function isAsyncIterable(obj: any): obj is AsyncIterable {\n return Symbol.asyncIterator && isFunction(obj?.[Symbol.asyncIterator]);\n}\n", "/**\n * Creates the TypeError to throw if an invalid object is passed to `from` or `scheduled`.\n * @param input The object that was passed.\n */\nexport function createInvalidObservableTypeError(input: any) {\n // TODO: We should create error codes that can be looked up, so this can be less verbose.\n return new TypeError(\n `You provided ${\n input !== null && typeof input === 'object' ? 'an invalid object' : `'${input}'`\n } where a stream was expected. You can provide an Observable, Promise, ReadableStream, Array, AsyncIterable, or Iterable.`\n );\n}\n", "export function getSymbolIterator(): symbol {\n if (typeof Symbol !== 'function' || !Symbol.iterator) {\n return '@@iterator' as any;\n }\n\n return Symbol.iterator;\n}\n\nexport const iterator = getSymbolIterator();\n", "import { iterator as Symbol_iterator } from '../symbol/iterator';\nimport { isFunction } from './isFunction';\n\n/** Identifies an input as being an Iterable */\nexport function isIterable(input: any): input is Iterable {\n return isFunction(input?.[Symbol_iterator]);\n}\n", "import { ReadableStreamLike } from '../types';\nimport { isFunction } from './isFunction';\n\nexport async function* readableStreamLikeToAsyncGenerator(readableStream: ReadableStreamLike): AsyncGenerator {\n const reader = readableStream.getReader();\n try {\n while (true) {\n const { value, done } = await reader.read();\n if (done) {\n return;\n }\n yield value!;\n }\n } finally {\n reader.releaseLock();\n }\n}\n\nexport function isReadableStreamLike(obj: any): obj is ReadableStreamLike {\n // We don't want to use instanceof checks because they would return\n // false for instances from another Realm, like an + + + + + + + + + + + + + + + + + + + + + + + + + +
+ + +
+
+
+ + + + + + + + + + + + \ No newline at end of file diff --git a/data-management/filesystem-and-storage/data-storage/index.html b/data-management/filesystem-and-storage/data-storage/index.html new file mode 100644 index 0000000000..f93c6d4c57 --- /dev/null +++ b/data-management/filesystem-and-storage/data-storage/index.html @@ -0,0 +1,7046 @@ + + + + + + + + + + + + + + + + + + + + + + + + + Data Storage - ALCF User Guides + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+ +
+ + + + + + + + + +
+ + +
+ + +
+ +
+

Argonne Leadership Computing Facility

+
+ + + + + + + + + +
+
+ + +
+ + + + + + + + + + + + + + + + + +
+ + + + +
+ + +
+ + + + + + +
+
+ + + +
+
+
+ + + + + +
+
+
+ + + +
+
+
+ + + +
+
+
+ + + +
+
+ + + + + + + +

ALCF Data Storage

+

Disk Storage

+

The ALCF operates a number of file systems that are mounted globally across all of our production systems.

+

Home

+

A Lustre file system residing on a DDN AI-400X NVMe flash platform. It has 24 NVMe drives of 7 TB each, providing 123 TB of usable space, and is configured with 8 Object Storage Targets and 4 Metadata Targets.

+

Grand

+

A Lustre file system residing on an HPE ClusterStor E1000 platform equipped with 100 PB of usable capacity across 8,480 disk drives. This ClusterStor platform provides 160 Object Storage Targets and 40 Metadata Targets with an aggregate data transfer rate of 650 GB/s. The primary use of Grand is compute campaign storage.

+

Also see ALCF Data Policies and Data Transfer

+

Eagle

+

A Lustre file system residing on an HPE ClusterStor E1000 platform equipped with 100 PB of usable capacity across 8,480 disk drives. This ClusterStor platform provides 160 Object Storage Targets and 40 Metadata Targets with an aggregate data transfer rate of 650 GB/s. The primary use of Eagle is data sharing with the research community. Eagle has community sharing capabilities that allow PIs to share their project data with external collaborators using Globus. Eagle can also be used for compute campaign storage.

+

Also see ALCF Data Policies and Data Transfer

+

theta-fs0

+

A Lustre file system residing on an HPE Sonexion 3000 storage array with a usable capacity of 9.2PB and an aggregate data transfer rate of 240GB/s. This is a legacy file system. No new allocations are granted on theta-fs0.

+

Also see ALCF Data Policies and Data Transfer

+

theta-fs1

+

A GPFS file system that resides on an IBM Elastic Storage System (ESS) cluster with a usable capacity of 7.9PB and an aggregate data transfer rate of 400GB/s. This is a legacy file system. No new allocations are granted on theta-fs1.

+

Also see ALCF Data Policies and Data Transfer

+

Tape Storage

+

ALCF operates three 10,000-slot SpectraLogic tape libraries. We are currently running a combination of LTO6 and LTO8 tape technology. The LTO tape drives have built-in hardware compression that typically achieves compression ratios between 1.25:1 and 2:1, depending on the data, yielding an effective capacity of approximately 65 PB.

+

HPSS

+

HPSS is a data archive and retrieval system that manages large amounts of data on disk and robotic tape libraries. It provides hierarchical storage management services that allow it to migrate data between those storage platforms.

+

HPSS is currently configured with a disk and tape tier. The disk tier has a capacity of 1.2 PB on a DataDirect Networks SFA12K-40 storage array. By default, all archived data is initially written to the disk tier. The tape tier consists of 3 SpectraLogic T950 robotic tape libraries containing a total of 72 LTO6 tape drives with a total uncompressed capacity of 64 PB. Archived data is migrated to the tape tier at regular intervals, then deleted from the disk tier to create space for future archives.

+

Access to HPSS is provided by various client components. Currently, ALCF supports access through two command-line clients, HSI and HTAR. These are installed on the login nodes of Theta and Cooley. In order for the client to authenticate with HPSS, the user must have a keytab file located in their home directory under the .hpss subdirectory. The file name will be in the format .ktb_<userid>.
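As a quick sanity check (a sketch only; it assumes your ALCF user name is what the $USER environment variable holds on the login node), you can confirm that the keytab file is present before trying to connect:

ls -l ~/.hpss/.ktb_$USER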

+

HSI General Usage

+

Before you can use HSI on XC40 systems such as Theta, you must load a module:

+

module load hsi

+

HSI can be invoked by simply entering hsi at your normal shell prompt. Once authenticated, you will enter the hsi command shell environment:

+
> hsi
+[HSI]/home/username->
+
+

You may enter "help" to display a brief description of available commands.

+

If archiving from or retrieving to Grand or Eagle, you must disable the Transfer Agent by adding -T off to your command.

+

Example archive +

[HSI]/home/username-> put mydatafile                # same name on HPSS
+[HSI]/home/username-> put local.file : hpss.file    # different name on HPSS
+[HSI]/home/username-> put -T off mydatafile
+

+

Example retrieval +

[HSI]/home/username-> get mydatafile
+[HSI]/home/username-> get local.file : hpss.file
+[HSI]/home/username-> get -T off mydatafile
+

+

Most of the usual shell commands will work as expected in the HSI command environment. For example, checking what files are archived:

+

[HSI]/home/username-> ls -l

+

And organizing your archived files:

+
[HSI]/home/username-> mkdir dataset1
+[HSI]/home/username-> mv hpss.file dataset1
+[HSI]/home/username-> ls dataset1
+[HSI]/home/username-> rm dataset1/hpss.file
+
+

It may be necessary to use single or double quotes around metacharacters to avoid having the shell prematurely expand them. For example:

+
[HSI]/home/username-> get *.c
+
+

will not work, but

+
[HSI]/home/username-> get "*.c"
+
+

will retrieve all files ending in .c.

+

Following normal shell conventions, other special characters in filenames such as whitespace and semicolon also need to be escaped with "\" (backslash). For example:

+
       [HSI]/home/username-> get "data\ file\ \;\ version\ 1"
+
+

retrieves the file named "data file ; version 1".

+

HSI can also be run as a command line or embedded in a script as follows:

+
hsi -O log.file "put local.file"
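As a sketch of scripted use, you could loop over a set of files and issue one put per invocation, collecting HSI output in a log file (the .dat file pattern and the archive.log name below are placeholders, not ALCF conventions):

#!/bin/bash
# Archive every .dat file in the current directory to HPSS, one put per call.
for f in *.dat; do
    hsi -O archive.log "put $f"
done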
+
+

HTAR General Usage

+

HTAR is a tar-like utility that creates tar-format archive files directly in HPSS. It can be run as a command line or embedded in a script.

+

Example archive +

htar -cf hpssfile.tar localfile1 localfile2 localfile3
+

+

Example retrieval

+
htar -xf hpssfile.tar localfile2
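You can also list the contents of an existing archive before extracting from it; a minimal sketch, reusing the archive name from the examples above:

htar -tvf hpssfile.tar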
+
+

NOTE: On Theta, you must first load the HSI module to make HSI and HTAR available: module load hsi
NOTE: The current version of HTAR has a 64 GB file size limit as well as a path length limit. The recommended client is HSI.

+

Globus

+

In addition, HPSS is accessible through the Globus endpoint alcf#dtn_hpss. As with HSI and HTAR, you must have a keytab file before using this endpoint. For more information on using Globus, please see Using Globus.

+

Keytab File Missing

+

If you see an error like this:

+
*** HSI: (KEYTAB auth method) - keytab file missing or inaccessible: /home/username/.hpss/.ktb_username
Error - authentication/initialization failed
+
+

it means that your account is not enabled to use the HPSS yet. Please contact support to have it set up.

+ + + + + + + + + + + + + +
+
+ + + +
+ +
+ + + + +
+
+ + + + + + + + + + + +
+
+ +
+
+
+
+ + + + + + + + + + + + \ No newline at end of file diff --git a/data-management/filesystem-and-storage/disk-quota/index.html b/data-management/filesystem-and-storage/disk-quota/index.html new file mode 100644 index 0000000000..c8ab57faa7 --- /dev/null +++ b/data-management/filesystem-and-storage/disk-quota/index.html @@ -0,0 +1,6843 @@ + + + + + + + + + + + + + + + + + + + + + + + + + Disk Quota - ALCF User Guides + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+ +
+ + + + + + + + + +
+ + +
+ + +
+ +
+

Argonne Leadership Computing Facility

+
+ + + + + + + + + +
+
+ + +
+ + + + + + + + + + + + + + + + + +
+ + + + +
+ + +
+ + + + + + +
+
+ + + +
+
+
+ + + + + +
+
+
+ + + +
+
+
+ + + +
+
+
+ + + +
+
+ + + + + + + +

Disk Quota

+

Overview

+

Disk quotas are enabled on project directories. ALCF's HPC systems use the swift-home file system located at /lus/swift/home, where quotas are also enforced. Theta has three project file systems available to users. Details on the home file system are listed in File Systems. Following are descriptions and examples for the home file system, as well as the theta-fs0, Grand, and Eagle project file systems.

+

Home Directory Quotas

+

Each home directory is assigned a default quota of 50 GB. File ownership determines disk space usage.

+

To check the home directory usage, enter this command: +

> myquota
+Name                           Type     Filesystem        Used               Quota          Grace
+=========================================================================================================
+userX                         User     /lus/swift         44.13G          50.00G             none
+

+

Project Directory Quotas

+

The Grand, Eagle, and theta-fs0 (/lus/theta-fs0) project file systems support project quotas. The amount of data stored under /lus/<file system>/projects/PROJECT_NAME cannot exceed the project quota limit approved during the allocation period. The total data usage under the project directory is used to calculate the disk quota.

+

To check project quota usage on the file systems, enter this command: +

> myprojectquotas
+
+Lustre : Current Project Quota information for projects you're a member of:
+
+Name                       Type        Filesystem          Used             Quota           Grace
+==============================================================================================================
+projectX                  Project      theta-fs0            354.4T             700T            -
+projectY                  Project      theta-fs0            916k                 1T            -
+projectZ                  Project      grand                  8k              1000T            -
+projectX                  Project      eagle                1.87T             1000T            -
+

+

Requesting a New Eagle Allocation

+

To request a new project with an allocation on Eagle (with or without a compute allocation), please fill out the Director's Discretionary allocation form. Note that all new compute projects will have Grand as the default file system.

+

Quota Increases

+

If you need a quota increase for Director's Discretionary allocations, please make a request by filling out the Director's Discretionary allocation form.

+

If you need a quota increase for your INCITE/ALCC/ESP project directory, please send an email to support@alcf.anl.gov with the machine, project name, new quota amount, and reason for the increase.

+ + + + + + + + + + + + + +
+
+ + + +
+ +
+ + + + +
+
+ + + + + + + + + + + +
+
+ +
+
+
+
+ + + + + + + + + + + + \ No newline at end of file diff --git a/data-management/filesystem-and-storage/file-systems/index.html b/data-management/filesystem-and-storage/file-systems/index.html new file mode 100644 index 0000000000..2845d63b00 --- /dev/null +++ b/data-management/filesystem-and-storage/file-systems/index.html @@ -0,0 +1,7103 @@ + + + + + + + + + + + + + + + + + + + + + + + + + File Systems - ALCF User Guides + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+ +
+ + + + + + + + + +
+ + +
+ + +
+ +
+

Argonne Leadership Computing Facility

+
+ + + + + + + + + +
+
+ + +
+ + + + + + + + + + + + + + + + + +
+ + + + +
+ + +
+ + + + + + +
+
+ + + +
+
+
+ + + + + +
+
+
+ + + + + + + +
+
+ + + + + + + +

ALCF File Systems

+

Our HPC systems have three discrete file systems for project data: theta-fs0, Grand, and Eagle. Theta-fs0 is an Intel Enterprise Edition Lustre parallel file system mounted as /lus-projects or /projects. Grand and Eagle are 100 PB Lustre file systems mounted as /grand and /eagle, respectively. For more information on the Lustre file system, see the document on Lustre File Striping Basics.
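As a minimal sketch of working with striping (the directory path below is hypothetical; consult the Lustre File Striping Basics document before changing stripe settings for your workload):

lfs getstripe /grand/MyProject/data        # show the current stripe settings
lfs setstripe -c 8 /grand/MyProject/data   # stripe new files in this directory across 8 OSTs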

+ +

For information on the AI Testbed storage systems, refer to the AI Testbed storage page: https://argonne-lcf.github.io/ai-testbed-userdocs/common/storage/

+

Our HPC systems also share a Lustre home file system, called swift-home. The home file system is mounted as /home and should generally be used for small files and any binaries to be run on Theta. The performance of this file system is reasonable, but using it for intensive I/O from the compute nodes is discouraged; I/O from the compute nodes should instead use the project data file systems, which are fast parallel systems with far more storage space and greater I/O performance than the home directory space.

+

The swift-home file system is regularly backed up to tape. The project data file systems are not backed up. It is the user's responsibility to ensure that copies of any critical data on the data file systems have either been archived to tape or stored elsewhere.
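One way to keep such copies is to archive them to HPSS with HTAR, as described on the HPSS page; a minimal sketch, assuming a hypothetical project directory and archive name:

module load hsi                                        # required on Theta before using HSI/HTAR
htar -cf results_backup.tar /grand/MyProject/results   # archive the (hypothetical) results directory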

+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
| Name | Accessible From | Type | Path | Production | Backed-up | Usage |
|------|-----------------|------|------|------------|-----------|-------|
| swift-home | Theta, ThetaGPU, Cooley, Polaris | Lustre | /home or /lus/swift/home | Yes | Yes | General use |
| lus-projects (theta-fs0) | Theta, ThetaGPU, Cooley | Lustre | /projects or /lus-projects or /lus/theta-fs0/projects | Yes | No | Intensive job output, large files |
| Grand | Theta, ThetaGPU, Cooley, Polaris | Lustre | /grand or /lus/grand/projects | Yes | No | Intensive job output, large files |
| Eagle | Theta, ThetaGPU, Cooley, Polaris | Lustre | /eagle or /lus/eagle/projects | Yes | No | Community sharing via Globus; intensive job output, large files |
| Node SSD (compute node only) | Theta, ThetaGPU, Polaris | xfs | /local/scratch (Theta and Polaris), /raid/scratch (ThetaGPU) | Yes (Theta and ThetaGPU by request only) | No | Local node scratch during run |
+

Available Directories

+

Home Directories

+
    +
  • Created when an account is created
  • +
  • Located under /home
  • +
  • Each home directory is subject to a quota based on user file ownership. The default quota is 50 GB
  • +
+

Sharing Home Directory Files or Subdirectories with Others

+

If you need to share files or subdirectories (folders) under your home directory with collaborators (other ALCF users), you need to change file permissions from their defaults. You must change permissions of your top-level /home/username directory, even if you only want to share certain files/directories within it. Using normal Linux file permissions is the simpler approach and is sufficient if you want to give access to all other users. For more fine-grained control over specific users, you need to use Linux access control list (ACL) commands.

+
Simple Method: Permission to All Users
+

First, a one-time-only change to your top-level /home/username directory.

+
chmod o+x /home/username
+
+

Then you may permission individual files and/or subdirectories with read access. For example, to recursively change permissions on /home/username/subdirectoryname so that all files in that subdirectory and any subdirectory trees within it are world-readable, you would use

+
chmod -R o+Xr /home/username/subdirectoryname
+
+
Refined Method: Use ACL to Give Permission to Specific Users
+

First, a one-time-only change to your top-level /home/username directory. To share files/directories with user gilgamesh, for example:

+
setfacl -m u:gilgamesh:--x /home/username
+
+

Then you may permission individual files and/or subdirectories with read access. For example, to recursively change permissions on /home/username/subdirectoryname so that all files in that subdirectory and any subdirectory trees within it are readable to user gilgamesh, you would use

+
setfacl -R -m u:gilgamesh:r-X,d:u:gilgamesh:r-X /home/username/subdirectoryname
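You can verify the resulting permissions with getfacl, which is distributed alongside setfacl:

getfacl /home/username/subdirectoryname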
+
+

Project Directories

+
    +
  • Directories on Grand or Eagle are created when an allocation (INCITE, ALCC, Discretionary, etc.) is awarded. Eagle directories can be created as stand-alone allocations. Use the allocation request form to submit requests for an allocation on Eagle. Note that project directories are no longer created on theta-fs0.
  • +
  • Directory paths:
      +
    • theta-fs0: /projects or /lus-projects or /lus/theta-fs0/projects
    • +
    • Grand: /grand or /lus/grand/projects
    • +
    • Eagle: /eagle or /lus/eagle/projects
    • +
    +
  • +
+

These project spaces do not have user quotas but a directory quota, meaning that the total size of ALL files contained within a project directory, regardless of the username, cannot exceed the disk space allocation granted to the project. For more information on quotas, see the Disk Quota page.
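To see how much of a project's allocation is currently used, you can run the myprojectquotas command described on the Disk Quota page from a login node:

myprojectquotas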

+

Local Node SSD

+

Access to SSDs is disabled by default for Theta and ThetaGPU. Project PIs may request access by emailing support@alcf.anl.gov. A use case will need to be provided.

+

Access to SSDs is enabled by default on Polaris.
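A minimal job-script sketch of using the node-local SSD follows; the application name, project paths, and file names are placeholders, and because the SSD is wiped between jobs, results must be copied back before the job ends:

#!/bin/bash
# Stage input onto the node-local SSD, run, then copy results back to the project file system.
cp /grand/MyProject/input.dat /local/scratch/
./my_app /local/scratch/input.dat /local/scratch/output.dat
cp /local/scratch/output.dat /grand/MyProject/results/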

+

SSD Information

+
    +
  • Local scratch SSD storage on compute nodes for running jobs
  • +
  • Completely local non-parallel filesystem
  • +
  • Located at /local/scratch on Theta and Polaris computes and /raid/scratch on ThetaGPU computes
  • +
  • Wiped between Cobalt/PBS Pro jobs
  • +
  • No automatic backups provided
  • +
  • Information on the current SSD drives in use is below:
  • +
+

Polaris SSD Specs

+

Model PM1725a drives specifications

+ + + + + + + + + + + + + + + + + + + + + +
| Model PM1725a drives | |
|----------------------|---|
| Capacity | 1.6 TB |
| Sequential Read | 3300 MB/s |
| Sequential Write | 3300 MB/s |
+

Theta and ThetaGPU SSD Specs

+

Model SM961 drives

+ + + + + + + + + + + + + + + + + + + + + +
| Model SM961 drives | |
|--------------------|---|
| Capacity | 128 GB |
| Sequential Read | 3100 MB/s |
| Sequential Write | 700 MB/s |
+

Model SM951 drives specifications

+ + + + + + + + + + + + + + + + + + + + + +
| Model SM951 drives | |
|--------------------|---|
| Capacity | 128 GB |
| Sequential Read | 2150 MB/s |
| Sequential Write | 1550 MB/s |
+ + + + + + + + + + + + + +
+
+ + + +
+ +
+ + + + +
+
+ + + + + + + + + + + +
+
+ +
+
+
+
+ + + + + + + + + + + + \ No newline at end of file diff --git a/data-management/filesystem-and-storage/hpss/index.html b/data-management/filesystem-and-storage/hpss/index.html new file mode 100644 index 0000000000..ac23b80cc6 --- /dev/null +++ b/data-management/filesystem-and-storage/hpss/index.html @@ -0,0 +1,6922 @@ + + + + + + + + + + + + + + + + + + + + + + + + + HPSS - ALCF User Guides + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+ +
+ + + + + + + + + +
+ + +
+ + +
+ +
+

Argonne Leadership Computing Facility

+
+ + + + + + + + + +
+
+ + +
+ + + + + + + + + + + + + + + + + +
+ + + + +
+ + +
+ + + + + + +
+
+ + + +
+
+
+ + + + + +
+
+
+ + + +
+
+
+ + + +
+
+
+ + + +
+
+ + + + + + + +

Using HPSS

+

Overview

+

HPSS is a data archive and retrieval system that manages large amounts of data on disk and robotic tape libraries. It provides hierarchical storage management services that allow it to migrate data between those storage platforms.

+

HPSS is currently configured with a disk and tape tier. The disk tier has a capacity of 1.2 PB on a DataDirect Networks SFA12K-40 storage array. By default, all archived data is initially written to the disk tier. The tape tier consists of 3 SpectraLogic T950 robotic tape libraries containing a total of 72 LTO6 tape drives with a total uncompressed capacity of 64 PB. Archived data is migrated to the tape tier at regular intervals, then deleted from the disk tier to create space for future archives.

+

Access to HPSS is provided by various client components. Currently, ALCF supports access through two command-line clients: HSI and HTAR. These are installed on the login nodes of Theta, Cooley, and Polaris. In order for the client to authenticate with HPSS, the user must have a keytab file that should be located in their home directory under subdirectory .hpss. The file name will be in the format .ktb_<userid>.

+

HSI General Usage

+

HSI can be invoked by simply entering hsi at your normal shell prompt. Once authenticated, you will enter the hsi command shell environment: +

> hsi
+[HSI]/home/username->
+

+

You may enter "help" to display a brief description of available commands.

+

Example archive: +

[HSI]/home/username-> put mydatafile                # same name on HPSS
+[HSI]/home/username-> put local.file : hpss.file    # different name on HPSS
+

+

Example retrieval: +

[HSI]/home/username-> get mydatafile
+[HSI]/home/username-> get local.file : hpss.file
+

+

Most of the usual shell commands will work as expected in the HSI command environment.

+

For example, checking what files are archived: +

[HSI]/home/username-> ls -l
+

+

And organizing your archived files: +

[HSI]/home/username-> mkdir dataset1
+[HSI]/home/username-> mv hpss.file dataset1
+[HSI]/home/username-> ls dataset1
+[HSI]/home/username-> rm dataset1/hpss.file
+

+

It may be necessary to use single or double quotes around metacharacters to avoid having the shell prematurely expand them.

+

For example: +

[HSI]/home/username-> get *.c
+
+will not work, but
+
+[HSI]/home/username-> get "*.c"
+
+will retrieve all files ending in .c.  
+

+

Following normal shell conventions, other special characters in filenames such as whitespace and semicolon also need to be escaped with "\" (backslash). For example:

+
   [HSI]/home/username-> get "data\ file\ \;\ version\ 1"
+
+ +

retrieves the file named "data file ; version 1".

+

HSI can also be run as a command line or embedded in a script as follows: +

hsi -O log.file "put local.file"
+

+

HTAR General Usage

+

HTAR is a tar-like utility that creates tar-format archive files directly in HPSS. It can be run as a command line or embedded in a script.

+

Example archive: +

htar -cf hpssfile.tar localfile1 localfile2 localfile3
+

+

Example retrieval: +

htar -xf hpssfile.tar localfile2
+

+

Note:
- On Theta you must first load the HSI module ("module load hsi") to make HSI and HTAR available.
- The current version of HTAR has a 64 GB file size limit as well as a path length limit; the recommended client is HSI.

+
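HTAR can also list the contents of an existing archive without retrieving it; for example:

htar -tf hpssfile.tar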

Globus

+

In addition, HPSS is accessible through the Globus endpoint alcf#dtn_hpss. As with HSI and HTAR, you must have a keytab file before using this endpoint. For more information on using Globus, please see Using Globus.

+

Common Problems

+

Keytab File Missing

+

If you see an error like this:
*** HSI: (KEYTAB auth method) - keytab file missing or inaccessible: /home/username/.hpss/.ktb_username
Error - authentication/initialization failed
it means that your account is not yet enabled to use HPSS. Please contact support to have it set up.

+ + + + + + + + + + + + + +
+
+ + + +
+ +
+ + + + +
+
+ + + + + + + + + + + +
+
+ +
+
+
+
+ + + + + + + + + + + + \ No newline at end of file diff --git a/images/ANL_ko_300.png b/images/ANL_ko_300.png new file mode 100644 index 0000000000..ee17c579ea Binary files /dev/null and b/images/ANL_ko_300.png differ diff --git a/images/Argonne_wireframe_white_transparent.eps b/images/Argonne_wireframe_white_transparent.eps new file mode 100644 index 0000000000..878c3f403c Binary files /dev/null and b/images/Argonne_wireframe_white_transparent.eps differ diff --git a/images/Argonne_wireframe_white_transparent.png b/images/Argonne_wireframe_white_transparent.png new file mode 100644 index 0000000000..b2284c3b75 Binary files /dev/null and b/images/Argonne_wireframe_white_transparent.png differ diff --git a/images/DOE_ko_300.png b/images/DOE_ko_300.png new file mode 100644 index 0000000000..c362ee143e Binary files /dev/null and b/images/DOE_ko_300.png differ diff --git a/images/alcf_favicon.ico b/images/alcf_favicon.ico new file mode 100644 index 0000000000..868526c44a Binary files /dev/null and b/images/alcf_favicon.ico differ diff --git a/images/aurora.png b/images/aurora.png new file mode 100644 index 0000000000..f0e0c918aa Binary files /dev/null and b/images/aurora.png differ diff --git a/images/engineering.png b/images/engineering.png new file mode 100644 index 0000000000..ca62980809 Binary files /dev/null and b/images/engineering.png differ diff --git a/images/getstarted.png b/images/getstarted.png new file mode 100644 index 0000000000..82bb7431e2 Binary files /dev/null and b/images/getstarted.png differ diff --git a/index.html b/index.html new file mode 100644 index 0000000000..13e59478e2 --- /dev/null +++ b/index.html @@ -0,0 +1,6777 @@ + + + + + + + + + + + + + + + + + + + + + + + ALCF User Guides + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+ +
+ + + + + + + + + +
+ + +
+ + +
+ +
+

Argonne Leadership Computing Facility

+
+ + + + + + + + + +
+
+ + +
+ + + + + + + + + + + + + + + + + +
+ + + + +
+ + +
+ + + + + + +
+
+ + + +
+
+
+ + + + + +
+
+
+ + + +
+
+
+ + + +
+
+
+ + + +
+
+ + + + + + + +

ALCF User Guides

+

We are moving our ALCF documentation into GitHub to make it easier to contribute to and collaborate on our user and machine guides.

+

Our user guides contain information for:

+
    +
  • Account and Project Management: Information and instructions on how to manage your ALCF account and awarded project.
  • +
  • Data Management: Information on our file systems that are mounted globally across all of our production systems.
  • +
  • Polaris: Information on how to get started on our newest supercomputer.
  • +
  • Theta: Information on how to use our Cray XC40/KNL supercomputer.
  • +
  • ThetaGPU: Information on how to use our NVIDIA DGX A100 supercomputer.
  • +
  • Cooley: Information on how to use our visualization cluster.
  • +
  • AI Testbed: Information on how to use our AI Accelerators.
  • +
  • Aurora/Sunspot: Information on getting your code ready for our upcoming exascale supercomputer.
  • +
  • Services: Information on how to use various services provided across clusters.
  • +
  • Facility Policies: Information on our policies and procedures.
  • +
+

How to Get Access

+

Researchers interested in using the ALCF systems (including Polaris and the AI Testbed’s Cerebras CS-2 and SambaNova DataScale platforms) can now submit project proposals via the ALCF’s Director’s Discretionary program. Calls for proposals for additional allocation programs will be open at a later date.

+

Submit your proposal requests at: Allocation Request Page

+

Getting Started

+

If you'd like to get started using our ALCF resources, our Getting Started webpage provides information on how to get time on our systems, obtain an account, and start running jobs.

+

If you have an account and an award for Polaris, we suggest visiting our Getting on Polaris webpage.

+

If you'd like to use ThetaGPU, visit our Getting Started on ThetaGPU webpage.

+

If you'd like to use our AI accelerators, visit our Getting Started on AI Testbed webpage.

+

Please send feedback to support@alcf.anl.gov

+ + + + + + + + + + + + + +
+
+ + + +
+ +
+ + + + +
+
+ + + + + + + + + + + +
+
+ +
+
+
+
+ + + + + + + + + + + + \ No newline at end of file diff --git a/javascripts/alcf-extra.js b/javascripts/alcf-extra.js new file mode 100644 index 0000000000..66875b17a4 --- /dev/null +++ b/javascripts/alcf-extra.js @@ -0,0 +1,145 @@ +/** + * Dropdown + * + * @description + * + * @param config An object of configuration settings: + * + * + * @return new instance of Dropdown + */ + + + + + +// Config defaults and init +// ---------------------------------------------------------------------------- + +var Dropdown = function (config) { + this.hook = config.hook || 'js-drop'; + this.menu = config.menu; + this.event = config.event || 'click'; + this.pane = document.getElementById(this.menu); +} + + +Dropdown.prototype.init = function() { + this.modifyHooks(this.hook, this.addListener.bind(this, this.event)); +} + + + + + +// Shared methods +// ---------------------------------------------------------------------------- + +// grab element +Dropdown.prototype.modifyHooks = function(hook, func) { + var elem = document.getElementById(hook); + // this.addBgListener(elem); + func(elem); +} + +// attach listeners to the document and menu items +Dropdown.prototype.addListener = function(event, elem) { + document.addEventListener("mouseover", function(e) { + if (e.target.closest("#"+this.hook)) { + this.toggleMenu(elem); + } + else if (e.target.closest("#"+this.menu)) {return;} + else if (this.pane.classList.contains('js-dropdown-visible')) { + this.toggleMenu(elem); + } + }.bind(this), false); + document.addEventListener("mouseout", function(e) { + if (e.target.closest("#"+this.hook)) { + this.toggleMenu(elem); + } + else if (e.target.closest("#"+this.menu)) {return;} + else if (this.pane.classList.contains('js-dropdown-visible')) { + this.toggleMenu(elem); + } + }.bind(this), false); +} + +// toggle menu pane visibility +Dropdown.prototype.toggleMenu = function(elem) { + if (this.pane.classList.contains('js-dropdown-hidden')) { + this.pane.classList.replace('js-dropdown-hidden', 'js-dropdown-visible'); + } else if (this.pane.classList.contains('js-dropdown-visible')) { + this.pane.classList.replace('js-dropdown-visible', 'js-dropdown-hidden'); + } +} + + + + +// Include the dropdowns +// ---------------------------------------------------------------------------- + +var dropdowns = document.getElementsByClassName('js-drop'); + +if (dropdowns.length > 0) { + var menus = []; + + Array.prototype.forEach.call(dropdowns, function(el) { + menus.push(new Dropdown({'hook': el.id, 'menu': el.dataset.menu})); + }); + + Array.prototype.forEach.call(menus, function(m) { + m.init(); + }); +} + + + + +// // Include the mobile dropdowns (unlike above, just writing it all here) +// // ---------------------------------------------------------------------------- + + +// // open/close the big pane +// var mobileOpen = document.getElementById('js-mobileOpen'); +// var mobileClose = document.getElementById('js-mobileClose'); +// var mobileMenu = document.getElementById('js-mobileMenu'); + +// mobileOpen.addEventListener("click", function(e) { +// mobileMenu .classList.replace("menu--closed", "menu--open"); +// }); + +// mobileClose.addEventListener("click", function(e) { +// mobileMenu .classList.replace("menu--open", "menu--closed"); +// }); + + +// // open/close individual menus + +// var drawerHeads = document.getElementsByClassName('drawer-head'); + +// Array.prototype.forEach.call(drawerHeads, function(head){ + +// head.addEventListener("click", function(e){ +// var mobmenu = head.dataset.mobmenu; +// mobmenu = 
document.getElementById(mobmenu); +// var arrow = this.querySelector(".drawer-arrow"); + +// if (mobmenu.classList.contains('menu--closed')) { +// mobmenu.classList.remove('menu--closed'); +// arrow.innerHTML = "▲" +// } +// else { +// mobmenu.classList.add('menu--closed'); +// arrow.innerHTML = "▼" +// } + +// }); +// }); + +// // add listener to each of the links that toggles menus + + + + + diff --git a/polaris/applications-and-libraries/applications/gromacs/index.html b/polaris/applications-and-libraries/applications/gromacs/index.html new file mode 100644 index 0000000000..de3493f04a --- /dev/null +++ b/polaris/applications-and-libraries/applications/gromacs/index.html @@ -0,0 +1,6845 @@ + + + + + + + + + + + + + + + + + + + + + + + + + GROMACS - ALCF User Guides + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+ +
+ + + + + + + + + +
+ + +
+ + +
+ +
+

Argonne Leadership Computing Facility

+
+ + + + + + + + + +
+
+ + +
+ + + + + + + + + + + + + + + + + +
+ + + + +
+ + +
+ + + + + + +
+
+ + + +
+
+
+ + + + + +
+
+
+ + + +
+
+
+ + + +
+
+
+ + + +
+
+ + + + + + + +

Gromacs on Polaris

+

What is Gromacs?

+

GROMACS is a versatile package to perform molecular dynamics, i.e. simulate the Newtonian equations of motion for systems with hundreds to millions of particles. It is primarily designed for biochemical molecules like proteins, lipids, and nucleic acids that have a lot of complicated bonded interactions, but since GROMACS is extremely fast at calculating the nonbonded interactions (that usually dominate simulations) many groups are also using it for research on non-biological systems, e.g. polymers.

+

Using GROMACS at ALCF

+

ALCF offers assistance with building binaries and compiling instructions for GROMACS. For questions, contact us at support@alcf.anl.gov.

+

Building Gromacs

+
    +
  1. Download latest source code: http://manual.gromacs.org/documentation/2022.1/download.html
  2. +
  3. tar -xzf gromacs-2022.1.tar.gz
  4. +
  5. module swap PrgEnv-nvhpc PrgEnv-gnu
  6. +
  7. module load cudatoolkit-standalone/11.2.2
  8. +
  9. module load gcc/10.3.0
  10. +
  11. module load cmake
  12. +
  13. cd gromacs-2022.1
  14. +
  15. mkdir build
  16. +
  17. cmake -DCMAKE_C_COMPILER=cc -DCMAKE_CXX_COMPILER=CC \
    +      -DBUILD_SHARED_LIBS=OFF -DGMX_BUILD_OWN_FFTW=ON \
    +      -DCMAKE_INSTALL_PREFIX=/path-to/gromacs-2022.1/build \
    +      -DGMX_MPI=ON -DGMX_OPENMP=ON -DGMX_GPU=CUDA \
    +      -DCUDA_TOOLKIT_ROOT_DIR=/soft/compilers/cudatoolkit/cuda-11.2.2
    +
  18. +
  19. make -j 8
  20. +
  21. make install
  22. +
  23. The installed binary is build/bin/gmx_mpi.
  24. +
+

Running Gromacs on Polaris

+

Prebuilt Gromacs binaries can be found in the directory /soft/applications/Gromacs/gromacs-2022.1.

+

A sample PBS script follows that will run GROMACS on two nodes, using 4 MPI ranks per node, with each rank running 4 OpenMP threads. The PME kernel owns one MPI rank and one GPU per node, while the nonbonded kernel uses 3 MPI ranks and 3 GPUs per node.

+
#!/bin/sh
+#PBS -l select=2:system=polaris
+#PBS -l place=scatter
+#PBS -l walltime=0:30:00
+#PBS -q debug
+#PBS -A PROJECT
+#PBS -l filesystems=home:grand:eagle
+
+cd ${PBS_O_WORKDIR}
+
+module swap PrgEnv-nvhpc PrgEnv-gnu
+module load cudatoolkit-standalone/11.2.2
+
+export OMP_NUM_THREADS=4
+
+mpirun --np 8 /soft/applications/Gromacs/gromacs-2022.1/gmx_mpi \
+      mdrun -gputasks 0123 -nb gpu -pme gpu -npme 1 -ntomp 4 \
+      -dlb yes -resethway -pin on -v -deffnm step5_1 -g test.log
+
+

We strongly suggest that users try combinations of different numbers of nodes, MPI ranks per node, number of GPU tasks/devices, GPU task decomposition between nonbonded and PME kernels, and OMP threads per rank to find the optimal throughput for their particular workload.
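As a starting point for such a scan, the mpirun line in the script above can be repeated with different rank/thread splits; a minimal sketch, where the specific splits and the assumption of 32 cores per node are illustrative only:

for RPN in 2 4 8; do
  # ranks per node times OpenMP threads per rank kept at 32 cores per node
  NTOMP=$(( 32 / RPN ))
  mpirun --np $(( 2 * RPN )) /soft/applications/Gromacs/gromacs-2022.1/gmx_mpi \
        mdrun -nb gpu -pme gpu -npme 1 -ntomp ${NTOMP} \
        -dlb yes -resethway -pin on -v -deffnm step5_1 -g test_rpn${RPN}.log
done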

+ + + + + + + + + + + + + +
+
+ + + +
+ +
+ + + + +
+
+ + + + + + + + + + + +
+
+ +
+
+
+
+ + + + + + + + + + + + \ No newline at end of file diff --git a/polaris/applications-and-libraries/applications/lammps/index.html b/polaris/applications-and-libraries/applications/lammps/index.html new file mode 100644 index 0000000000..af1b15b810 --- /dev/null +++ b/polaris/applications-and-libraries/applications/lammps/index.html @@ -0,0 +1,6915 @@ + + + + + + + + + + + + + + + + + + + + + + + + + LAMMPS - ALCF User Guides + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+ +
+ + + + + + + + + +
+ + +
+ + +
+ +
+

Argonne Leadership Computing Facility

+
+ + + + + + + + + +
+
+ + +
+ + + + + + + + + + + + + + + + + +
+ + + + +
+ + +
+ + + + + + +
+
+ + + +
+
+
+ + + + + +
+
+
+ + + + + + + +
+
+ + + + + + + +

LAMMPS

+

Overview

+

LAMMPS is a general-purpose molecular dynamics software package for massively parallel computers. It is written in an exceptionally clean style that makes it one of the more popular codes for users to extend and it currently has dozens of user-developed extensions.

+

For details about the code and its usage, see the LAMMPS home page. This page provides information specific to running on Polaris at the ALCF.

+

Using LAMMPS at ALCF

+

ALCF provides assistance with build instructions, compiling executables, submitting jobs, and providing prebuilt binaries (upon request). A collection of Makefiles and submission scripts is available in the ALCF GettingStarted repo here. For questions, contact us at support@alcf.anl.gov.

+

How to Obtain the Code

+

LAMMPS is an open-source code, which can be downloaded from the LAMMPS website.

+

Building on Polaris using KOKKOS package

+

After LAMMPS has been downloaded and unpacked on an ALCF filesystem, users should see a directory whose name is of the form lammps-<version>. Recent versions include the Makefile lammps-<version>/src/MAKE/MACHINES/Makefile.polaris, which can be used for compilation on Polaris. A copy of the Makefile is also available in the ALCF GettingStarted repo here. For older versions of LAMMPS, you may need to take an existing Makefile (e.g. Makefile.mpi) for your specific version and edit the top portion appropriately to create a new Makefile.polaris file.

+

The top portion of Makefile.polaris_kokkos_nvidia used to build LAMMPS with the KOKKOS package using the NVIDIA compilers is shown as an example.

+
# polaris_nvidia = Flags for NVIDIA A100, NVIDIA Compiler, Cray MPICH, CUDA
+# module load craype-accel-nvidia80
+# make polaris_kokkos_nvidia -j 16
+
+SHELL = /bin/sh
+
+# ---------------------------------------------------------------------
+# compiler/linker settings
+# specify flags and libraries needed for your compiler
+
+KOKKOS_DEVICES = Cuda,OpenMP
+KOKKOS_ARCH = Ampere80
+KOKKOS_ABSOLUTE_PATH = $(shell cd $(KOKKOS_PATH); pwd)
+export NVCC_WRAPPER_DEFAULT_COMPILER = nvc++
+
+CRAY_INC = $(shell CC --cray-print-opts=cflags)
+CRAY_LIB = $(shell CC --cray-print-opts=libs)
+
+CC =        $(KOKKOS_ABSOLUTE_PATH)/bin/nvcc_wrapper
+CCFLAGS =  -g -O3 -mp -DLAMMPS_MEMALIGN=64 -DLAMMPS_BIGBIG
+CCFLAGS += $(CRAY_INC)
+SHFLAGS =   -fPIC
+DEPFLAGS =  -M
+
+LINK =      $(CC)
+LINKFLAGS = $(CCFLAGS)
+LIB = $(CRAY_LIB)
+SIZE =      size
+
+

With the appropriate LAMMPS Makefile in place an executable can be compiled as in the following example, which uses the NVIDIA compilers.

+
module load craype-accel-nvidia80
+cd lammps-<version>/src
+make yes-KOKKOS
+make polaris_kokkos_nvidia -j 16
+
+

Running Jobs on Polaris

+

An example submission script for running a KOKKOS-enabled LAMMPS executable is shown below. Additional information on LAMMPS application flags and options is described on the LAMMPS website.

+
#!/bin/sh
+#PBS -l select=64:system=polaris
+#PBS -l place=scatter
+#PBS -l walltime=0:15:00
+#PBS -l filesystems=home:grand:eagle
+#PBS -q prod
+#PBS -A Catalyst
+
+export MPICH_GPU_SUPPORT_ENABLED=1
+
+NNODES=`wc -l < $PBS_NODEFILE`
+
+# per-node settings
+NRANKS=4
+NRANKSSOCKET=2
+NDEPTH=8
+NTHREADS=1
+NGPUS=4
+
+NTOTRANKS=$(( NNODES * NRANKS ))
+
+EXE=/home/knight/bin/lammps_polaris_kokkos_nvidia
+EXE_ARG="-in in.reaxc.hns -k on g ${NGPUS} -sf kk -pk kokkos neigh half neigh/qeq full newton on "
+
+# OMP settings mostly to quiet Kokkos messages
+
+MPI_ARG="-n ${NTOTRANKS} --ppn ${NRANKS} --depth=${NDEPTH} --cpu-bind depth --env OMP_NUM_THREADS=${NTHREADS} --env OMP_PROC_BIND=spread --env OMP_PLACES=cores"
+
+COMMAND="mpiexec ${MPI_ARG} ${EXE} ${EXE_ARG}"
+echo "COMMAND= ${COMMAND}"
+${COMMAND}
+
+

Performance Notes

+

Some useful information on accelerator packages and expectations can be found on the LAMMPS website here.

+ + + + + + + + + + + + + +
+
+ + + +
+ +
+ + + + +
+
+ + + + + + + + + + + +
+
+ +
+
+
+
+ + + + + + + + + + + + \ No newline at end of file diff --git a/polaris/applications-and-libraries/applications/openmm/index.html b/polaris/applications-and-libraries/applications/openmm/index.html new file mode 100644 index 0000000000..3d9602eb04 --- /dev/null +++ b/polaris/applications-and-libraries/applications/openmm/index.html @@ -0,0 +1,6903 @@ + + + + + + + + + + + + + + + + + + + + + + + + + OpenMM - ALCF User Guides + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+ +
+ + + + + + + + + +
+ + +
+ + +
+ +
+

Argonne Leadership Computing Facility

+
+ + + + + + + + + +
+
+ + +
+ + + + + + + + + + + + + + + + + +
+ + + + +
+ + +
+ + + + + + +
+
+ + + +
+
+
+ + + + + +
+
+
+ + + + + + + +
+
+ + + + + + + +

OpenMM on Polaris

+

What is OpenMM?

+

OpenMM is a high-performance toolkit for molecular simulations that can be used as a stand-alone application or as a library. It provides a combination of flexibility (through custom forces and integrators), openness, and high-performance (especially on recent GPUs).

+

Using OpenMM at ALCF

+

ALCF offers assistance with building binaries and compiling instructions for OpenMM. For questions, contact us at support@alcf.anl.gov.

+

Building OpenMM using Conda module

+
    +
  1. Update environment +
    $ module load conda/2022-07-19
    +
  2. +
  3. Install OpenMM +
    $ mkdir conda
    +$ conda create --prefix /path-to/conda/openmm_env
    +$ conda activate /path-to/conda/openmm_env
    +$ conda install -c conda-forge openmm cudatoolkit=11.4
    +$ conda deactivate /path-to/conda/openmm_env
    +
  4. +
  5. +

    Validate installation: if successful, then info on code version, platform types, CUDA initialization, and force error tolerance will be shown.

    +
    $ cd /path-to/conda/openmm_env/share/openmm/examples
    +$ python -m openmm.testInstallation
    +
    +
  6. +
  7. +

    Benchmark testing using PBS job script below.

    +
    $ cd /path-to/conda/openmm_env/share/openmm/examples
    +$ qsub ./submit.sh
    +
    +
  8. +
+

Running OpenMM Benchmark on Polaris

+

A sample PBS script follows that will run the OpenMM benchmark on one node.

+
#!/bin/sh
+#PBS -l select=1:system=polaris
+#PBS -l place=scatter
+#PBS -l walltime=0:30:00
+#PBS -q debug
+#PBS -A PROJECT
+#PBS -l filesystems=home:grand:eagle
+
+cd ${PBS_O_WORKDIR}
+
+module load cudatoolkit-standalone/11.4.4
+
+python benchmark.py --platform=CUDA --test=pme --precision=mixed --seconds=30 --heavy-hydrogens > test.output
+
+
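Note that benchmark.py must run with the OpenMM environment active. If you installed via Conda as above, a minimal sketch of the relevant lines inside the job script, reusing the placeholder environment path from the install step, is:

module load conda/2022-07-19
conda activate /path-to/conda/openmm_env
python benchmark.py --platform=CUDA --test=pme --precision=mixed --seconds=30 --heavy-hydrogens > test.output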

Building OpenMM from Source

+
    +
  1. Update environment +
    $ module load cudatoolkit-standalone/11.4.4
    +$ module load cray-python/3.9.12.1
    +
  2. +
  3. Download OpenMM +
    $ git clone https://github.com/openmm/openmm.git
    +$ cd openmm ; mkdir build
    +
  4. +
  5. Download and build doxygen +
    $ git clone https://github.com/doxygen/doxygen.git
    $ cd doxygen ; cmake . ; make ; make install ; cd ../
    +
  6. +
  7. Download and install swig in OpenMM directory. +
    $ tar xzf swig-4.0.2.tar.gz
    +$ cd swig-4.0.2
    +$ ./configure --prefix=/path-to/openmm/swig-4.0.2 ; make -j 8 ; make install
    +
  8. +
  9. Build OpenMM +
    $ cmake -DDOXYGEN_EXECUTABLE=/path-to/openmm/doxygen/bin/doxygen \
    +        -DSWIG_EXECUTABLE=/path-to/openmm/swig-4.0.2/bin/swig \
    +        -DCMAKE_INSTALL_PREFIX=/path-to/openmm/build \
    +         -DCUDA_HOME=/soft/compilers/cudatoolkit/cuda-11.4.4 \
    +         -DCUDA_INCLUDE_DIR=/soft/compilers/cudatoolkit/cuda-11.4.4/include \
    +         -DCUDA_LIB_DIR=/soft/compilers/cudatoolkit/cuda-11.4.4/lib64
    +$ make -j 8
    +$ make install
    +
  10. +
  11. +

    Validate installation: if successful, then info on code version, platform types, CUDA initialization, and force error tolerance will be shown.

    +
    $ cd /path-to/openmm/examples
    +$ python -m openmm.testInstallation
    +
    +
  12. +
  13. +

    Benchmark testing using the PBS job script above.

    +
    $ cd /path-to/openmm/examples
    +$ qsub ./submit.sh
    +
    +
  14. +
+ + + + + + + + + + + + + +
+
+ + + +
+ +
+ + + + +
+
+ + + + + + + + + + + +
+
+ +
+
+
+
+ + + + + + + + + + + + \ No newline at end of file diff --git a/polaris/applications-and-libraries/applications/vasp/index.html b/polaris/applications-and-libraries/applications/vasp/index.html new file mode 100644 index 0000000000..aef82757c7 --- /dev/null +++ b/polaris/applications-and-libraries/applications/vasp/index.html @@ -0,0 +1,7027 @@ + + + + + + + + + + + + + + + + + + + + + + + + + VASP - ALCF User Guides + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+ +
+ + + + + + + + + +
+ + +
+ + +
+ +
+

Argonne Leadership Computing Facility

+
+ + + + + + + + + +
+
+ + +
+ + + + + + + + + + + + + + + + + +
+ + + + +
+ + +
+ + + + + + +
+
+ + + +
+
+
+ + + + + +
+
+
+ + + + + + + +
+
+ + + + + + + +

VASP

+ +

VASP 6.x.x in Polaris (NVHPC+OpenACC+OpenMP+CUDA math+CrayMPI)

+

The Vienna Ab initio Simulation Package (VASP) is a software package for performing electronic structure calculations with periodic boundary conditions. It is most commonly used to perform density functional theory (DFT) calculations in a planewave basis using the projector augmented wave (PAW) method. A more complete description of VASP can be found here.

+

Users must have a license to use this code on ALCF systems. More information on how to get access to VASP binaries can be found here.

+

General compiling/installing instructions provided by VASP support

+

Instructions and samples of makefile.include could be found in the vasp.at wiki page.

+

The following makefile.include was tailored for Polaris, originally taken from here.

+
# Precompiler options
+CPP_OPTIONS = -DHOST=\"LinuxNV\" \
+              -DMPI -DMPI_BLOCK=8000 -Duse_collective \
+              -DscaLAPACK \
+              -DCACHE_SIZE=4000 \
+              -Davoidalloc \
+              -Dvasp6 \
+              -Duse_bse_te \
+              -Dtbdyn \
+              -Dqd_emulate \
+              -Dfock_dblbuf \
+              -D_OPENMP \
+              -D_OPENACC \
+              -DUSENCCL -DUSENCCLP2P\
+
+CPP        = nvfortran -Mpreprocess -Mfree -Mextend -E $(CPP_OPTIONS) $*$(FUFFIX)  > $*$(SUFFIX)
+
+FC         = ftn -acc -gpu=cc80 -mp -target-accel=nvidia80
+FCL        = ftn -acc -gpu=cc80 -c++libs -target-accel=nvidia80
+
+FREE       = -Mfree
+
+FFLAGS     = -Mbackslash -Mlarge_arrays
+
+OFLAG      = -fast
+
+DEBUG      = -Mfree -O0 -traceback
+
+# Specify your NV HPC-SDK installation, try to set NVROOT automatically
+NVROOT     =$(shell which nvfortran | awk -F /compilers/bin/nvfortran '{ print $$1 }')
+# ...or set NVROOT manually
+NVHPC      ?= /opt/nvidia/hpc_sdk
+NVVERSION  = 20.9
+#NVROOT     = $(NVHPC)/Linux_x86_64/$(NVVERSION)
+
+# Use NV HPC-SDK provided BLAS and LAPACK libraries
+LIBAOCL=/soft/libraries/aocl/3.2.0
+BLAS       = ${LIBAOCL}/lib/libblis-mt.a
+LAPACK     = ${LIBAOCL}/lib/libflame.a
+
+BLACS      =
+SCALAPACK  =
+#SCALAPACK  = -Mscalapack
+#SCALAPACK  = ${LIBAOCL}/lib/libscalapack.a
+
+CUDA       = -cudalib=cublas,cusolver,cufft,nccl -cuda
+
+LLIBS      = $(SCALAPACK) $(LAPACK) $(BLAS) $(CUDA)
+
+# Software emulation of quadruple precsion
+QD         ?= $(NVROOT)/compilers/extras/qd
+LLIBS      += -L$(QD)/lib -lqdmod -lqd
+INCS       += -I$(QD)/include/qd
+
+#INCS       += -I/usr/include/linux 
+#INCS       += -I/usr/include/c++/7/tr1 
+#INCS       += -I/usr/include/c++/7 
+#INCS       += -I/usr/include/x86_64-linux-gnu/c++/7
+#INCS       += -I/lus/theta-fs0/software/spack/spack-dev/opt/spack/linux-sles15-x86_64/gcc-9.3.0/gcc-10.2.0-r7v3naxd5xgzzaqxoe73jj2ytwuddamr/lib/gcc/x86_64-pc-linux-gnu/10.2.0/include/
+
+# Use the FFTs from fftw
+FFTW       ?= ${LIBAOCL}
+LLIBS      += -L$(FFTW)/lib -lfftw3 -lfftw3_omp -lomp
+#INCS       += -I/soft/libraries/aocl/3.2.0/include_LP64/
+INCS       += -I$(FFTW)/include
+
+OBJECTS    = fftmpiw.o fftmpi_map.o fftw3d.o fft3dlib.o
+
+# Redefine the standard list of O1 and O2 objects
+SOURCE_O1  := pade_fit.o
+SOURCE_O2  := pead.o
+
+# For what used to be vasp.5.lib
+CPP_LIB    = $(CPP)
+FC_LIB     = nvfortran
+CC_LIB     = cc
+CFLAGS_LIB = -O $(INCS) -c++libs -cuda
+FFLAGS_LIB = -O1 -Mfixed
+FREE_LIB   = $(FREE)
+
+OBJECTS_LIB= linpack_double.o getshmem.o
+
+# For the parser library
+#CXX_PARS   = nvc++ --no_warnings -I/lus/theta-fs0/software/spack/spack-dev/opt/spack/linux-sles15-x86_64/gcc-9.3.0/gcc-10.2.0-r7v3naxd5xgzzaqxoe73jj2ytwuddamr/include/c++/10.2.0/ -I/lus/theta-fs0/software/spack/spack-dev/opt/spack/linux-s
+les15-x86_64/gcc-9.3.0/gcc-10.2.0-r7v3naxd5xgzzaqxoe73jj2ytwuddamr/include/c++/10.2.0/x86_64-pc-linux-gnu -I/lus/theta-fs0/software/spack/spack-dev/opt/spack/linux-sles15-x86_64/gcc-9.3.0/gcc-10.2.0-r7v3naxd5xgzzaqxoe73jj2ytwuddamr/lib/gcc
+/x86_64-pc-linux-gnu/10.2.0/include -I/lus/theta-fs0/software/spack/spack-dev/opt/spack/linux-sles15-x86_64/gcc-9.3.0/gcc-10.2.0-r7v3naxd5xgzzaqxoe73jj2ytwuddamr/lib/gcc/x86_64-pc-linux-gnu/10.2.0/include-fixed/
+CXX_PARS   = nvc++ --no_warnings 
+
+# Normally no need to change this
+SRCDIR     = ../../src
+BINDIR     = ../../bin
+
+

Setting up compiler and libraries with module

+

The following modules will update the include and library paths used by the Cray compiler wrapper ftn to load additional math libraries for the CPU.

+
module purge
+module load nvhpc/23.3
+module load PrgEnv-nvhpc
+module load cray-libsci
+module load craype-accel-nvidia80
+
+

Compiling VASP

+

Once the modules are loaded and a makefile.include is in the vasp folder, compiling all the object files and binaries is done with:

+
make -j1
+
+

Running VASP in Polaris

+

An example submission script can be found at /soft/applications/vasp/submit-polaris2023-2.sh, which looks similar to the following:

+
#!/bin/sh
+#PBS -l select=1:system=polaris  
+#PBS -l place=scatter
+#PBS -l walltime=0:30:00
+#PBS -l filesystems=home:grand:eagle
+#PBS -q debug
+#PBS -A Catalyst
+
+module load PrgEnv-nvhpc
+module load cray-libsci
+
+export MPICH_GPU_SUPPORT_ENABLED=1
+NNODES=`wc -l < $PBS_NODEFILE`
+NRANKS=2
+NDEPTH=4
+NTHREADS=4
+NGPUS=2
+NTOTRANKS=$(( NNODES * NRANKS ))
+
+mpiexec -n ${NTOTRANKS} --ppn ${NRANKS} --depth ${NDEPTH} --cpu-bind depth --env OMP_NUM_THREADS=${NTHREADS} /path_to_vasp/bin/vasp_std
+
+

Submission scripts should have executable attributes to be used with qsub in script mode.

+
chmod +x example-script.sh
+qsub  example-script.sh
+
+

Known issues for versions >= 6.4.x on Polaris

+
+
    +
  • Undefined MPIX_Query_cuda_support function when linking the binary: this function is called in src/openacc.F, but MPIX_Query_cuda_support is not included in cray-mpich. One workaround is to comment out this function call. See the following suggested changes, marked by !!!!!CHANGE HERE, in the file src/openacc.F
  • +
+
+!!!!!CHANGE HERE 
+-      INTERFACE
+-        INTEGER(c_int) FUNCTION MPIX_Query_cuda_support() BIND(C, name="MPIX_Query_cuda_support")
+-        END FUNCTION
+-      END INTERFACE
+
+       CHARACTER(LEN=1) :: ENVVAR_VALUE
+       INTEGER :: ENVVAR_STAT
+
+       ! This should tell us if MPI is CUDA-aware
++!!!!!CHANGE HERE 
+-       CUDA_AWARE_SUPPORT = MPIX_Query_cuda_support() == 1
++       CUDA_AWARE_SUPPORT = .TRUE.
+       ! However, for OpenMPI some env variables can still deactivate it even though the previous
+       ! check was positive
+       CALL GET_ENVIRONMENT_VARIABLE("OMPI_MCA_mpi_cuda_support", ENVVAR_VALUE, STATUS=ENVVAR_STAT)
+       IF (ENVVAR_STAT==0 .AND. ENVVAR_VALUE=='0') CUDA_AWARE_SUPPORT = .FALSE.
+       CALL GET_ENVIRONMENT_VARIABLE("OMPI_MCA_opal_cuda_support", ENVVAR_VALUE, STATUS=ENVVAR_STAT)
+       IF (ENVVAR_STAT==0 .AND. ENVVAR_VALUE=='0') CUDA_AWARE_SUPPORT = .FALSE.
+       ! Just in case we might be non-OpenMPI, and their MPIX_Query_cuda_support behaves similarly
+       CALL GET_ENVIRONMENT_VARIABLE("MV2_USE_CUDA", ENVVAR_VALUE, STATUS=ENVVAR_STAT)
+       IF (ENVVAR_STAT==0 .AND. ENVVAR_VALUE=='0') CUDA_AWARE_SUPPORT = .FALSE.
+       CALL GET_ENVIRONMENT_VARIABLE("MPICH_RDMA_ENABLED_CUDA", ENVVAR_VALUE, STATUS=ENVVAR_STAT)
+       IF (ENVVAR_STAT==0 .AND. ENVVAR_VALUE=='0') CUDA_AWARE_SUPPORT = .FALSE.
+       CALL GET_ENVIRONMENT_VARIABLE("PMPI_GPU_AWARE", ENVVAR_VALUE, STATUS=ENVVAR_STAT)
+       IF (ENVVAR_STAT==0) CUDA_AWARE_SUPPORT =(ENVVAR_VALUE == '1')
++!!!!!CHANGE HERE 
++       CALL GET_ENVIRONMENT_VARIABLE("MPICH_GPU_SUPPORT_ENABLED", ENVVAR_VALUE, STATUS=ENVVAR_STAT)
++       IF (ENVVAR_STAT==0) CUDA_AWARE_SUPPORT =(ENVVAR_VALUE == '1')
+
+ + + + + + + + + + + + + +
+
+ + + +
+ +
+ + + + +
+
+ + + + + + + + + + + +
+
+ +
+
+
+
+ + + + + + + + + + + + \ No newline at end of file diff --git a/polaris/applications-and-libraries/libraries/cabana-polaris/index.html b/polaris/applications-and-libraries/libraries/cabana-polaris/index.html new file mode 100644 index 0000000000..2ef81bfae4 --- /dev/null +++ b/polaris/applications-and-libraries/libraries/cabana-polaris/index.html @@ -0,0 +1,6816 @@ + + + + + + + + + + + + + + + + + + + + + + + + + Cabana - ALCF User Guides + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+ +
+ + + + + + + + + +
+ + +
+ + +
+ +
+

Argonne Leadership Computing Facility

+
+ + + + + + + + + +
+
+ + +
+ + + + + + + + + + + + + + + + + +
+ + + + +
+ + +
+ + + + + + +
+
+ + + +
+
+
+ + + + + +
+
+
+ + + +
+
+
+ + + +
+
+
+ + + +
+
+ + + + + + + +

Cabana

+

Cabana

+

Cabana is built atop Kokkos. It provides class templates useful for implementing particle codes.

+

Cabana Documentation

+ +

Cabana on Polaris

+

Built against the prebuilt Kokkos on +polaris, the prebuilt Cabana +includes 3 backends: Serial and OpenMP for CPU execution and CUDA for GPU +execution. To use it, run

+
module use /soft/modulefiles
+module swap PrgEnv-nvhpc PrgEnv-gnu
+module swap gcc/12.2.0 gcc/11.2.0
+module load cudatoolkit-standalone/11.8.0
+module load kokkos
+module load cabana
+
+

(Since the Slingshot 11 upgrade, you must use PrgEnv-gnu and the gcc and cudatoolkit version changes indicated, at least until some subsequent Polaris system updates have been completed.)

+

Currently, Cabana is a headers-only installation; there are no libraries per se.

+ + + + + + + + + + + + + +
+
+ + + +
+ +
+ + + + +
+
+ + + + + + + + + + + +
+
+ +
+
+
+
+ + + + + + + + + + + + \ No newline at end of file diff --git a/polaris/applications-and-libraries/libraries/math-libraries/index.html b/polaris/applications-and-libraries/libraries/math-libraries/index.html new file mode 100644 index 0000000000..9920ddd9fa --- /dev/null +++ b/polaris/applications-and-libraries/libraries/math-libraries/index.html @@ -0,0 +1,6780 @@ + + + + + + + + + + + + + + + + + + + + + + + + + Math Libraries - ALCF User Guides + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+ +
+ + + + + + + + + +
+ + +
+ + +
+ +
+

Argonne Leadership Computing Facility

+
+ + + + + + + + + +
+
+ + +
+ + + + + + + + + + + + + + + + + +
+ + + + +
+ + +
+ + + + + + +
+
+ + + +
+
+
+ + + + + +
+
+
+ + + +
+
+
+ + + +
+
+
+ + + +
+
+ + + + + + + +

Math Libraries

+

BLAS, LAPACK, and ScaLAPACK for CPUs

+

Some math libraries targeting CPUs are made available as part of the nvhpc modules and are based on the OpenBLAS project. Additional documentation is available from NVIDIA.

+
    +
  • BLAS & LAPACK can be found in the $NVIDIA_PATH/compilers/lib directory.
  • +
  • ScaLAPACK can be found in the $NVIDIA_PATH/comm_libs directory.
  • +
  • GNU Scientific Library, GSL-2.7, available via module help gsl
  • 
  • AMD Optimizing CPU Libraries, AOCL v4.0, available via module help aocl
  • 
  • Other Cray-provided math libraries, such as LibSci and FFTW, are made available via module load cray-libsci and module load cray-fftw
  • +
+

NVIDIA Math Libraries for GPUs

+

Math libraries from NVIDIA are made available via the nvhpc modules. Many of the libraries users typically use can be found in the $NVIDIA_PATH/math_libs directory. Some examples follow and additional documentation is available from NVIDIA.

+
    +
  • libcublas
  • +
  • libcufft
  • +
  • libcurand
  • +
  • libcusolver
  • +
  • libcusparse
  • +
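As an illustration, a GPU code that calls cuBLAS and cuFFT could be linked either through the NVHPC -cudalib convenience flag or by pointing at the math_libs directory explicitly; a minimal sketch, assuming PrgEnv-nvhpc is loaded and main.cpp is a placeholder source file:

# convenience flag provided by the NVHPC compilers
CC -cuda main.cpp -o app -cudalib=cublas,cufft
# explicit link against the SDK math_libs directory
CC -cuda main.cpp -o app -L$NVIDIA_PATH/math_libs/lib64 -lcublas -lcufft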
+ + + + + + + + + + + + + +
+
+ + + +
+ +
+ + + + +
+
+ + + + + + + + + + + +
+
+ +
+
+
+
+ + + + + + + + + + + + \ No newline at end of file diff --git a/polaris/build-tools/cmake-polaris/index.html b/polaris/build-tools/cmake-polaris/index.html new file mode 100644 index 0000000000..b0cfea967b --- /dev/null +++ b/polaris/build-tools/cmake-polaris/index.html @@ -0,0 +1,6802 @@ + + + + + + + + + + + + + + + + + + + + + + + + + CMake - ALCF User Guides + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+ +
+ + + + + + + + + +
+ + +
+ + +
+ +
+

Argonne Leadership Computing Facility

+
+ + + + + + + + + +
+
+ + +
+ + + + + + + + + + + + + + + + + +
+ + + + +
+ + +
+ + + + + + +
+
+ + + +
+
+
+ + + + + +
+
+
+ + + +
+
+
+ + + +
+
+
+ + + +
+
+ + + + + + + +

CMake

+

CMake

+

CMake is a build configuration system that uses higher-level description files +to automatically generate Makefiles.

+

CMake Documentation

+ +

CMake on Polaris

+

To use CMake on Polaris, run

+
module use /soft/modulefiles
+module load cmake
+
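A typical out-of-source configure can then point CMake at the Cray compiler wrappers; a minimal sketch, assuming a project with a CMakeLists.txt in the current directory:

mkdir build && cd build
cmake -DCMAKE_C_COMPILER=cc -DCMAKE_CXX_COMPILER=CC -DCMAKE_Fortran_COMPILER=ftn ..
make -j 8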
+ + + + + + + + + + + + + +
+
+ + + +
+ +
+ + + + +
+
+ + + + + + + + + + + +
+
+ +
+
+
+
+ + + + + + + + + + + + \ No newline at end of file diff --git a/polaris/compiling-and-linking/cce-compilers-polaris/index.html b/polaris/compiling-and-linking/cce-compilers-polaris/index.html new file mode 100644 index 0000000000..5173bb2a1e --- /dev/null +++ b/polaris/compiling-and-linking/cce-compilers-polaris/index.html @@ -0,0 +1,6689 @@ + + + + + + + + + + + + + + + + + + + + + + + + + CCE Compilers - ALCF User Guides + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+ +
+ + + + + + + + + +
+ + +
+ + +
+ +
+

Argonne Leadership Computing Facility

+
+ + + + + + + + + +
+
+ + +
+ + + + + + + + + + + + + + + + + +
+ + + + +
+ + +
+ + + + + + +
+
+ + + +
+
+
+ + + + + +
+
+
+ + + +
+
+
+ + + +
+
+
+ + + +
+
+ + + + + + + +

CCE Compilers on Polaris

+

The Cray Compiling Environment (CCE) compilers are available on Polaris via the PrgEnv-cray module.

+

The CCE compilers currently on Polaris only support AMD GPU targets for HIP and are thus not usable with the A100 GPUs.

+

The nvhpc and llvm compilers can be used for compiling GPU-enabled applications.

+ + + + + + + + + + + + + +
+
+ + + +
+ +
+ + + + +
+
+ + + + + + + + + + + +
+
+ +
+
+
+
+ + + + + + + + + + + + \ No newline at end of file diff --git a/polaris/compiling-and-linking/compiling-and-linking-overview/index.html b/polaris/compiling-and-linking/compiling-and-linking-overview/index.html new file mode 100644 index 0000000000..9c38bff5f3 --- /dev/null +++ b/polaris/compiling-and-linking/compiling-and-linking-overview/index.html @@ -0,0 +1,7003 @@ + + + + + + + + + + + + + + + + + + + + + + + + + Compiling and Linking Overview - ALCF User Guides + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+ +
+ + + + + + + + + +
+ + +
+ + +
+ +
+

Argonne Leadership Computing Facility

+
+ + + + + + + + + +
+
+ + +
+ + + + + + + + + + + + + + + + + +
+ + + + +
+ + +
+ + + + + + +
+
+ + + +
+
+
+ + + + + +
+
+
+ + + + + + + +
+
+ + + + + + + +

Compiling and Linking Overview on Polaris

+

Compiling on Polaris Login and Compute Nodes

+

If your build system does not require GPUs for the build process, as is usually the case, compilation of GPU-accelerated codes is generally expected to work well on the Polaris login nodes. If your build system does require GPUs, you cannot yet compile on the Polaris login nodes, as they do not currently have GPUs installed. You may in this case compile your applications on the Polaris compute nodes. Do this by submitting an interactive single-node job, or running your build system in a batch job.

+ + +

Home File System

+

It is helpful to realize that there is a single HOME filesystem for users that can be accessed from the login and compute nodes of each production resource at ALCF. Thus, users should be mindful of modifications to their environments (e.g., .bashrc) that may cause issues to arise due to differences between the systems.

+

An example is creating an alias for the qstat command to change, for example, the order of columns printed to screen. Users with such an alias that works well on Theta may run into issues using qstat on Polaris, as the two systems use different schedulers: Cobalt (Theta) and PBS (Polaris). Users with such modifications to their environments are encouraged to guard them appropriately based on $hostname.

+
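A minimal sketch of such a guard in ~/.bashrc, assuming one only wants the alias on Theta (the alias itself is just an example):

case "$(hostname)" in
  theta*)
    # define Theta-specific conveniences only on Theta nodes
    alias qstat='qstat -u $USER'
    ;;
esac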

Cray Programming Environment

+

The Cray Programming Environment (PE) uses three compiler wrappers for building software. These compiler wrappers should be used when building MPI-enabled applications.

+
    +
  • cc - C compiler
  • +
  • CC - C++ compiler
  • +
  • ftn - Fortran compiler
  • +
+

Each of these wrappers can select a specific vendor compiler based on the PrgEnv module loaded in the environment. The following are some helpful options to understand what the compiler wrapper is invoking.

+
    +
  • --craype-verbose : Print the command which is forwarded to the compiler invocation
  • +
  • --cray-print-opts=libs : Print library information
  • +
  • --cray-print-opts=cflags : Print include information
  • +
+

The output from these commands may be useful in build scripts where a compiler other than that invoked by a compiler wrapper is desired. Defining some variables as such may prove useful in those situations. +

CRAY_CFLAGS=$(cc --cray-print-opts=cflags)
+CRAY_LIB=$(cc --cray-print-opts=libs)
+
+Further documentation and options are available via man cc and similar.

+

Compilers provided by Cray Programming Environments

+

The default programming environment on Polaris is currently NVHPC. The GNU compilers are available via another programming environment. The following sequence of module commands can be used to switch to the GNU programming environment (gcc, g++, gfortran) and also have NVIDIA compilers available in your path.

+
module swap PrgEnv-nvhpc PrgEnv-gnu
+module load nvhpc-mixed
+
+

The compilers invoked by the Cray MPI wrappers are listed for each programming environment in the following table.

+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
moduleCC++Fortran
MPI Compiler WrapperccCCftn
PrgEnv-nvhpcnvcnvc++nvfortran
PrgEnv-gnugccg++gfortran
+

Note, while gcc and g++ may be available in the default environment, the PrgEnv-gnu module is needed to provide gfortran.

+

Additional Compilers Provided by ALCF

+

The ALCF additionally provides compilers to enable the OpenMP and SYCL programming models for GPUs via LLVM, as documented here.

+

Additional documentation for using compilers is available on the respective programming model pages: OpenMP and SYCL.

+

Linking

+

Dynamic linking of libraries is currently the default on Polaris. The Cray MPI wrappers will handle this automatically.

+

Notes on Default Modules

+
    +
  • +

    craype-x86-rome: While the Polaris compute nodes currently have Milan CPUs, this module is loaded by default to keep the craype-x86-milan module from adding a zen3 target not supported in the default nvhpc/21.9 compilers. The craype-x86-milan module is expected to be made default once a newer nvhpc version (e.g. 22.5) is made the default.

    +
  • +
  • +

    craype-accel-nvidia80: This module adds compiler flags to enable GPU acceleration for NVHPC compilers along with gpu-enabled MPI libraries as it is assumed that the majority of applications to be compiled on Polaris will target the GPUs for acceleration. Users building cpu-only applications may find it useful to unload this module to silence "gpu code generation" warnings.

    +
  • +
+

Mixed C/C++ & Fortran Applications

+

For applications consisting of a mix of C/C++ and Fortran that also uses MPI, it is suggested that the programming environment chosen for Fortran be used to build the full application because of mpi.mod (and similar) incompatibilities.

+
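For example, a code with both C and Fortran MPI sources could be compiled with the matching wrappers and linked with ftn; a minimal sketch with placeholder file names:

cc  -c c_routines.c
ftn -c f_routines.f90 main.f90
ftn c_routines.o f_routines.o main.o -o mixed_app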

Compiling for GPUs

+

It is assumed the majority of applications to be built on Polaris will make use of the GPUs. As such, the craype-accel-nvidia80 module is in the default environment. This has the effect of the Cray compiler wrappers adding -gpu to the compiler invocation along with additional include paths and libraries. Additional compilers flags may be needed depending on the compiler and GPU programming model used (e.g. -cuda, -acc, or -mp=gpu).

+

This module also adds GPU Transport Layer (GTL) libraries to the link-line to support GPU-aware MPI applications.

+
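As an illustration, an MPI+OpenMP offload code could be built and run as follows; a minimal sketch, assuming the default PrgEnv-nvhpc environment, a single node, and a placeholder source file name:

CC -mp=gpu main.cpp -o app_gpu
export MPICH_GPU_SUPPORT_ENABLED=1   # only needed if the application passes GPU buffers to MPI
mpiexec -n 4 --ppn 4 ./app_gpu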

Man Pages

+

For additional information on the Cray wrappers, please refer to the man pages. +

man cc
+man CC
+man ftn
+

+ + + + + + + + + + + + + +
+
+ + + +
+ +
+ + + + +
+
+ + + + + + + + + + + +
+
+ +
+
+
+
+ + + + + + + + + + + + \ No newline at end of file diff --git a/polaris/compiling-and-linking/files/01-ci-002.png b/polaris/compiling-and-linking/files/01-ci-002.png new file mode 100644 index 0000000000..0eac188ed2 Binary files /dev/null and b/polaris/compiling-and-linking/files/01-ci-002.png differ diff --git a/polaris/compiling-and-linking/files/02-ci-003.png b/polaris/compiling-and-linking/files/02-ci-003.png new file mode 100644 index 0000000000..b4e9e5a299 Binary files /dev/null and b/polaris/compiling-and-linking/files/02-ci-003.png differ diff --git a/polaris/compiling-and-linking/files/03-ci-004.png b/polaris/compiling-and-linking/files/03-ci-004.png new file mode 100644 index 0000000000..c35f28f8f1 Binary files /dev/null and b/polaris/compiling-and-linking/files/03-ci-004.png differ diff --git a/polaris/compiling-and-linking/files/04-image005.png b/polaris/compiling-and-linking/files/04-image005.png new file mode 100644 index 0000000000..9d4dae443a Binary files /dev/null and b/polaris/compiling-and-linking/files/04-image005.png differ diff --git a/polaris/compiling-and-linking/files/05-ci-006_1.png b/polaris/compiling-and-linking/files/05-ci-006_1.png new file mode 100644 index 0000000000..8a9d8ecf91 Binary files /dev/null and b/polaris/compiling-and-linking/files/05-ci-006_1.png differ diff --git a/polaris/compiling-and-linking/files/06-ci-007.png b/polaris/compiling-and-linking/files/06-ci-007.png new file mode 100644 index 0000000000..bfbf193dd6 Binary files /dev/null and b/polaris/compiling-and-linking/files/06-ci-007.png differ diff --git a/polaris/compiling-and-linking/files/07-ci-008.png b/polaris/compiling-and-linking/files/07-ci-008.png new file mode 100644 index 0000000000..a052d47809 Binary files /dev/null and b/polaris/compiling-and-linking/files/07-ci-008.png differ diff --git a/polaris/compiling-and-linking/files/08-ci-009.png b/polaris/compiling-and-linking/files/08-ci-009.png new file mode 100644 index 0000000000..9b5084d098 Binary files /dev/null and b/polaris/compiling-and-linking/files/08-ci-009.png differ diff --git a/polaris/compiling-and-linking/files/09-ci-010.png b/polaris/compiling-and-linking/files/09-ci-010.png new file mode 100644 index 0000000000..045df9ea08 Binary files /dev/null and b/polaris/compiling-and-linking/files/09-ci-010.png differ diff --git a/polaris/compiling-and-linking/files/10-ci-011.png b/polaris/compiling-and-linking/files/10-ci-011.png new file mode 100644 index 0000000000..80f6aef0be Binary files /dev/null and b/polaris/compiling-and-linking/files/10-ci-011.png differ diff --git a/polaris/compiling-and-linking/files/11-ci-012.png b/polaris/compiling-and-linking/files/11-ci-012.png new file mode 100644 index 0000000000..e1ca97862c Binary files /dev/null and b/polaris/compiling-and-linking/files/11-ci-012.png differ diff --git a/polaris/compiling-and-linking/files/12-ci-013.png b/polaris/compiling-and-linking/files/12-ci-013.png new file mode 100644 index 0000000000..84c7be70da Binary files /dev/null and b/polaris/compiling-and-linking/files/12-ci-013.png differ diff --git a/polaris/compiling-and-linking/files/13-ci-014.png b/polaris/compiling-and-linking/files/13-ci-014.png new file mode 100644 index 0000000000..804b05d102 Binary files /dev/null and b/polaris/compiling-and-linking/files/13-ci-014.png differ diff --git a/polaris/compiling-and-linking/files/14-ci-015.png b/polaris/compiling-and-linking/files/14-ci-015.png new file mode 100644 index 0000000000..027c65740a Binary files /dev/null and 
b/polaris/compiling-and-linking/files/14-ci-015.png differ diff --git a/polaris/compiling-and-linking/files/15-ci-016.png b/polaris/compiling-and-linking/files/15-ci-016.png new file mode 100644 index 0000000000..53dd12f8e0 Binary files /dev/null and b/polaris/compiling-and-linking/files/15-ci-016.png differ diff --git a/polaris/compiling-and-linking/files/16-ci-017.png b/polaris/compiling-and-linking/files/16-ci-017.png new file mode 100644 index 0000000000..aa6e584e76 Binary files /dev/null and b/polaris/compiling-and-linking/files/16-ci-017.png differ diff --git a/polaris/compiling-and-linking/gnu-compilers-polaris/index.html b/polaris/compiling-and-linking/gnu-compilers-polaris/index.html new file mode 100644 index 0000000000..6b1206dd24 --- /dev/null +++ b/polaris/compiling-and-linking/gnu-compilers-polaris/index.html @@ -0,0 +1,6689 @@ + + + + + + + + + + + + + + + + + + + + + + + + + GNU Compilers - ALCF User Guides + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+ +
+ + + + + + + + + +
+ + +
+ + +
+ +
+

Argonne Leadership Computing Facility

+
+ + + + + + + + + +
+
+ + +
+ + + + + + + + + + + + + + + + + +
+ + + + +
+ + +
+ + + + + + +
+
+ + + +
+
+
+ + + + + +
+
+
+ + + +
+
+
+ + + +
+
+
+ + + +
+
+ + + + + + + +

GNU Compilers on Polaris

+

The GNU compilers are available on Polaris via the PrgEnv-gnu and gcc-mixed modules. The gcc-mixed module can be useful when, for example, the PrgEnv-nvhpc compilers are used to compile C/C++ MPI-enabled code and gfortran is needed.

+

The GNU compilers currently on Polaris do not support GPU code generation and thus can only be used for compiling CPU codes.

+

The nvhpc and llvm compilers can be used for compiling GPU-enabled applications.

+ + + + + + + + + + + + + +
+
+ + + +
+ +
+ + + + +
+
+ + + + + + + + + + + +
+
+ +
+
+
+
+ + + + + + + + + + + + \ No newline at end of file diff --git a/polaris/compiling-and-linking/llvm-compilers-polaris/index.html b/polaris/compiling-and-linking/llvm-compilers-polaris/index.html new file mode 100644 index 0000000000..1c7bbd8faa --- /dev/null +++ b/polaris/compiling-and-linking/llvm-compilers-polaris/index.html @@ -0,0 +1,6755 @@ + + + + + + + + + + + + + + + + + + + + + + + + + LLVM Compilers - ALCF User Guides + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+ +
+ + + + + + + + + +
+ + +
+ + +
+ +
+

Argonne Leadership Computing Facility

+
+ + + + + + + + + +
+
+ + +
+ + + + + + + + + + + + + + + + + +
+ + + + +
+ + +
+ + + + + + +
+
+ + + +
+
+
+ + + + + +
+
+
+ + + +
+
+
+ + + +
+
+
+ + + +
+
+ + + + + + + +

LLVM Compilers on Polaris

+

This page is not about LLVM-based Cray Compiling Environment (CCE) compilers from PrgEnv-cray but about open source LLVM compilers. +If LLVM compilers are needed without MPI support, simply load the llvm module.

+

The Cray Programming Environment does not offer LLVM compiler support, so cc/CC/ftn compiler wrappers that invoke the LLVM compilers are not currently available. To use Clang with MPI, one can load the mpiwrappers/cray-mpich-llvm module, which loads the following modules.

+
    +
  • llvm, upstream llvm compilers
  • +
  • cray-mpich, MPI compiler wrappers mpicc/mpicxx/mpif90. mpif90 uses gfortran because flang is not ready for production use.
  • +
  • cray-pals, MPI launchers mpiexec/aprun/mpirun
  • +
+

Limitation There is no GPU-aware MPI library linking support by default. If needed, users should manually add the GTL (GPU Transport Layer) library to the application link line.

+
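A sketch of what adding the GTL library to the link line can look like follows; the library name and path vary with the installed cray-mpich version and accelerator, so treat both as assumptions to verify in your environment:

mpicxx main.o -o app -L${CRAY_MPICH_ROOTDIR}/gtl/lib -lmpi_gtl_cuda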

OpenMP offload

+

When targeting the OpenMP or CUDA programming models for GPUs, the cudatoolkit-standalone module should also be loaded.
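A minimal OpenMP offload compile line with the open source Clang, assuming the llvm and cudatoolkit-standalone modules are loaded and a recent LLVM that accepts --offload-arch (the source file name is a placeholder):

clang++ -fopenmp --offload-arch=sm_80 offload_test.cpp -o offload_test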

+ + + + + + + + + + + + + +
+
+ + + +
+ +
+ + + + +
+
+ + + + + + + + + + + +
+
+ +
+
+
+
+ + + + + + + + + + + + \ No newline at end of file diff --git a/polaris/compiling-and-linking/nvidia-compiler-polaris/index.html b/polaris/compiling-and-linking/nvidia-compiler-polaris/index.html new file mode 100644 index 0000000000..006f830a36 --- /dev/null +++ b/polaris/compiling-and-linking/nvidia-compiler-polaris/index.html @@ -0,0 +1,6872 @@ + + + + + + + + + + + + + + + + + + + + + + + + + NVIDIA Compilers - ALCF User Guides + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+ +
+ + + + + + + + + +
+ + +
+ + +
+ +
+

Argonne Leadership Computing Facility

+
+ + + + + + + + + +
+
+ + +
+ + + + + + + + + + + + + + + + + +
+ + + + +
+ + +
+ + + + + + +
+
+ + + +
+
+
+ + + + + +
+
+
+ + + +
+ +
+ + + +
+
+ + + + + + + +

NVIDIA Compilers on Polaris

+

The NVIDIA compilers (nvc, nvc++, nvcc, and nvfortran) are available on Polaris via the PrgEnv-nvhpc and nvhpc modules. There is currently a PrgEnv-nvidia module available, but it will soon be deprecated in Cray's PE, thus it is not recommended for use.

+

The Cray compiler wrappers map to NVIDIA compilers as follows.

+
cc -> nvc
+CC -> nvc++
+ftn -> nvfortran
+
+

Users are encouraged to look through NVIDIA's documentation for the NVHPC SDK and specific information on the compilers, tools, and libraries.

+

Notes on NVIDIA Compilers

+

PGI compilers

+

The NVIDIA programming environment makes available compilers from the NVIDIA HPC SDK. While the PGI compilers are available in this programming environment, it should be noted that they are actually symlinks to the corresponding NVIDIA compilers. +

pgcc -> nvc
+pgc++ -> nvc++
+pgf90 -> nvfortran
+pgfortran -> nvfortran
+
+While nvcc is the traditional CUDA C and CUDA C++ compiler for NVIDIA GPUs, the nvc, nvc++, and nvfortran compilers additionally target CPUs.

+

NVHPC SDK Directory Structure

+

Users migrating from CUDA toolkits to the NVHPC SDK may find it beneficial to review the directory structure of the hpc-sdk directory to find the location of commonly used libraries (including math libraries for the CPU). With the PrgEnv-nvhpc module loaded, the NVIDIA_PATH environment variable can be used to locate the path to various NVIDIA tools, libraries, and examples.

+
    +
  • compiler/bin - cuda-gdb, ncu, nsys, ...
  • +
  • examples - CUDA-Fortran, OpenMP, ...
  • +
  • comm_libs - nccl, nvshmem, ...
  • +
  • compiler/libs - blas, lapack, ...
  • +
  • cuda/lib64 - cudart, OpenCL, ...
  • +
  • math_libs/lib64 - cublas, cufft, ...
  • +
+

Differences between nvcc and nvc/nvc++

+

For users that want to continue using nvcc it is important to be mindful of differences with the newer nvc and nvc++ compilers. For example, the -cuda flag instructs nvcc to compile .cu input files to .cu.cpp.ii output files which are to be separately compiled, whereas the same -cuda flag instructs nvc, nvc++, and nvfortran to enable CUDA C/C++ or CUDA Fortran code generation. The resulting output file in each case is different (text vs. object) and one may see unrecognized format error when -cuda is incorrectly passed to nvcc.

+
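To make the distinction concrete, the typical invocations look as follows (file names are placeholders): nvcc compiles a .cu source directly, while -cuda enables CUDA code generation in nvfortran (and similarly in nvc/nvc++):

nvcc -arch=sm_80 saxpy.cu -o saxpy
nvfortran -cuda -gpu=cc80 saxpy.cuf -o saxpy_f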

Known Issues and Workarounds

+

If you are using nvcc to invoke nvc++ and compiling C++17 code, and are seeing the following warning and unable to compile C++17 constructs:

+
polaris-login-01(~)> nvcc --std=c++17 -ccbin nvc++ ~/smalltests/bool_constant.cpp
+nvcc warning : The -std=c++17 flag is not supported with the configured host compiler. Flag will be ignored.
+"/home/zippy/smalltests/bool_constant.cpp", line 10: error: namespace "std" has no member class "bool_constant"
+      : std::bool_constant<(UnaryPred<Ts>::value || ...)> {};
+             ^
+
+"/home/zippy/smalltests/bool_constant.cpp", line 10: error: class or struct definition is missing
+      : std::bool_constant<(UnaryPred<Ts>::value || ...)> {};
+                          ^
+
+2 errors detected in the compilation of "/home/zippy/smalltests/bool_constant.cpp".
+polaris-login-01(~)>
+
+

you will need to work around it by loading the latest cudatoolkit module atop PrgEnv-nvhpc:

+
module load cudatoolkit-standalone/11.6.2
+
+ + + + + + + + + + + + + +
+
+ + + +
+ +
+ + + + +
+
+ + + + + + + + + + + +
+
+ +
+
+
+
+ + + + + + + + + + + + \ No newline at end of file diff --git a/polaris/compiling-and-linking/oneapi-compiler/index.html b/polaris/compiling-and-linking/oneapi-compiler/index.html new file mode 100644 index 0000000000..a890e94ccf --- /dev/null +++ b/polaris/compiling-and-linking/oneapi-compiler/index.html @@ -0,0 +1,6865 @@ + + + + + + + + + + + + + + + + + + + + + + + + + oneAPI Toolkit - ALCF User Guides + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+ +
+ + + + + + + + + +
+ + +
+ + +
+ +
+

Argonne Leadership Computing Facility

+
+ + + + + + + + + +
+
+ + +
+ + + + + + + + + + + + + + + + + +
+ + + + +
+ + +
+ + + + + + +
+
+ + + +
+
+
+ + + + + +
+
+
+ + + +
+
+
+ + + +
+
+
+ + + +
+
+ + + + + + + +

oneAPI Compilers and Support

+

The Intel oneAPI compiler and Codeplay plugins for Nvidia GPUs are available on Polaris. +The oneAPI compilers are not enabled under the Cray Programming Environment system but can be used separately. +Two oneAPI variants are provided, the first being a "release" version based on Intel's officially released oneAPI toolkit. +Intel Release Notes

+
+

Note

+

The 2023.2.1 release of the oneAPI Toolkit does not yet support oneDPL on Nvidia devices. oneMKL, however, is included from the 2023.2.1 release onwards.

+
+

Components

+
    +
  • These are the list of components associated with this module
  • +
+ + + + + + + + + + + + + + + + + +
User ApplicationComponent
CompilersDPC++
oneMKL InterfacesoneMKL
+

The other variant is built from the open-source repository. This variant will be more up to date, at the risk of bugs and breakages from code that has not undergone a full release cycle. The documentation is located on the SYCL page. The most notable difference is that icx/icpx are the names of the C/C++ compilers when using the release version of the module, whereas clang/clang++ are used for the open-source variant.

+ +

oneAPI uses the clang (or icx/icpx wrapper) for compiling and linking for the Nvidia A100 SM80 architecture.

+
module load oneapi/release
+icpx -std=c++17 -fsycl -fsycl-targets=nvptx64-nvidia-cuda -Xsycl-target-backend --cuda-gpu-arch=sm_80 test.cpp
+
+
harms@polaris-login-04:~/working/polaris/oneapi> icpx -v
+Intel(R) oneAPI DPC++/C++ Compiler 2023.2.0 (2023.2.0.20230721)
+Target: x86_64-unknown-linux-gnu
+Thread model: posix
+InstalledDir: /soft/compilers/oneapi/release/2023.2/compiler/2023.2.1/linux/bin-llvm
+Configuration file: /soft/compilers/oneapi/release/2023.2/compiler/2023.2.1/linux/bin-llvm/../bin/icpx.cfg
+Found candidate GCC installation: /usr/lib64/gcc/x86_64-suse-linux/7
+Selected GCC installation: /usr/lib64/gcc/x86_64-suse-linux/7
+Candidate multilib: .;@m64
+Selected multilib: .;@m64
+Found CUDA installation: /opt/nvidia/hpc_sdk/Linux_x86_64/21.9/cuda/11.4, version 11.4
+
+

Running

+

The runtime should select a GPU by default, but device selection can be forced via the ONEAPI_DEVICE_SELECTOR environment variable:

$ ONEAPI_DEVICE_SELECTOR=ext_oneapi_cuda:gpu ./a.out
+
+or a specific GPU. +
$ ONEAPI_DEVICE_SELECTOR=ext_oneapi_cuda:gpu:3 ./a.out
+

+

sycl-ls

+

The expected output of sycl-ls, showing which platforms and devices are available:

+
harms@x3004c0s7b0n0:~> which sycl-ls
+/soft/compilers/oneapi/release/2023.2/compiler/2023.2.1/linux/bin/sycl-ls
+
+harms@x3004c0s7b0n0:~> sycl-ls
+[opencl:acc:0] Intel(R) FPGA Emulation Platform for OpenCL(TM), Intel(R) FPGA Emulation Device 1.2 [2023.16.7.0.21_160000]
+[opencl:cpu:1] Intel(R) OpenCL, AMD EPYC 7543P 32-Core Processor                3.0 [2023.16.7.0.21_160000]
+[ext_oneapi_cuda:gpu:0] NVIDIA CUDA BACKEND, NVIDIA A100-SXM4-40GB 8.8 [CUDA 11.4]
+[ext_oneapi_cuda:gpu:1] NVIDIA CUDA BACKEND, NVIDIA A100-SXM4-40GB 8.8 [CUDA 11.4]
+[ext_oneapi_cuda:gpu:2] NVIDIA CUDA BACKEND, NVIDIA A100-SXM4-40GB 8.8 [CUDA 11.4]
+[ext_oneapi_cuda:gpu:3] NVIDIA CUDA BACKEND, NVIDIA A100-SXM4-40GB 8.8 [CUDA 11.4]
+

Example Programs and Makefiles for Polaris

+

Several simple examples of building CPU and GPU-enabled codes on Polaris are available in the ALCF GettingStarted repo for several programming models. If building your application on the login nodes is problematic for some reason (e.g. absence of a GPU), then users are encouraged to build and test applications directly on one of the Polaris compute nodes via an interactive job. The discussion below uses the NVHPC compilers in the default environment as illustrative examples. Similar examples for other compilers on Polaris are available in the ALCF GettingStarted repo.

+

CPU MPI+OpenMP Example

+

One of the first useful tasks with any new machine, scheduler, and job launcher is to ensure one is binding MPI ranks and OpenMP threads to the host cpu as intended. A simple HelloWorld MPI+OpenMP example is available here to get started with.

+

The application can be straightforwardly compiled using the Cray compiler wrappers. +

CC -fopenmp main.cpp -o hello_affinity
+
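The same compile line can also be dropped into a minimal Makefile. The sketch below is only illustrative; the Makefiles in the GettingStarted repo may be organized differently.

# Minimal illustrative Makefile for the MPI+OpenMP example
# (recipe lines must start with a tab)
CXX      = CC
CXXFLAGS = -g -O3 -fopenmp

hello_affinity: main.cpp
	$(CXX) $(CXXFLAGS) main.cpp -o hello_affinity

clean:
	rm -f hello_affinity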

+

The executable hello_affinity can then be launched in a job script (or directly in the shell of an interactive job) using mpiexec as discussed here.

+
#!/bin/sh
+#PBS -l select=1:system=polaris
+#PBS -l place=scatter
+#PBS -l walltime=0:30:00
+#PBS -l filesystems=home
+
+# MPI example w/ 16 MPI ranks per node spread evenly across cores
+NNODES=`wc -l < $PBS_NODEFILE`
+NRANKS_PER_NODE=16
+NDEPTH=4
+NTHREADS=1
+
+NTOTRANKS=$(( NNODES * NRANKS_PER_NODE ))
+echo "NUM_OF_NODES= ${NNODES} TOTAL_NUM_RANKS= ${NTOTRANKS} RANKS_PER_NODE= ${NRANKS_PER_NODE} THREADS_PER_RANK= ${NTHREADS}"
+
+mpiexec -n ${NTOTRANKS} --ppn ${NRANKS_PER_NODE} --depth=${NDEPTH} --cpu-bind depth ./hello_affinity
+
+

CUDA

+

Several variants of C/C++ and Fortran CUDA examples are available here that include MPI and multi-gpu examples.

+

One can use the Cray compiler wrappers to compile GPU-enabled applications as well. This example of simple vector addition uses the NVIDIA compilers.

+
CC -g -O3 -std=c++0x -cuda main.cpp -o vecadd
+
+

The craype-accel-nvidia80 module in the default environment will add the -gpu compiler flag for nvhpc compilers along with appropriate include directories and libraries. It is left to the user to provide an additional flag to the nvhpc compilers to select the target GPU programming model. In this case, -cuda is used to indicate compilation of CUDA code. The application can then be launched within a batch job submission script or as follows on one of the compute nodes.

+
$ ./vecadd 
+# of devices= 4
+  [0] Platform[ Nvidia ] Type[ GPU ] Device[ NVIDIA A100-SXM4-40GB ]
+  [1] Platform[ Nvidia ] Type[ GPU ] Device[ NVIDIA A100-SXM4-40GB ]
+  [2] Platform[ Nvidia ] Type[ GPU ] Device[ NVIDIA A100-SXM4-40GB ]
+  [3] Platform[ Nvidia ] Type[ GPU ] Device[ NVIDIA A100-SXM4-40GB ]
+Running on GPU 0!
+Using single-precision
+
+  Name= NVIDIA A100-SXM4-40GB
+  Locally unique identifier= 
+  Clock Frequency(KHz)= 1410000
+  Compute Mode= 0
+  Major compute capability= 8
+  Minor compute capability= 0
+  Number of multiprocessors on device= 108
+  Warp size in threads= 32
+  Single precision performance ratio= 2
+
+Result is CORRECT!! :)
+
+

GPU OpenACC

+

A simple MPI-parallel OpenACC example is available here. Compilation proceeds similarly to the above CUDA example, except for the use of the -acc=gpu compiler flag to indicate compilation of OpenACC code for GPUs.

CC -g -O3 -std=c++0x -acc=gpu -gpu=cc80,cuda11.0 main.cpp -o vecadd
+
+In this example, each MPI rank sees all four GPUs on a Polaris node and GPUs are bound to MPI ranks round-robin within the application.

+

$ mpiexec -n 4 ./vecadd
+# of devices= 4
+Using single-precision
+
+Rank 0 running on GPU 0!
+Rank 1 running on GPU 1!
+Rank 2 running on GPU 2!
+Rank 3 running on GPU 3!
+
+Result is CORRECT!! :)
+
+If the application instead relies on the job launcher to bind MPI ranks to available GPUs, then a small helper script can be used to explicitly set CUDA_VISIBLE_DEVICES appropriately for each MPI rank. One example is available here where each MPI rank is similarly bound to a single GPU with round-robin assignment. The binding of MPI ranks to GPUs is discussed in more detail here.

+

GPU OpenCL

+

A simple OpenCL example is available here. The OpenCL headers and library are available in the NVHPC SDK and cuda toolkits. The environment variable NVIDIA_PATH is defined for the PrgEnv-nvhpc programming environment. +

CC -o vecadd -g -O3 -std=c++0x  -I${NVIDIA_PATH}/cuda/include main.o -L${NVIDIA_PATH}/cuda/lib64 -lOpenCL
+

+

This simple example can be run on a Polaris compute node as follows. +

$ ./vecadd
+Running on GPU!
+Using single-precision
+
+    CL_DEVICE_NAME: NVIDIA A100-SXM4-40GB
+    CL_DEVICE_VERSION: OpenCL 3.0 CUDA
+    CL_DEVICE_OPENCL_C_VERSION: OpenCL C 1.2 
+    CL_DEVICE_MAX_COMPUTE_UNITS: 108
+    CL_DEVICE_MAX_CLOCK_FREQUENCY: 1410
+    CL_DEVICE_MAX_WORK_GROUP_SIZE: 1024
+
+Result is CORRECT!! :)
+

+

GPU OpenMP

+

A simple MPI-parallel OpenMP example is available here. Compilation proceeds similarly to the above examples, except for the use of the -mp=gpu compiler flag to indicate compilation of OpenMP code for GPUs.

+
CC -g -O3 -std=c++0x -mp=gpu -gpu=cc80,cuda11.0 main.cpp -o vecadd
+
+

Similar to the OpenACC example above, this code binds MPI ranks to GPUs in a round-robin fashion. +

$ mpiexec -n 4 ./vecadd
+# of devices= 4
+Rank 0 running on GPU 0!
+Rank 1 running on GPU 1!
+Rank 2 running on GPU 2!
+Rank 3 running on GPU 3!
+
+Result is CORRECT!! :)
+


Programming Models on Polaris

+

The software environment on Polaris supports several parallel programming models targeting the CPUs and GPUs.

+

CPU Parallel Programming Models

+

The Cray compiler wrappers cc, CC, and ftn are recommended for MPI applications as they provide the needed include paths and libraries for each programming environment. A summary of available CPU parallel programming models and relevant compiler flags is shown below. Users are encouraged to review the corresponding man pages and documentation.

+ + + + + + + + + + + + + + + + + + + + + + + +
| Programming Model | GNU | NVHPC | LLVM |
| --- | --- | --- | --- |
| OpenMP | -fopenmp | -mp | -fopenmp |
| OpenACC | -- | -acc=multicore | -- |
+

Higher-level programming models such as Kokkos and Raja may also be used for CPU programming.

+

GPU Programming Models

+

A summary of available GPU programming models and relevant compiler flags is shown below for compilers that generate offloadable code. Users are encouraged to review the corresponding man pages and documentation.

+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
| Programming Model | GNU | NVHPC | LLVM | ONEAPI |
| --- | --- | --- | --- | --- |
| CUDA | -- | -cuda [-gpu=cc80,cuda11.0] | -- | -- |
| HIP* | -- | -- | -- | -- |
| OpenACC | -- | -acc | -- | -- |
| OpenCL* | -- | -- | -- | -- |
| OpenMP | -- | -mp=gpu | -fopenmp-targets=nvptx64 | -- |
| SYCL | -- | -- | -- | -fsycl -fsycl-targets=nvptx64-nvidia-cuda -Xsycl-target-backend --cuda-gpu-arch=sm_80 |
+

Note, the llvm and oneapi modules are provided by ALCF to complement the compilers provided by the Cray PE on Polaris.

+

Higher-level programming models such as Kokkos and Raja may also be used for GPU programming.

+

OpenCL is supported, but does not require specific compiler flags per se, as the offloaded kernels are just-in-time compiled. Abstraction programming models, such as Kokkos, can be built on top of some of these programming models (see below).

+

A HIP compiler supporting the A100 GPUs is still to be installed on Polaris.

+

Mapping Programming Models to Polaris Modules

+

The table below offers some suggestions for how to get started setting up your environment on Polaris depending on the programming language and model. Note, mixed C/C++ and Fortran applications should choose the programming environment of the Fortran compiler because of mpi.mod and similar incompatibilities between Fortran-generated files from different compilers. Several simple examples for testing the software environment on Polaris for different programming models are available in the ALCF GettingStarted repo.

+

Note, users are encouraged to use PrgEnv-nvhpc instead of PrgEnv-nvidia, as the latter will soon be deprecated in Cray's PE. They are otherwise identical, pointing to compilers from the same NVIDIA SDK version. A short example of switching environments follows the table.

+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
| Programming Language | GPU Programming Model | Likely used Modules/Compilers | Notes |
| --- | --- | --- | --- |
| C/C++ | CUDA | PrgEnv-nvhpc, PrgEnv-gnu, llvm | NVIDIA (nvcc, nvc, nvc++) and clang compilers do GPU code generation |
| C/C++ | HIP | N/A | need to install with support for A100 |
| C/C++ | Kokkos | See CUDA | HIP, OpenMP, and SYCL/DPC++ also candidates |
| C/C++ | OpenACC | PrgEnv-nvhpc | |
| C/C++ | OpenCL | PrgEnv-nvhpc, PrgEnv-gnu, llvm | JIT GPU code generation |
| C/C++ | OpenMP | PrgEnv-nvhpc, llvm | |
| C/C++ | RAJA | See CUDA | HIP, OpenMP, and SYCL/DPC++ also candidates |
| C/C++ | SYCL/DPC++ | llvm-sycl | |
| Fortran | CUDA | PrgEnv-nvhpc | NVIDIA compiler (nvfortran) does GPU code generation; gfortran can be loaded via gcc-mixed |
| Fortran | HIP | N/A | need to install with support for A100 |
| Fortran | OpenACC | PrgEnv-nvhpc | |
| Fortran | OpenCL | PrgEnv-nvhpc, PrgEnv-gnu | JIT GPU code generation |
| Fortran | OpenMP | PrgEnv-nvhpc | |
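As a concrete illustration of using this mapping, a C/C++ OpenMP offload build with ALCF's llvm module might look like the following. This is an assumed workflow (one of several valid combinations), not a prescribed one:

# switch from the NVHPC environment to GNU, then add ALCF's llvm module
module swap PrgEnv-nvhpc PrgEnv-gnu
module load llvm
# OpenMP offload flag for NVIDIA targets, per the GPU programming model table above
clang++ -fopenmp -fopenmp-targets=nvptx64 main.cpp -o vecadd_omp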

Instructions for gpt-neox:

+

We include below a set of instructions to get EleutherAI/gpt-neox running on Polaris.

+

A batch submission script for the following example is available here.

+
+

Warning

+

The instructions below should be run directly from a compute node.

+

Explicitly, to request an interactive job (from polaris-login): +

$ qsub -I -A <project> -q debug-scaling -l select=2 -l walltime=01:00:00
+

+

Refer to job scheduling and execution for additional information.

+
+
    +
  1. +

    Load and activate the base conda environment: +

    module load conda
    +conda activate base
    +

    +
  2. +
  3. +

    We've installed the requirements for running gpt-neox into a virtual + environment. To activate this environment, +

    source /soft/datascience/venvs/polaris/2022-09-08/bin/activate
    +

    +
  4. +
  5. +

    Clone the EleutherAI/gpt-neox repository if it doesn't already exist: +

    git clone https://github.com/EleutherAI/gpt-neox
    +

    +
  6. +
  7. +

    Navigate into the gpt-neox directory: +

    cd gpt-neox
    +
    +
    +
    +

    Note

    +

    The remaining instructions assume you're inside the gpt-neox directory +

    +
    +

    +
  8. +
  9. +

    Create a DeepSpeed compliant hostfile (each line is formatted as hostname, slots=N): +

    cat $PBS_NODEFILE > hostfile
    +sed -e 's/$/ slots=4/' -i hostfile
    +export DLTS_HOSTFILE=hostfile 
    +

    +
  10. +
  11. +

    Create a .deepspeed_env file to ensure a consistent environment across all + workers +

    echo "PATH=${PATH} > .deepspeed_env"
    +echo "LD_LIBRARY_PATH=${LD_LIBRARY_PATH} >> .deepspeed_env"
    +echo "http_proxy=${http_proxy} >> .deepspeed_env"
    +echo "https_proxy=${https_proxy} >> .deepspeed_env"
    +

    +
  12. +
  13. +

    Prepare data: +

    python3 prepare_data.py -d ./data
    +

    +
  14. +
  15. +

    Train: +

    python3 ./deepy.py train.py -d configs small.yml local_setup.yml
    +

    +
  16. +
+
+
+

Danger

+

If your training seems to be getting stuck at

+
Using /home/user/.cache/torch_extensions as PyTorch extensions root...
+
+

there may be a leftover .lock file from an aborted build. Cleaning +either the whole .cache or the extensions' sub-directory should force a +clean build on the next attempt.

+

Megatron-DeepSpeed

+

We describe below the instructions for launching distributed training with +Microsoft's Megatron-DeepSpeed and briefly describe some parallelism +strategies and various optimizations that are supported.

+
+

Note

+

We maintain a forked version at +argonne-lcf/Megatron-DeepSpeed +that has some helper scripts for launching and setting +various training options.

+
+

Setup

+
    +
  1. +

    Load conda and activate base environment:

    +
    # load conda + activate base env
    +module load conda/2023-10-04 ; conda activate base
    +
    +
  2. +
  3. +

    Clone + argonne-lcf/Megatron-DeepSpeed + and navigate into it:

    +
    # clone + navigate into Megatron-DeepSpeed repo
    +git clone https://github.com/argonne-lcf/Megatron-DeepSpeed
    +cd Megatron-DeepSpeed
    +
    +
  4. +
  5. +

    Make virtual environment (on top of base conda):

    +
    # make virtual environment (on top of base conda)
    +mkdir -p venvs/polaris/2023-10-04
    +python3 -m venv venvs/polaris/2023-10-04 --system-site-packages
    +source venvs/polaris/2023-10-04/bin/activate
    +
    +
  6. +
  7. +

    Install missing dependency:

    +
    # install *missing dependency
    +python3 -m pip install "git+https://github.com/saforem2/ezpz"
    +
    +
  8. +
  9. +

    Launch training:

    +
    # ---- launch training -----------------------
    +# - MODEL_SIZE_KEY: defined in ALCF/model.sh
    +# - other args: defined in ALCF/args.sh
    +# ---------------------------------------------
    +MODEL_SIZE_KEY="GPT25B" \
    +    SEQ_LEN=4096 \ 
    +    USE_FLASH_ATTN_V2=1 \
    +    MICRO_BATCH=1 \
    +    GAS=1 \
    +    SP_TYPE="megatron" \
    +    ZERO_STAGE=1 \
    +    ./ALCF/train-gpt3.sh
    +
    +
  10. +
+

Helper Scripts

+
+
ALCF/train-gpt3.sh
+
+

Main entry point for training. This script will automatically source the +rest of the required ALCF/*.sh scripts below

+
+
ALCF/model.sh
+
+

Contains some example model architectures for GPT3-style models

+
+
ALCF/args.sh
+
+

Logic for parsing / setting up runtime options for Megatron and DeepSpeed.

+
+
ALCF/setup.sh
+
+

Locate and activate virtual environment to be used, ensure MPI +variables are set properly

+
+
ALCF/launch.sh
+
+

Identify available resources and build the command to be run, i.e. figure out how many nodes, GPUs per node, and GPUs in total are available to pass to mpi{run,exec}; then use this to build mpiexec <mpiexec-args> python3 pretrain_gpt.py

+
+
+ + + + + + + + + + + + + +

Containers on Polaris

+

Since Polaris is using NVIDIA A100 GPUs, there can be portability advantages with other NVIDIA-based systems if your workloads use containers. In this document, we'll outline some information about containers on Polaris including how to build custom containers, how to run containers at scale, and common gotchas.

+

Containers can be created in one of two ways: either by using Docker on your local machine, as described in the Docker section of the Theta(KNL) documentation, and publishing the image to DockerHub, or by using a Singularity recipe file and building on a Polaris compute node. If you are not interested in building a container and only want to use the available containers, you can read the section on available containers.

+

Singularity

+

The container system on Polaris is singularity. You can set up singularity with a module (this is different than, for example, ThetaGPU!):

+
# To see what versions of singularity are available:
+module avail singularity
+
+# To load the Default version:
+module load singularity
+
+# To load a specific version:
+module load singularity/3.8.7 # the default at the time of writing these docs.
+
+

Which Singularity?

+

There used to be a single singularity tool, which in 2021 split after some turmoil. There are now two singularitys: one developed by Sylabs, and the other as part of the Linux Foundation. Both are open source, and the split happened around version 3.10. The version on Polaris is from Sylabs but for completeness, here is the Linux Foundation's version. Note that the Linux Foundation version is renamed to apptainer - different name, roughly the same thing though divergence may happen after 2021's split.

+

Build from Docker Images or Argonne Github container registry

+

Docker containers require root privileges, which users do not have on Polaris. That doesn't mean all your docker containers aren't useful, though. If you have an existing docker container, you can convert it to singularity pretty easily on the login node. To build the latest NVIDIA container for PyTorch you can run the following:

+
module load singularity
+singularity build pytorch:22.06-py3.sing docker://nvcr.io/nvidia/pytorch:22.06-py3
+
+

Note that latest here means as of when these docs were written, summer 2022. It may be useful to get a newer container if you need the latest features. You can find the PyTorch container site here. The TensorFlow containers are here (though note that LCF doesn't typically prebuild the TF-1 containers). You can search the full container registry here.

+

You can also use our custom built containers using Github OCI container registry. Here's a list of containers distributed by ALCF staff tailored for Polaris.

+
module load singularity
+singularity pull IMAGE_NAME oras://ghcr.io/argonne-lcf/IMAGE_NAME:latest
+
+

Build with a Recipe

+

You can also build a singularity container using a recipe file. Detailed instructions for recipe construction are available on the Singularity Recipe Page. You can also check our singularity recipe example for building a mpich version 4 container on Polaris.

+
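For orientation, a minimal definition file has the following shape. This is a hypothetical example only; the MPICH-enabled recipe linked above is the better starting point for Polaris.

Bootstrap: docker
From: ubuntu:22.04

%post
    # packages installed into the image at build time
    apt-get update && apt-get install -y build-essential

%runscript
    echo "Hello from inside the container"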

Once you have a recipe file, you can build it on Polaris, but only on compute nodes. You can launch an interactive job using the attribute singularity_fakeroot=true to build on a compute node.

+
qsub -I -A <project_name> -q <queue> -l select=1 -l walltime=60:00 -l singularity_fakeroot=true -l filesystems=home:eagle:grand
+
+

You need to replace <project_name> with the appropriate project to charge, and <queue> with the debug or preemptable queue, since we only request a single node.

+

After your interactive job has started, you need to load the singularity module on the compute node and export the proxy variables for internet access. Then you can build the container as shown below.

+
module load singularity
+export HTTP_PROXY=http://proxy.alcf.anl.gov:3128
+export HTTPS_PROXY=http://proxy.alcf.anl.gov:3128
+export http_proxy=http://proxy.alcf.anl.gov:3128
+export https_proxy=http://proxy.alcf.anl.gov:3128
+singularity build --fakeroot <image_name>.sif <def_filename>.def 
+
+

Alternatively, you can just pull the mpich 4 image distributed by us and build on top of it

+
singularity pull oras://ghcr.io/argonne-lcf/mpich-4:latest
+
+

Running Singularity container on Polaris

+

Example submission script on Polaris

+

To run a container on Polaris you can use the submission script described here. Below we have described the submission script for your understanding.

+

First we define our job and our script takes the container name as an input parameter.

+
#!/bin/sh
+#PBS -l select=2:system=polaris
+#PBS -q debug
+#PBS -l place=scatter
+#PBS -l walltime=0:30:00
+#PBS -l filesystems=home:grand
+#PBS -A <project_name>
+cd ${PBS_O_WORKDIR}
+echo $CONTAINER
+
+

We move to current working directory and enable network access at run time by setting the proxy. We also load singularity.

+
# SET proxy for internet access
+module load singularity
+export HTTP_PROXY=http://proxy.alcf.anl.gov:3128
+export HTTPS_PROXY=http://proxy.alcf.anl.gov:3128
+export http_proxy=http://proxy.alcf.anl.gov:3128
+export https_proxy=http://proxy.alcf.anl.gov:3128
+
+

For the system (Polaris Cray) MPICH to bind to the container's MPICH, set the following environment variables:

+
ADDITIONAL_PATH=/opt/cray/pe/pals/1.1.7/lib/
+module load cray-mpich-abi
+export SINGULARITYENV_LD_LIBRARY_PATH="$CRAY_LD_LIBRARY_PATH:$LD_LIBRARY_PATH:$ADDITIONAL_PATH"
+
+

Set the number of nodes, ranks per node, and total ranks as per your scaling requirements:

+
# MPI example w/ 16 MPI ranks per node spread evenly across cores
+NODES=`wc -l < $PBS_NODEFILE`
+PPN=16
+PROCS=$((NODES * PPN))
+echo "NUM_OF_NODES= ${NODES} TOTAL_NUM_RANKS= ${PROCS} RANKS_PER_NODE= ${PPN}"
+
+

Finally launch your script

+
echo C++ MPI
+mpiexec -hostfile $PBS_NODEFILE -n $PROCS -ppn $PPN singularity exec -B /opt -B /var/run/palsd/ $CONTAINER /usr/source/mpi_hello_world
+
+echo Python MPI
+mpiexec -hostfile $PBS_NODEFILE -n $PROCS -ppn $PPN singularity exec -B /opt -B /var/run/palsd/ $CONTAINER python3 /usr/source/mpi_hello_world.py
+
+

The job can be submitted using:

+
qsub -v CONTAINER=mpich-4_latest.sif job_submission.sh
+
+

Available containers

+

If you just want to know what containers are available, here you go.

+
    +
  • For running mpich/MPI containers on Polaris, it can be found here
  • +
  • For running databases on Polaris. It can be found here
  • +
  • For using shpc - that allows for running containers as modules. It can be found here
  • +
  • Some containers are found in /soft/containers
  • +
+

The latest containers are updated periodically. If you have trouble using containers, or request a newer or a different container please contact ALCF support at support@alcf.anl.gov.

+

Troubleshooting

+
    +
  1. +

    Permission Denied Error: One may get a permission denied error during the build process, due to a nasty permission setting, quota limitations, or simply due to an unresolved symbolic link. You can try one of the solutions below:

    +
      +
    • Check your quota and delete any unnecessary files.
    • +
    • Clean-up singularity cache, ~/.singularity/cache, and set the singularity tmp and cache directories as below: +
      export SINGULARITY_TMPDIR=/tmp/singularity-tmpdir
      +mkdir $SINGULARITY_TMPDIR
      +export SINGULARITY_CACHEDIR=/tmp/singularity-cachedir/
      +mkdir $SINGULARITY_CACHEDIR
      +
    • +
    • Make sure you are not on a directory accessed with a symlink, i.e. check if pwd and pwd -P returns the same path.
    • +
    • If any of the above doesn't work, try running the build in your home directory.
    • +
    +
  2. +
  3. +

    Mapping to rank 0 on all nodes: This is mainly due to the container's MPICH not binding to the system MPICH. It is imperative for the container to have an MPICH that can bind dynamically to the system MPICH at runtime. Ensure your submission script has the following variables and modules loaded (see below). If this does not resolve the issue, ensure the container's MPICH is built with the '--disable-wrapper-rpath' flag. Please refer to this link to find examples of building an MPICH-based container from scratch and running it on Polaris.

    +
  4. +
+
ADDITIONAL_PATH=/opt/cray/pe/pals/1.1.7/lib/
+module load cray-mpich-abi
+export SINGULARITYENV_LD_LIBRARY_PATH="$CRAY_LD_LIBRARY_PATH:$LD_LIBRARY_PATH:$ADDITIONAL_PATH"
+singularity exec -B /opt -B /var/run/palsd/
+
+
    +
  1. +

    libmpi.so.40 not found: This may be due to mpich binding to the wrong system mpich. Try removing .conda & .cache & .local folders from your home directory. Also rebuild your container and try again.

    +
  2. +
  3. +

    Containers built with OpenMPI may not work correctly. Please ensure your container is built with MPICH and that the base image is Debian-based (e.g. Ubuntu).

    +
  4. +

DeepSpeed

+

The base conda environment on Polaris comes with Microsoft's +DeepSpeed pre-installed. Instructions +for using / cloning the base environment can be found here.

+

A batch submission script for the following example is available +here.

+

We describe below the steps needed to get started with DeepSpeed on Polaris.

+

We focus on the cifar example provided in the +DeepSpeedExamples repository, +though this approach should be generally applicable for running any model with +DeepSpeed support.

+

Running DeepSpeed on Polaris

+
+

Note

+

The instructions below should be run directly from a compute node.

+

Explicitly, to request an interactive job (from polaris-login): +

qsub -A <project> -q debug-scaling -l select=2 -l walltime=01:00:00 -I
+

+

Refer to job scheduling and +execution for +additional information.

+
+
    +
  1. +

    Load conda module and activate base environment:

    +
    module load conda ; conda activate base
    +
    +
  2. +
  3. +

    Clone + microsoft/DeepSpeedExamples + and navigate into the directory:

    +
    git clone https://github.com/microsoft/DeepSpeedExamples.git
    +cd DeepSpeedExamples/cifar
    +
    +
  4. +
+
+

Launching DeepSpeed

+
+
+
+
    +
  1. +

    Get total number of available GPUs:

    +
      +
    1. Count number of lines in $PBS_NODEFILE (1 host per line)
    2. +
    3. Count number of GPUs available on current host
    4. +
    5. NGPUS="$((${NHOSTS}*${NGPU_PER_HOST}))" +
      NHOSTS=$(wc -l < "${PBS_NODEFILE}")
      +NGPU_PER_HOST=$(nvidia-smi -L | wc -l)
      +NGPUS="$((${NHOSTS}*${NGPU_PER_HOST}))"
      +
    6. +
    +
  2. +
  3. +

    Launch with mpiexec: +

    mpiexec \
    +  --verbose \
    +  --envall \
    +  -n "${NGPUS}" \
    +  --ppn "${NGPU_PER_HOST}" \
    +  --hostfile="${PBS_NODEFILE}" \
    +  python3 \
    +    cifar10_deepspeed.py \
    +    --deepspeed_config ds_config.json
    +

    +
  4. +
+
+
+
    +
  1. +

    Create a DeepSpeed compliant hostfile, specifying the hostname and + number of GPUs (slots) for each of our available workers: +

    cat $PBS_NODEFILE > hostfile
    +sed -e 's/$/ slots=4/' -i hostfile
    +

    +
  2. +
  3. +

    Create a .deepspeed_env containing the environment + variables our workers will need access to: +

    echo "PATH=${PATH}" >> .deepspeed_env
    +echo "LD_LIBRARY_PATH=${LD_LIBRARY_PATH}" >> .deepspeed_env
    +echo "http_proxy=${http_proxy}" >> .deepspeed_env
    +echo "https_proxy=${https_proxy}" >> .deepspeed_env
    +

    +
  4. +
+
+

Warning

+

The .deepspeed_env file expects each line to be of the form +KEY=VALUE. Each of these will then be set as environment +variables on each available worker specified in our hostfile.

+
+

We can then run the cifar10_deepspeed.py module using DeepSpeed: +

deepspeed --hostfile=hostfile cifar10_deepspeed.py \
+    --deepspeed \
+    --deepspeed_config ds_config.json
+

+
+
+
+
+
+AssertionError: Micro batch size per gpu: 0 has to be greater than 0 +

Depending on the details of your specific job, it may be necessary to +modify the provided ds_config.json.

+

If you encounter an error: +

x3202c0s31b0n0: AssertionError: Micro batch size per gpu: 0 has to be greater than 0
+
+you can modify the "train_batch_size": 16 variable in the provided +ds_config.json to the (total) number of available GPUs, and explicitly +set "gradient_accumulation_steps": 1, as shown below. +
$ export NHOSTS=$(wc -l < "${PBS_NODEFILE}")
+$ export NGPU_PER_HOST=$(nvidia-smi -L | wc -l)
+$ export NGPUS="$((${NHOSTS}*${NGPU_PER_HOST}))"
+$ echo $NHOSTS $NGPU_PER_HOST $NGPUS
+24 4 96
+$ # replace "train_batch_size" with $NGPUS in ds_config.json
+$ # and write to `ds_config-polaris.json`
+$ sed \
+    "s/$(cat ds_config.json| grep batch | cut -d ':' -f 2)/ ${NGPUS},/" \
+    ds_config.json \
+    > ds_config-polaris.json
+$ cat ds_config-polaris.json
+{
+    "train_batch_size": 96,
+    "gradient_accumulation_steps": 1,
+    ...
+}
+

+
+ + + + + + + + + + + + + +

JAX

+

JAX is another popular Python package for accelerated computing. JAX is built on XLA (the same XLA TensorFlow uses) as well as AutoGrad, and additionally has acceleration tools that operate on functions such as vmap, jit, etc. JAX is not as widespread in machine learning as TensorFlow and PyTorch for traditional models (computer vision, language models), though it is quickly gaining prominence. JAX is very powerful when a program needs non-traditional autodifferentiation or vectorization, such as: forward-mode AD, higher-order derivatives, Jacobians, Hessians, or any combination of the above. Users of JAX on Polaris are encouraged to read the user documentation in detail, particularly the details about pure-functional programming, no in-place operations, and the common mistakes in writing functions for the @jit decorator.

+

JAX on Polaris

+

JAX is installed on Polaris via the conda module, available with: +

module load conda; conda activate
+

+

Then, you can load JAX in python as usual (below showing results from the conda/2022-07-19 module):

+
>>> import jax
+>>> jax.__version__
+'0.3.15'
+>>>
+
+

Notes on JAX 0.3.15

+

On Polaris, due to a bug, an environment variable must be set to use JAX on GPUs. The following code will crash: +

import jax.numpy as numpy
+a = numpy.zeros(1000)
+
+outputting an error that looks like: +
jaxlib.xla_extension.XlaRuntimeError: UNKNOWN: no kernel image is available for execution on the device
+

+

You can fix this by setting an environment variable: +

export XLA_FLAGS="--xla_gpu_force_compilation_parallelism=1"
+

+

Scaling JAX to multiple GPUs and multiple Nodes

+

JAX has intrinsic scaling tools to use multiple GPUs on a single node, via the pmap function (a minimal sketch is shown below). If this is sufficient for your needs, excellent. If not, another alternative is to use the newer package mpi4jax.
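A minimal single-node pmap sketch (illustrative only; it assumes the conda module's JAX and visible GPUs on the node):

import jax
import jax.numpy as jnp

ndev = jax.local_device_count()
# the leading axis must equal the number of devices being mapped over
x = jnp.arange(ndev * 4.0).reshape(ndev, 4)

# each device reduces its own slice in parallel
per_device_sums = jax.pmap(jnp.sum)(x)
print(per_device_sums)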

+

mpi4jax is a relatively new project and requires setting some environment variables for good performance and usability:
- Set MPI4JAX_USE_CUDA_MPI=1 to use CUDA-aware MPI, supported in the conda module, to do operations directly from the GPU.
- Set MPICH_GPU_SUPPORT_ENABLED=1 to use CUDA-aware MPI.

+

The following code, based off of a test script from the mpi4jax repository, can help you verify you are using mpi4jax properly:

+
import os
+from mpi4py import MPI
+import jax
+import jax.numpy as jnp
+import mpi4jax
+
+comm = MPI.COMM_WORLD
+rank = comm.Get_rank()
+local_rank = int(os.environ["PMI_LOCAL_RANK"])
+
+available_devices = jax.devices("gpu")
+if len(available_devices) <= local_rank:
+    raise Exception("Could not find enough GPUs")
+
+target_device = available_devices[local_rank]
+
+
+@jax.jit
+def foo(arr):
+   arr = arr + rank
+   arr_sum, _ = mpi4jax.allreduce(arr, op=MPI.SUM, comm=comm)
+   return arr_sum
+
+with jax.default_device(target_device):
+    a = jnp.zeros((3, 3))
+    print(f"Rank {rank}, local rank {local_rank}, a.device is {a.device()}")
+    result = foo(a)
+    print(f"Rank {rank}, local rank {local_rank}, result.device is {result.device()}")
+
+    import time
+    print("Sleeping for 5 seconds if you want to look at nvidia-smi ... ")
+    import time
+    time.sleep(5)
+    print("Done sleeping")
+
+if rank == 0:
+   print(result)
+
+
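One possible way to launch the script above on a single Polaris node is sketched below. The script name and rank counts are assumptions; adjust them for your allocation.

module load conda; conda activate
export MPICH_GPU_SUPPORT_ENABLED=1
export MPI4JAX_USE_CUDA_MPI=1
export XLA_FLAGS="--xla_gpu_force_compilation_parallelism=1"
# 4 ranks on one node, one per A100
mpiexec -n 4 --ppn 4 python3 test_mpi4jax.py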

JAX and mpi4jax are both still somewhat early in their software lifecycles. Updates are frequent, and if you require assistance please contact support@alcf.anl.gov.


PyTorch on Polaris

+

PyTorch is a popular, open source deep learning framework developed and released by Facebook. The PyTorch home page has more information about PyTorch, which you can refer to. For troubleshooting on Polaris, please contact support@alcf.anl.gov.

+

Installation on Polaris

+

PyTorch is installed on Polaris already, available in the conda module. To use it from a compute node, please do:

+
module load conda
+conda activate
+
+

Then, you can load PyTorch in python as usual (below showing results from the conda/2022-07-19 module):

+
>>> import torch
+>>> torch.__version__
+'1.12.0a0+git67ece03'
+>>>
+
+

This installation of PyTorch was built from source and the cuda libraries it uses are found via the CUDA_HOME environment variable (below showing results from the conda/2022-07-19 module):

+
$ echo $CUDA_HOME
+/soft/datascience/cuda/cuda_11.5.2_495.29.05_linux
+
+

If you need to build applications that use this version of PyTorch and CUDA, we recommend using these cuda libraries to ensure compatibility. We periodically update the PyTorch release, though updates will come in the form of new versions of the conda module.

+

PyTorch is also available through nvidia containers that have been translated to Singularity containers. For more information about containers, please see the containers documentation page.

+

PyTorch Best Practices on Polaris

+

Single Node Performance

+

When running PyTorch applications, we have found the following practices to be generally, if not universally, useful and encourage you to try some of these techniques to boost performance of your own applications.

+
    +
  1. +

    Use Reduced Precision. Reduced Precision is available on A100 via tensorcores and is supported with PyTorch operations. In general, the way to do this is via the PyTorch Automatic Mixed Precision package (AMP), as described in the mixed precision documentation. In PyTorch, users generally need to manage casting and loss scaling manually, though context managers and function decorators can provide easy tools to do this.

    +
  2. +
  3. +

    PyTorch has a JIT module as well as backends to support op fusion, similar to TensorFlow's tf.function tools. However, PyTorch JIT capabilities are newer and may not yield performance improvements. Please see TorchScript for more information.

    +
  4. +
+

Multi-GPU / Multi-Node Scale up

+

PyTorch is compatible with scaling up to multiple GPUs per node, and across multiple nodes. Good scaling performance has been seen up to the entire Polaris system, > 2048 GPUs. Good performance with PyTorch has been seen with both DDP and Horovod. For details, please see the Horovod documentation or the Distributed Data Parallel documentation. Some Polaris-specific details that may be helpful to you:

+
    +
  1. CPU affinity and NCCL settings can improve scaling performance, particularly at the largest scales. In particular, we encourage users to try their scaling measurements with the following settings:
  2. +
  3. Set the environment variable NCCL_COLLNET_ENABLE=1
  4. +
  5. Set the environment variable NCCL_NET_GDR_LEVEL=PHB
  6. +
  7. +

    Manually set the CPU affinity via mpiexec, such as with --cpu-bind verbose,list:0,8,16,24

    +
  8. +
  9. +

    Horovod and DDP work best when you limit the visible devices to only one GPU. Note that if you import mpi4py or horovod, and then do something like os.environ["CUDA_VISIBLE_DEVICES"] = hvd.local_rank(), it may not actually work! You must set the CUDA_VISIBLE_DEVICES environment variable prior to doing MPI.COMM_WORLD.init(), which is done in horovod.init() as well as implicitly in from mpi4py import MPI. On Polaris specifically, you can use the environment variable PMI_LOCAL_RANK (as well as PMI_LOCAL_SIZE) to learn information about the node-local MPI ranks. A minimal sketch of this ordering is shown after this list.

    +
  10. +
+
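The following is a minimal sketch of the device-pinning order described above. It is illustrative only, assuming the conda module's PyTorch and mpi4py and the Polaris PMI_LOCAL_RANK variable:

import os

# Pin this process to a single GPU *before* MPI is initialized.
local_rank = os.environ.get("PMI_LOCAL_RANK", "0")
os.environ["CUDA_VISIBLE_DEVICES"] = local_rank

# Importing mpi4py initializes MPI as a side effect, so it must come after.
from mpi4py import MPI
import torch

device = torch.device("cuda:0")  # the one visible GPU for this rank
print(f"rank {MPI.COMM_WORLD.Get_rank()} -> {torch.cuda.get_device_name(0)}")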

DeepSpeed

+

DeepSpeed is also available and usable on Polaris. For more information, please see the DeepSpeed documentation directly.

+

PyTorch DataLoader and multi-node Horovod

+

Please note there is a bug that causes a hang when using PyTorch's multithreaded data loaders with distributed training across multiple nodes. To work around this, NVIDIA recommends setting num_workers=0 in the dataloader configuration, which serializes data loading.

+

For more details, see Polaris Known Issues.


TensorFlow on Polaris

+

TensorFlow is a popular, open-source deep learning framework developed and released by Google. The TensorFlow home page has more information about TensorFlow, which you can refer to. For troubleshooting on Polaris, please contact support@alcf.anl.gov.

+

Installation on Polaris

+

TensorFlow is already pre-installed on Polaris, available in the conda module. To use it from a compute node, please do:

+
module load conda
+conda activate
+
+

Then, you can load TensorFlow in python as usual (below showing results from the conda/2022-07-19 module):

+
>>> import tensorflow as tf
+>>> tf.__version__
+'2.9.1'
+>>>
+
+

This installation of TensorFlow was built from source and the CUDA libraries it uses are found via the CUDA_HOME environment variable (below showing results from the conda/2022-07-19 module):

+
$ echo $CUDA_HOME
+/soft/datascience/cuda/cuda_11.5.2_495.29.05_linux
+
+

If you need to build applications that use this version of TensorFlow and CUDA, we recommend using these cuda libraries to ensure compatibility. We periodically update the TensorFlow release, though updates will come in the form of new versions of the conda module.

+

TensorFlow is also available through NVIDIA containers that have been translated to Singularity containers. For more information about containers, please see the Containers documentation page.

+

TensorFlow Best Practices on Polaris

+

Single Node Performance

+

When running TensorFlow applications, we have found the following practices to be generally, if not universally, useful and encourage you to try some of these techniques to boost performance of your own applications.

+
    +
  1. +

    Use Reduced Precision. Reduced Precision is available on A100 via tensorcores and is supported with TensorFlow operations. In general, the way to do this is via the tf.keras.mixed_precision Policy, as described in the mixed precision documentation. If you use a custom training loop (and not keras.Model.fit), you will also need to apply loss scaling.

    +
  2. +
  3. +

    Use TensorFlow's graph API to improve efficiency of operations. TensorFlow is, in general, an imperative language but with function decorators like @tf.function you can trace functions in your code. Tracing replaces your python function with a lower-level, semi-compiled TensorFlow Graph. More information about the tf.function interface is available here. When possible, use jit_compile, but be aware of sharp bits when using tf.function: python expressions that aren't tensors are often replaced as constants in the graph, which may or may not be your intention.

    +
  4. +
  5. +

    Use XLA compilation on your code. XLA is the Accelerated Linear Algebra library that is available in TensorFlow and critical in software like JAX. XLA will compile a tf.Graph object, generated with tf.function or similar, and perform optimizations like operation-fusion. XLA can give impressive performance boosts with almost no user changes except to set an environment variable TF_XLA_FLAGS=--tf_xla_auto_jit=2. If your code is complex, or has dynamically sized tensors (tensors where the shape changes every iteration), XLA can be detrimental: the overhead for compiling functions can be large enough to mitigate performance improvements. XLA is particularly powerful when combined with reduced precision, yielding speedups > 100% in some models. A short sketch combining mixed precision and XLA follows this list.

    +
  6. +
+
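A short, illustrative sketch combining the first and third suggestions above (assuming the TF 2.x build from the conda module; the tensor sizes are arbitrary):

import os
# XLA auto-JIT must be set before TensorFlow initializes
os.environ.setdefault("TF_XLA_FLAGS", "--tf_xla_auto_jit=2")

import tensorflow as tf
from tensorflow.keras import mixed_precision

# Keras layers built after this point compute in float16 on the A100 tensorcores
mixed_precision.set_global_policy("mixed_float16")

@tf.function(jit_compile=True)   # explicit XLA compilation of this function
def scaled_sum(x):
    return tf.reduce_sum(x * 2.0)

print(scaled_sum(tf.random.uniform((1024, 1024))))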

Multi-GPU / Multi-Node Scale up

+

TensorFlow is compatible with scaling up to multiple GPUs per node, and across multiple nodes. Good scaling performance has been seen up to the entire Polaris system, > 2048 GPUs. Good performance with TensorFlow has been seen with Horovod in particular. For details, please see the Horovod documentation. Some Polaris-specific details that may be helpful to you:

+
    +
  1. CPU affinity and NCCL settings can improve scaling performance, particularly at the largest scales. In particular, we encourage users to try their scaling measurements with the following settings:
  2. +
  3. Set the environment variable NCCL_COLLNET_ENABLE=1
  4. +
  5. Set the environment variable NCCL_NET_GDR_LEVEL=PHB
  6. +
  7. +

    Manually set the CPU affinity via mpiexec, such as with --cpu-bind verbose,list:0,8,16,24

    +
  8. +
  9. +

    Horovod works best when you limit the visible devices to only one GPU. Note that if you import mpi4py or horovod, and then do something like os.environ["CUDA_VISIBLE_DEVICES"] = hvd.local_rank(), it may not actually work! You must set the CUDA_VISIBLE_DEVICES environment variable prior to doing MPI.COMM_WORLD.init(), which is done in horovod.init() as well as implicitly in from mpi4py import MPI. On Polaris specifically, you can use the environment variable PMI_LOCAL_RANK (as well as PMI_LOCAL_SIZE) to learn information about the node-local MPI ranks.

    +
  10. +
+

TensorFlow Dataloaders

+

Additional information to be provided.


Julia

+

Julia is a high-level, high-performance dynamic programming language for +technical computing. It has a syntax familiar to users of many other +technical computing environments. Designed at MIT to tackle large-scale +partial-differential equation simulation and distributed linear algebra, Julia +features a robust ecosystem of tools for +optimization, +statistics, parallel programming, and data +visualization. Julia is actively developed by the Julia +Labs team at MIT and in +industry, along with hundreds of domain-expert +scientists and programmers worldwide.

+

Contributing

+

This guide is a first draft of the Julia documentation for Polaris. If you have any +suggestions or contributions, please open a pull request or contact us by +opening a ticket at the ALCF Helpdesk.

+

Julia Installation

+

Using the official Julia 1.9 binaries from the Julia +webpage is recommended. +Juliaup provides a convenient way to +install Julia and manage the various Julia versions.

+
curl -fsSL https://install.julialang.org | sh
+
+

You may then list the available Julia versions with juliaup list and install a +specific version with juliaup install <version>. You can then activate a +specific version with juliaup use <version> and set the default version with +juliaup default <version>. juliaup update will update the installed Julia +versions. In general, the latest stable release of Julia should be used.

+
juliaup add release
+
+

Julia Project Environment

+

The Julia built-in package manager allows you to create a project and enable +project-specific dependencies. Julia manages packages in the Julia depot located +by default in ~/.julia. However, that NFS filesystem is not meant for +high-speed access. Therefore, this Julia depot folder should be located on a +fast filesystem of your choice (grand, eagle). The Julia depot directory is +set via the environment variable JULIA_DEPOT_PATH. For example, you can set +the Julia depot to a directory on Polaris grand filesystem by adding the following line +to your ~/.bashrc file:

+
export JULIA_DEPOT_PATH=/lus/grand/projects/$PROJECT/$USER/julia_depot
+
+

Programming Julia on Polaris

+

There are three key components to using Julia for large-scale computations:

+
    +
  1. MPI support through MPI.jl
  2. +
  3. GPU support through CUDA.jl
  4. +
  5. HDF5 support through HDF5.jl
  6. +
+

In addition, we recommend VSCode with the Julia +extension for a modern IDE experience, together +with the ssh-remote extension for remote interactive development.

+

MPI Support

+

MPI support is provided through the MPI.jl. +

julia> ] add MPI
+
This will install the MPI.jl package and the default prebuilt MPI binaries provided by an artifact. For on-node debugging purposes the default artifact is sufficient. However, for large-scale computations, it is recommended to use the system MPI library that is loaded via module. As of MPI.jl v0.20 this is handled through MPIPreferences.jl.
julia --project -e 'using MPIPreferences; MPIPreferences.use_system_binary()'
+

+

Check that the correct MPI library is targeted with Julia. +

julia --project -e 'using MPI; MPI.versioninfo()'
+MPIPreferences:
+  binary:  system
+  abi:     MPICH
+  libmpi:  libmpi_cray
+  mpiexec: mpiexec
+
+Package versions
+  MPI.jl:             0.20.11
+  MPIPreferences.jl:  0.1.8
+
+Library information:
+  libmpi:  libmpi_cray
+  MPI version:  3.1.0
+  Library version:
+    MPI VERSION    : CRAY MPICH version 8.1.16.5 (ANL base 3.4a2)
+    MPI BUILD INFO : Mon Apr 18 12:05 2022 (git hash 4f56723)
+
+When running on the login node, switch back to the default provided MPI binaries in MPI_jll.jl by removing the LocalPreferences.toml file.

+

GPU Support

+

NVIDIA GPU support is provided through the CUDA.jl package. +

julia> ] add CUDA
+
In case you want to write portable GPU kernels, we highly recommend the KernelAbstractions.jl package. It provides a high-level abstraction for writing GPU kernels that can be compiled for different GPU backends.
julia> ] add KernelAbstractions
+
The GPU backend is then selected by loading either oneAPI.jl, AMDGPU.jl, or CUDA.jl (see the quickstart guide below).

+

HDF5 Support

+

Parallel HDF5 support is provided by +

module load cray-hdf5-parallel
+
+After setting export JULIA_HDF5_PATH=$HDF5_DIR we can install the HDF5.jl package. +
julia> ] add HDF5
+

+

Quickstart Guide

+

The following example shows how to use MPI.jl, CUDA.jl, and HDF5.jl to write a +parallel program that computes the sum of two vectors on the GPU and writes the +result to an HDF5 file. A repository with an example code computing an +approximation of pi can be found at +Polaris.jl. In this repository, you will also find +a setup_polaris.sh script that will build the HDF5.jl and MPI.jl package for the system libraries. +The dependencies are installed with the following commands: +

julia --project
+

+
julia> ] up
+
+
using CUDA
+using HDF5
+using MPI
+using Printf
+using Random
+
+function pi_kernel(x, y, d, n)
+    idx = (blockIdx().x-1) * blockDim().x + threadIdx().x
+    if idx <= n
+        d[idx] = (x[idx] - 0.5)^2 + (y[idx] - 0.5)^2 <= 0.25 ? 1 : 0
+    end
+    return nothing
+end
+
+function approximate_pi_gpu(n::Integer)
+    x = CUDA.rand(Float64, n)
+    y = CUDA.rand(Float64, n)
+    d = CUDA.zeros(Float64, n)
+
+    nblocks = ceil(Int64, n/32)
+
+    @cuda threads=32 blocks=nblocks pi_kernel(x,y,d,n)
+
+    return sum(d)
+end
+
+function main()
+    n = 100000  # Number of points to generate per rank
+    Random.seed!(1234)  # Set a fixed random seed for reproducibility
+
+    dsum = MPI.Allreduce(approximate_pi_gpu(n), MPI.SUM, MPI.COMM_WORLD)
+
+    pi_approx = (4 * dsum) / (n * MPI.Comm_size(MPI.COMM_WORLD))
+
+    if MPI.Comm_rank(MPI.COMM_WORLD) == 0
+        @printf "Approximation of π using Monte Carlo method: %.10f\n" pi_approx
+        @printf "Error: %.10f\n" abs(pi_approx - π)
+    end
+    return pi_approx
+end
+
+MPI.Init()
+if !isinteractive()
+    pi_approx = main()
+    h5open("pi.h5", "w") do file
+        write(file, "pi", pi_approx)
+    end
+end
+
+

Job submission script

+

This example can be run on Polaris with the following job submission script: +

#!/bin/bash -l
+#PBS -l select=1:system=polaris
+#PBS -l place=scatter
+#PBS -l walltime=0:30:00
+#PBS -l filesystems=home:grand
+#PBS -q debug
+#PBS -A PROJECT
+
+cd ${PBS_O_WORKDIR}
+
+# MPI example w/ 4 MPI ranks per node spread evenly across cores
+NNODES=`wc -l < $PBS_NODEFILE`
+NRANKS_PER_NODE=4
+NDEPTH=8
+NTHREADS=1
+module load cray-hdf5-parallel
+# Put in your Julia depot path
+export JULIA_DEPOT_PATH=MY_JULIA_DEPOT_PATH
+# Path to Julia executable. When using juliaup, it's in your julia_depot folder
+JULIA_PATH=$JULIA_DEPOT_PATH/juliaup/julia-1.9.1+0.x64.linux.gnu/bin/julia
+NTOTRANKS=$(( NNODES * NRANKS_PER_NODE ))
+echo "NUM_OF_NODES= ${NNODES} TOTAL_NUM_RANKS= ${NTOTRANKS} RANKS_PER_NODE=${NRANKS_PER_NODE} THREADS_PER_RANK= ${NTHREADS}"
+
+mpiexec -n ${NTOTRANKS} --ppn ${NRANKS_PER_NODE} --depth=${NDEPTH} --cpu-bind depth julia --check-bounds=no --project pi.jl
+
+Verify that JULIA_DEPOT_PATH is set to the correct path and JULIA_PATH +points to the Julia executable. When using juliaup, the Julia executable is +located in the juliaup folder of your JULIA_DEPOT_PATH.

+

Advanced features

+

CUDA-aware MPI

+

MPI.jl supports CUDA-aware MPI. This is enabled by setting the following environment variables

+
export JULIA_CUDA_MEMORY_POOL=none
+export MPICH_GPU_SUPPORT_ENABLED=1
+export JULIA_MPI_PATH=$PATH_TO_CUDA_MPI # /opt/cray/pe/mpich/8.1.16/ofi/nvidia/20.7
+export JULIA_MPI_HAS_CUDA=1
+
+

Note that MPI.jl needs to be rebuilt for the changes to take effect.

+
julia --project -e 'using Pkg; Pkg.build("MPI"; verbose=true)'
+
+
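+As a quick sanity check (our own sketch, not part of MPI.jl; it assumes the variables above are set, MPI.jl has been rebuilt, and CUDA.jl is installed), a device buffer can be passed directly to an MPI call:
+
+using MPI, CUDA
+
+MPI.Init()
+comm = MPI.COMM_WORLD
+rank = MPI.Comm_rank(comm)
+
+send = CUDA.fill(Float64(rank), 4)          # buffer lives in GPU memory
+recv = MPI.Allreduce(send, MPI.SUM, comm)   # succeeds only with a CUDA-aware MPICH
+rank == 0 && println("sum of ranks: ", Array(recv))
+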

Large-scale parallelism

+

CUDA.jl compiles GPU kernels at runtime and writes the intermediate object files to the temporary directory. By default, the tempdir is a global directory, which can lead to name clashes between the compiled kernel object files of different processes. To avoid this at large scale, we recommend pointing the tempdir at a local directory on the compute node. +

export TMPDIR=/local/scratch
+

diff --git a/polaris/data-science-workflows/python/index.html b/polaris/data-science-workflows/python/index.html

Python

+

Conda

+

We provide prebuilt conda environments containing GPU-supported builds of torch, tensorflow (both with horovod support for multi-node calculations), jax, and many other commonly-used Python modules.

+

Users can activate this environment by first loading the conda module, and then activating the base environment.

+

Explicitly (either from an interactive job, or inside a job script):

+

$ module load conda
+$ conda activate base
+(base) $ which python3
+/soft/datascience/conda/2022-09-08/mconda3/bin/python3
+
+In one line, module load conda; conda activate. This can be performed on a compute node, as well as a login node.

+

As of writing, the latest conda module on Polaris is built on Miniconda3 version 4.14.0 and contains Python 3.8.13. Future modules may contain entirely different major versions of Python, PyTorch, TensorFlow, etc.; however, the existing modules will be maintained as-is as long as feasible.

+

While the shared Anaconda environment encapsulated in the module contains many of the most commonly used Python libraries for our users, you may still encounter a scenario in which you need to extend the functionality of the environment (i.e. install additional packages)

+

There are two different approaches that are currently recommended.

+

Virtual environments via venv

+

Creating your own (empty) virtual Python environment in a directory that is writable to you is simple: +

python3 -m venv /path/to/new/virtual/environment
+
+This creates a new, fairly lightweight folder (<20 MB) with its own Python interpreter where you can install whatever packages you'd like. First, you must activate the virtual environment to make this Python interpreter the default interpreter in your shell session.

+

You activate the new environment whenever you want to start using it by sourcing the activate script in that folder: +

source /path/to/new/virtual/environment/bin/activate
+

+

In many cases, you do not want an empty virtual environment, but instead want to start from the conda base environment's installed packages, only adding and/or changing a few modules.

+

To extend the base Anaconda environment with venv (e.g. my_env in the current directory) and inherit the base environment packages, one can use the --system-site-packages flag:

+

module load conda; conda activate
+python -m venv --system-site-packages my_env
+source my_env/bin/activate
+# Install additional packages here...
+
+You can always retroactively change the --system-site-packages flag state for this virtual environment by editing my_env/pyvenv.cfg and changing the value of the line include-system-site-packages = false.
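
For example, the change can be made in place (a sketch; my_env is the illustrative path used above):

# Let the venv see the base environment's packages again
sed -i 's/include-system-site-packages = false/include-system-site-packages = true/' my_env/pyvenv.cfg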

+

To install a different version of a package that is already installed in the base +environment, you can use: +

pip install --ignore-installed  ... # or -I
+
+The shared base environment is not writable, so it is impossible to remove or uninstall +packages from it. The packages installed with the above pip command should shadow those +installed in the base environment.
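
For example (an illustrative session; numpy stands in for whatever package you need to override):

(my_env) $ pip install --ignore-installed numpy
(my_env) $ python3 -c "import numpy; print(numpy.__version__, numpy.__file__)"
# the printed path should now point into my_env/.../site-packages rather than the base environment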

+

Cloning the base Anaconda environment

+

If you need more flexibility, you can clone the conda environment into a custom path, which would then allow for root-like installations via conda install <module> or pip install <module>. Unlike the venv approach, using a cloned Anaconda environment requires you to copy the entirety of the base environment, which can use significant storage space.

+

This can be performed by:

+

$ module load conda
+$ conda activate base
+(base) $ conda create --clone base --prefix /path/to/envs/base-clone
+(base) $ conda activate /path/to/envs/base-clone
+(base-clone) $ which python3
+/path/to/envs/base-clone/bin/python3
+
+The cloning process can be quite slow.

+
+

Warning

+

In the above commands, /path/to/envs/base-clone should be replaced by a suitably chosen path.

+
+ +

With the conda environment set up, one can install common Python modules using pip install --user <module-name>, which will install packages in $PYTHONUSERBASE/lib/pythonX.Y/site-packages. The $PYTHONUSERBASE environment variable is automatically set when you load the base conda module, and is equal to /home/$USER/.local/polaris/conda/YYYY-MM-DD.

+

Note, Python modules installed this way that contain command line binaries will not have those binaries automatically added to the shell's $PATH. To manually add the path: +

export PATH=$PYTHONUSERBASE/bin:$PATH
+
+Be sure to remove this location from $PATH if you deactivate the base Anaconda environment or unload the module.

+

Cloning the Anaconda environment and using venv are both more flexible and transparent compared to --user installs.

diff --git a/polaris/debugging-tools/CUDA-GDB/index.html b/polaris/debugging-tools/CUDA-GDB/index.html

CUDA-GDB

+

References

+

NVIDIA CUDA-GDB Documentation

+

Introduction

+

CUDA-GDB is the NVIDIA tool for debugging CUDA applications running on Polaris. CUDA-GDB is an extension to GDB, the GNU Project debugger. The tool provides developers with a mechanism for debugging CUDA applications running on actual hardware. This enables developers to debug applications without the potential variations introduced by simulation and emulation environments.

+

Step-by-step guide

+

Debug Compilation

+

NVCC, the NVIDIA CUDA compiler driver, provides a mechanism for generating the debugging information necessary for CUDA-GDB to work properly. The -g -G option pair must be passed to NVCC when an application is compiled for ease of debugging with CUDA-GDB; for example, +

nvcc -g -G foo.cu -o foo
+
+Using this line to compile the CUDA application foo.cu
+* forces -O0 compilation, with the exception of very limited dead-code eliminations and register-spilling optimizations.
+* makes the compiler include debug information in the executable

+

Running CUDA-GDB on Polaris compute nodes

+

Start an interactive job on Polaris as follows:
+

$ qsub -I -l select=1 -l walltime=1:00:00
+
+$ cuda-gdb --version
+NVIDIA (R) CUDA Debugger
+11.4 release
+Portions Copyright (C) 2007-2021 NVIDIA Corporation
+GNU gdb (GDB) 10.1
+Copyright (C) 2020 Free Software Foundation, Inc.
+License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
+This is free software: you are free to change and redistribute it.
+There is NO WARRANTY, to the extent permitted by law.
+
+$ cuda-gdb foo
+
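
Once inside cuda-gdb, the usual GDB commands work alongside CUDA-specific ones. A few that are often handy (see the NVIDIA CUDA-GDB documentation for the full list):

(cuda-gdb) info cuda devices                   # GPUs visible to the debugger
(cuda-gdb) info cuda kernels                   # kernels currently resident on the device
(cuda-gdb) info cuda threads                   # thread/warp state for the kernel in focus
(cuda-gdb) cuda block (1,0,0) thread (32,0,0)  # switch focus to a specific block/thread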

+

A quick example with a stream benchmark on a Polaris compute node

+
jkwack@polaris-login-02:~> qsub -I -l select=1 -l walltime=1:00:00
+qsub: waiting for job 308834.polaris-pbs-01.hsn.cm.polaris.alcf.anl.gov to start
+qsub: job 308834.polaris-pbs-01.hsn.cm.polaris.alcf.anl.gov ready
+
+
+Currently Loaded Modules:
+  1) craype-x86-rome          4) perftools-base/22.05.0   7) cray-dsmml/0.2.2   10) cray-pmi-lib/6.0.17  13) PrgEnv-nvhpc/8.3.3
+  2) libfabric/1.11.0.4.125   5) nvhpc/21.9               8) cray-mpich/8.1.16  11) cray-pals/1.1.7      14) craype-accel-nvidia80
+  3) craype-network-ofi       6) craype/2.7.15            9) cray-pmi/6.1.2     12) cray-libpals/1.1.7
+
+jkwack@x3008c0s13b1n0:~/BabelStream/build_polaris_debug> nvcc -g -G -c ../src/cuda/CUDAStream.cu  -I ../src/
+
+jkwack@x3008c0s13b1n0:~/BabelStream/build_polaris_debug> nvcc -g -G -c ../src/main.cpp -DCUDA -I ../src/cuda/ -I ../src/
+
+jkwack@x3008c0s13b1n0:~/BabelStream/build_polaris_debug> nvcc -g -G main.o CUDAStream.o -o cuda-stream-debug
+
+jkwack@x3008c0s13b1n0:~/BabelStream/build_polaris_debug> ./cuda-stream-debug 
+BabelStream
+Version: 4.0
+Implementation: CUDA
+Running kernels 100 times
+Precision: double
+Array size: 268.4 MB (=0.3 GB)
+Total size: 805.3 MB (=0.8 GB)
+Using CUDA device NVIDIA A100-SXM4-40GB
+Driver: 11040
+Function    MBytes/sec  Min (sec)   Max         Average     
+Copy        1313940.694 0.00041     0.00047     0.00047     
+Mul         1302000.791 0.00041     0.00048     0.00047     
+Add         1296217.720 0.00062     0.00070     0.00069     
+Triad       1296027.887 0.00062     0.00070     0.00069     
+Dot         823405.227  0.00065     0.00076     0.00075     
+
+jkwack@x3008c0s13b1n0:~/BabelStream/build_polaris_debug> cuda-gdb ./cuda-stream-debug 
+NVIDIA (R) CUDA Debugger
+11.4 release
+Portions Copyright (C) 2007-2021 NVIDIA Corporation
+GNU gdb (GDB) 10.1
+Copyright (C) 2020 Free Software Foundation, Inc.
+License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
+This is free software: you are free to change and redistribute it.
+There is NO WARRANTY, to the extent permitted by law.
+Type "show copying" and "show warranty" for details.
+This GDB was configured as "x86_64-pc-linux-gnu".
+Type "show configuration" for configuration details.
+For bug reporting instructions, please see:
+<https://www.gnu.org/software/gdb/bugs/>.
+Find the GDB manual and other documentation resources online at:
+    <http://www.gnu.org/software/gdb/documentation/>.
+
+For help, type "help".
+Type "apropos word" to search for commands related to "word"...
+Reading symbols from ./cuda-stream-debug...
+(cuda-gdb) b CUDAStream.cu:203
+Breakpoint 1 at 0x412598: CUDAStream.cu:203. (2 locations)
+(cuda-gdb) r      
+Starting program: /home/jkwack/BabelStream/build_polaris_debug/cuda-stream-debug 
+[Thread debugging using libthread_db enabled]
+Using host libthread_db library "/lib64/libthread_db.so.1".
+BabelStream
+Version: 4.0
+Implementation: CUDA
+Running kernels 100 times
+Precision: double
+Array size: 268.4 MB (=0.3 GB)
+Total size: 805.3 MB (=0.8 GB)
+[Detaching after fork from child process 58459]
+[New Thread 0x15554c6bb000 (LWP 58475)]
+Using CUDA device NVIDIA A100-SXM4-40GB
+Driver: 11040
+[New Thread 0x15554c4ba000 (LWP 58476)]
+[Switching focus to CUDA kernel 0, grid 5, block (0,0,0), thread (0,0,0), device 0, sm 0, warp 3, lane 0]
+
+Thread 1 "cuda-stream-deb" hit Breakpoint 1, triad_kernel<double><<<(32768,1,1),(1024,1,1)>>> (a=0x155506000000, b=0x1554f6000000, c=0x1554e6000000)
+    at ../src/cuda/CUDAStream.cu:203
+203   a[i] = b[i] + scalar * c[i];
+(cuda-gdb) c
+Continuing.
+[Switching focus to CUDA kernel 0, grid 5, block (1,0,0), thread (0,0,0), device 0, sm 0, warp 32, lane 0]
+
+Thread 1 "cuda-stream-deb" hit Breakpoint 1, triad_kernel<double><<<(32768,1,1),(1024,1,1)>>> (a=0x155506000000, b=0x1554f6000000, c=0x1554e6000000)
+    at ../src/cuda/CUDAStream.cu:203
+203   a[i] = b[i] + scalar * c[i];
+(cuda-gdb) info locals
+i = 1024
+(cuda-gdb) p b[i]
+$1 = 0.040000000000000008
+(cuda-gdb) p scalar
+$2 = 0.40000000000000002
+(cuda-gdb) p c[i]
+$3 = 0.14000000000000001
+(cuda-gdb) d 1
+(cuda-gdb) c
+Continuing.
+Function    MBytes/sec  Min (sec)   Max         Average     
+Copy        1314941.553 0.00041     0.00041     0.00041     
+Mul         1301022.680 0.00041     0.00042     0.00041     
+Add         1293858.147 0.00062     0.00063     0.00063     
+Triad       1297681.929 0.00062     0.00063     0.00062     
+Dot         828446.963  0.00065     0.00066     0.00065     
+[Thread 0x15554c4ba000 (LWP 58476) exited]
+[Thread 0x15554c6bb000 (LWP 58475) exited]
+[Inferior 1 (process 58454) exited normally]
+(cuda-gdb) q
+
+jkwack@x3008c0s13b1n0:~/BabelStream/build_polaris_debug> 
+
diff --git a/polaris/getting-started/index.html b/polaris/getting-started/index.html

Getting Started on Polaris

+

Logging Into Polaris

+

To log into Polaris: +

ssh <username>@polaris.alcf.anl.gov
+
+Then, type in the password from your CRYPTOCard/MobilePASS+ token.

+

Hardware Overview

+

An overview of the Polaris system including details on the compute node architecture is available on the Machine Overview page.

+

Compiling Applications

+

Users are encouraged to read through the Compiling and Linking Overview page and corresponding pages depending on the target compiler and programming model.

+

Submitting and Running Jobs

+

Users are encouraged to read through the Running Jobs with PBS at the ALCF page for information on using the PBS scheduler and preparing job submission scripts. Some example job submission scripts are available on the Example Job Scripts page as well.
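
For example, a minimal interactive request on the debug queue looks like this (substitute your own project name and the filesystems your job needs):

qsub -I -l select=1 -l walltime=0:30:00 -l filesystems=home:eagle -q debug -A <project-name>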

+

Lustre File Striping

+

In addition to the content above, here is a document on Lustre File Striping Basics.

+ +

Proxy

+

If the node you are on doesn’t have outbound network connectivity, add the following to your ~/.bash_profile file to access the proxy host.

+
# proxy settings
+export HTTP_PROXY="http://proxy-01.pub.alcf.anl.gov:3128"
+export HTTPS_PROXY="http://proxy-01.pub.alcf.anl.gov:3128"
+export http_proxy="http://proxy-01.pub.alcf.anl.gov:3128"
+export https_proxy="http://proxy-01.pub.alcf.anl.gov:3128"
+export ftp_proxy="http://proxy-01.pub.alcf.anl.gov:3128"
+export no_proxy="admin,polaris-adminvm-01,localhost,*.cm.polaris.alcf.anl.gov,polaris-*,*.polaris.alcf.anl.gov,*.alcf.anl.gov"
+
+

Getting Assistance

+

Please direct all questions, requests, and feedback to support@alcf.anl.gov.

diff --git a/polaris/hardware-overview/machine-overview/index.html b/polaris/hardware-overview/machine-overview/index.html

Polaris

+

Polaris is a 560-node HPE Apollo 6500 Gen 10+ based system. Each node has a single 2.8 GHz AMD EPYC Milan 7543P 32-core CPU with 512 GB of DDR4 RAM, four NVIDIA A100 GPUs connected via NVLink, a pair of local 1.6 TB SSDs in RAID0 for the user's use, and a pair of Slingshot network adapters. The adapters are currently Slingshot 10, but are scheduled to be upgraded to Slingshot 11 in 2023. There are two nodes per chassis, seven chassis per rack, and 40 racks for a total of 560 nodes. More detailed specifications are as follows:

+

Polaris Compute Nodes

+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
| POLARIS COMPUTE | DESCRIPTION | PER NODE | AGGREGATE |
| --- | --- | --- | --- |
| Processor (Note 1) | 2.8 GHz 7543P | 1 | 560 |
| Cores/Threads | AMD Zen 3 (Milan) | 32/64 | 17,920/35,840 |
| RAM (Note 2) | DDR4 | 512 GiB | 280 TiB |
| GPUs | NVIDIA A100 | 4 | 2240 |
| Local SSD | 1.6 TB | 2/3.2 TB | 1120/1.8 PB |
+

Note 1: 256 MB shared L3 cache, 512 KB L2 cache per core, 32 KB L1 cache per core
Note 2: 8 memory channels rated at 204.8 GiB/s

+

Polaris A100 GPU Information

+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
| DESCRIPTION | A100 PCIe | A100 HGX (Polaris) |
| --- | --- | --- |
| GPU Memory | 40 GiB HBM2 | 160 GiB HBM2 |
| GPU Memory BW | 1.6 TB/s | 6.4 TB/s |
| Interconnect | PCIe Gen4 64 GB/s | NVLink 600 GB/s |
| FP64 | 9.7 TF | 38.8 TF |
| FP64 Tensor Core | 19.5 TF | 78 TF |
| FP32 | 19.5 TF | 78 TF |
| BF16 Tensor Core | 312 TF | 1.3 PF |
| FP16 Tensor Core | 312 TF | 1.3 PF |
| INT8 Tensor Core | 624 TOPS | 2496 TOPS |
| Max TDP Power | 250 W | 400 W |
+

Polaris Device Affinity Information

+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
| CPU Affinity | NUMA Affinity |  | GPU0 | GPU1 | GPU2 | GPU3 | mlx5_0 | mlx5_1 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 24-31,56-63 | 3 | GPU0 | X | NV4 | NV4 | NV4 | SYS | SYS |
| 16-23,48-55 | 2 | GPU1 | NV4 | X | NV4 | NV4 | SYS | PHB |
| 8-15,40-47 | 1 | GPU2 | NV4 | NV4 | X | NV4 | SYS | SYS |
| 0-7,32-39 | 0 | GPU3 | NV4 | NV4 | NV4 | X | PHB | SYS |
|  |  | mlx5_0 | SYS | SYS | SYS | PHB | X | SYS |
|  |  | mlx5_1 | SYS | PHB | SYS | SYS | SYS | X |
+

Legend:

+

X    = Self
SYS  = Connection traversing PCIe as well as the SMP interconnect between NUMA nodes (e.g., QPI/UPI)
NODE = Connection traversing PCIe as well as the interconnect between PCIe Host Bridges within a NUMA node
PHB  = Connection traversing PCIe as well as a PCIe Host Bridge (typically the CPU)
PXB  = Connection traversing multiple PCIe bridges (without traversing the PCIe Host Bridge)
PIX  = Connection traversing at most a single PCIe bridge
NV#  = Connection traversing a bonded set of # NVLinks
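
This mapping matches what the NVIDIA tools report on a compute node; from an interactive job you can regenerate it yourself (the output layout varies slightly between driver versions):

nvidia-smi topo -m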

+

Links to detailed NVIDIA A100 documentation: + - NVIDIA A100 Tensor Core GPU Architecture + - NVIDIA Ampere Architecture In-Depth

+

Login nodes

+

There are four login nodes available to users for editing code, building code, submitting/monitoring jobs, checking usage (sbank), etc. Their full hostnames are polaris-login-N.hsn.cm.polaris.alcf.anl.gov for N equal to 01 through 04; two additional login nodes are not user-accessible and are used for running services such as JupyterHub. The various compilers and libraries are present on the logins, so most users should be able to build their code. However, if your build requires the physical presence of a GPU, you will need to build on a compute node.

+

All users share the same login nodes so please be courteous and respectful of your fellow users. For example, please do not run computationally or IO intensive pre- or post-processing on the logins and keep the parallelism of your builds to a reasonable level.

+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
| POLARIS LOGIN | DESCRIPTION | PER NODE | AGGREGATE |
| --- | --- | --- | --- |
| Processor (Note 1) | 2.0 GHz 7713 | 2 | 12 |
| Cores/Threads | AMD Zen 3 (Milan) | 128/256 | 768/1536 |
| RAM (Note 2) | DDR4 | 512 GiB | 3 TiB |
| GPUs (Note 3) | No GPUs | 0 | 0 |
| Local SSD | None | 0 | 0 |
+

Note 1: 256 MB shared L3 cache, 512 KB L2 cache per core, 32 KB L1 cache per core
Note 2: 8 memory channels rated at 204.8 GiB/s per socket
Note 3: If your build requires the physical presence of a GPU you will need to build on a compute node.

+

Gateway nodes

+

There are 50 gateway nodes. These nodes are not user accessible, but are used transparently for access to the storage systems. Each node has a single 200 Gbps HDR IB card for access to the storage area network. This gives a theoretical peak bandwidth of 1250 GB/s which is approximately the aggregate bandwidth of the global file systems (1300 GB/s).

+

Storage

+

Polaris has access to the ALCF global file systems. Details on storage can be found here.

diff --git a/polaris/known-issues/index.html b/polaris/known-issues/index.html

Known Issues

+

This is a collection of known issues that have been encountered on Polaris. Documentation will be updated as issues are resolved. Users are encouraged to email support@alcf.anl.gov to report issues.

+

Compiling & Running Applications

+
    +
  1. Since the Slingshot 11 and related software upgrade, users may encounter the following issue when running an application.
  2. +
+
/opt/cray/pe/gcc-libs/libstdc++.so.6: version `GLIBCXX_3.4.29' not found (required by a.out)
+
+

At this time, it is suggested to update the LD_PRELOAD environment variable as follows.

+
export LD_PRELOAD=/opt/cray/pe/gcc/11.2.0/snos/lib64/libstdc++.so.6
+
+
    +
  1. If your job fails to start with an RPC launch message like below, please forward the complete messages to support@alcf.anl.gov.
  2. +
+
launch failed on x3104c0s1b0n0: Couldn't forward RPC launch(ab751d77-e80a-4c54-b1c2-4e881f7e8c90) to child x3104c0s31b0n0.hsn.cm.polaris.alcf.anl.gov: Resource temporarily unavailable
+
+
    +
  1. +

    With PrgEnv-nvhpc/8.3.3, if you are using nvcc to indirectly invoke nvc++ and compiling C++17 code (as, for example, in building Kokkos via nvcc_wrapper), you will get compilation errors with C++17 constructs. See our documentation on NVIDIA Compilers for a workaround.

    +
  2. +
  3. +

    PrgEnv-nvhpc/8.3.3 currently loads the nvhpc/21.9 module, which erroneously has the following lines:

    +
  4. +
+
setenv("CC","/opt/nvidia/hpc_sdk/Linux_x86_64/21.9/compilers/bin/nvc")
+setenv("CXX","/opt/nvidia/hpc_sdk/Linux_x86_64/21.9/compilers/bin/nvc++")
+setenv("FC","/opt/nvidia/hpc_sdk/Linux_x86_64/21.9/compilers/bin/nvfortran")
+setenv("F90","/opt/nvidia/hpc_sdk/Linux_x86_64/21.9/compilers/bin/nvfortran")
+setenv("F77","/opt/nvidia/hpc_sdk/Linux_x86_64/21.9/compilers/bin/nvfortran")
+setenv("CC","cpp")
+
+

In particular, the final line can cause issues for C-based projects (e.g. CMake may complain because the cpp C preprocessor is not a compiler). We recommend running the following in such cases:

+
unset CC
+unset F77
+unset CXX
+unset FC
+unset F90
+
+
    +
  1. +

    Cray MPICH may exhibit issues when MPI ranks call fork() and are distributed across multiple nodes. The process may hang or throw a segmentation fault.

    +

    In particular, this can manifest in hangs with PyTorch+Horovod with a DataLoader with multithreaded workers and distributed data parallel training on multiple nodes. We have built a module conda/2022-09-08-hvd-nccl which includes a Horovod built without support for MPI. It uses NCCL for GPU-GPU communication and Gloo for coordination across nodes.

    +

    export IBV_FORK_SAFE=1 may be a workaround for some manifestations of this bug; however it will incur memory registration overheads. It does not fix the hanging experienced with multithreaded dataloading in PyTorch+Horovod across multiple nodes with conda/2022-09-08, however (instead prompting a segfault).

    +

    This incompatibility also may affect Parsl; see details in the Special notes for Polaris section of the Parsl page.

    +
  2. +
+

Profiling Applications

+
    +
  1. The nsys profiler packaged with nvhpc/21.9 in some cases appears to be presenting broken timelines with start times not lined up. The issue does not appear to be present when nsys from cudatoolkit-standalone/11.2.2 is used. We expect this to no longer be an issue once nvhpc/22.5 is made available as the default version.
  2. +
+

Submitting Jobs

+
    +
  1. +

    For batch job submissions, if the parameters within your submission script do not meet the parameters of any of the execution queues (small, ..., backfill-large) you might not receive the "Job submission" error on the command line at all, and the job will never appear in history qstat -xu <username> (current bug in PBS). E.g. if a user submits a script to the prod routing queue requesting 10 nodes for 24 hours, exceeding "Time Max" of 6 hrs of the small execution queue (which handles jobs with 10-24 nodes), then it may behave as if the job was never submitted.

    +
  2. +
  3. +

    Job scripts are copied to temporary locations after qsub and any changes to the original script while the job is queued will not be reflected in the copied script. Furthermore, qalter requires -A <allocation name> when changing job properties. Currently, there is a request for a qalter-like command to trigger a re-copy of the original script to the temporary location.

    +
  4. +
diff --git a/polaris/performance-tools/NVIDIA-Nsight/index.html b/polaris/performance-tools/NVIDIA-Nsight/index.html

NVIDIA Nsight tools

+

References

+

NVIDIA Nsight Systems Documentation
+NVIDIA Nsight Compute Documentation

+

Introduction

+

NVIDIA® Nsight™ Systems provides developers a system-wide visualization of an application's performance. Developers can identify and optimize bottlenecks to scale efficiently across any number or size of CPUs and GPUs on Polaris. For further optimization of compute kernels, developers should use Nsight Compute.

+

NVIDIA Nsight Compute is an interactive kernel profiler for CUDA applications. It provides detailed performance metrics and API debugging via a user interface and a command-line tool.

+

In addition, the baseline feature of this tool allows users to compare results within the tool. NVIDIA Nsight Compute provides a customizable and data-driven user interface, metric collection, and can be extended with analysis scripts for post-processing results.

+

Step-by-step guide

+

Common part on Polaris

+

Build your application for Polaris, and then submit your job script to Polaris or start an interactive job on Polaris as follows:
+

$ qsub -I -l select=1 -l walltime=1:00:00 -l filesystems=home:grand -q debug -A <project-name>
+
+$ module load cudatoolkit-standalone/11.8.0 
+$ module li
+
+Currently Loaded Modules:
+  1) craype-x86-rome          6) craype/2.7.15        11) cray-pals/1.1.7
+  2) libfabric/1.11.0.4.125   7) cray-dsmml/0.2.2     12) cray-libpals/1.1.7
+  3) craype-network-ofi       8) cray-mpich/8.1.16    13) PrgEnv-nvhpc/8.3.3
+  4) perftools-base/22.05.0   9) cray-pmi/6.1.2       14) craype-accel-nvidia80
+  5) nvhpc/21.9              10) cray-pmi-lib/6.0.17  15) cudatoolkit-standalone/11.8.0
+
+$ nsys --version
+NVIDIA Nsight Systems version 2022.4.2.1-df9881f
+
+$ ncu --version
+NVIDIA (R) Nsight Compute Command Line Profiler
+Copyright (c) 2018-2022 NVIDIA Corporation
+Version 2022.3.0.0 (build 31729285) (public-release)
+

+

Nsight Systems

+

Run your application with Nsight Systems as follows:
+

$ nsys profile -o {output_filename} --stats=true ./{your_application}
+

+

Nsight Compute

+

Run your application with Nsight Compute.
+

$ ncu --set detailed -k {kernel_name} -o {output_filename} ./{your_application}
+

+

Remark: Without the -o option, Nsight Compute prints the performance data to standard output.

+

Post-processing the profiled data

+

Post-processing via CLI

+
$ nsys stats {output_filename}.qdrep
+$ ncu -i {output_filename}.ncu-rep  
+
+

Post-processing on your local system via GUI

+
    +
  • Install NVIDIA Nsight Systems and NVIDIA Nsight Compute after downloading both of them from the NVIDIA Developer Zone.
    +Remark: Local client version should be the same as or newer than NVIDIA Nsight tools on Polaris.
  • +
  • Download nsys output files (i.e., ending with .qdrep and . sqlite) to your local system, and then open them with NVIDIA Nsight Systems on your local system.
  • +
  • Download ncu output files (i.e., ending with .ncu-rep) to your local system, and then open them with NVIDIA Nsight Compute on your local system.
  • +
+

More options for performance analysis with Nsight Systems and Nsight Compute

+
$ nsys --help
+$ ncu --help
+
+

A quick example

+

Nsight Systems

+

Running a stream benchmark with Nsight Systems

+
jkwack@x3008c0s13b1n0:~/BabelStream/build_polaris> nsys profile -o JKreport-nsys-BableStream --stats=true ./cuda-stream
+Warning: LBR backtrace method is not supported on this platform. DWARF backtrace method will be used.
+Collecting data...
+BabelStream
+Version: 4.0
+Implementation: CUDA
+Running kernels 100 times
+Precision: double
+Array size: 268.4 MB (=0.3 GB)
+Total size: 805.3 MB (=0.8 GB)
+Using CUDA device NVIDIA A100-SXM4-40GB
+Driver: 11040
+Function    MBytes/sec  Min (sec)   Max         Average     
+Copy        1368294.603 0.00039     0.00044     0.00039     
+Mul         1334324.779 0.00040     0.00051     0.00041     
+Add         1358476.737 0.00059     0.00060     0.00059     
+Triad       1366095.332 0.00059     0.00059     0.00059     
+Dot         1190200.569 0.00045     0.00047     0.00046     
+Processing events...
+Saving temporary "/var/tmp/pbs.308834.polaris-pbs-01.hsn.cm.polaris.alcf.anl.gov/nsys-report-f594-c524-6b4c-300a.qdstrm" file to disk...
+
+Creating final output files...
+Processing [===============================================================100%]
+Saved report file to "/var/tmp/pbs.308834.polaris-pbs-01.hsn.cm.polaris.alcf.anl.gov/nsys-report-f594-c524-6b4c-300a.qdrep"
+Exporting 7675 events: [===================================================100%]
+
+Exported successfully to
+/var/tmp/pbs.308834.polaris-pbs-01.hsn.cm.polaris.alcf.anl.gov/nsys-report-f594-c524-6b4c-300a.sqlite
+
+
+CUDA API Statistics:
+
+ Time(%)  Total Time (ns)  Num Calls  Average (ns)  Minimum (ns)  Maximum (ns)  StdDev (ns)           Name         
+ -------  ---------------  ---------  ------------  ------------  ------------  ------------  ---------------------
+    41.5      197,225,738        401     491,834.8       386,695       592,751      96,647.5  cudaDeviceSynchronize
+    35.4      168,294,004          4  42,073,501.0       144,211   167,547,885  83,649,622.0  cudaMalloc           
+    22.5      106,822,589        103   1,037,112.5       446,617    20,588,840   3,380,727.4  cudaMemcpy           
+     0.4        1,823,597        501       3,639.9         3,166        24,125       1,228.9  cudaLaunchKernel     
+     0.2        1,166,186          4     291,546.5       130,595       431,599     123,479.8  cudaFree             
+
+
+
+CUDA Kernel Statistics:
+
+ Time(%)  Total Time (ns)  Instances  Average (ns)  Minimum (ns)  Maximum (ns)  StdDev (ns)                             Name                           
+ -------  ---------------  ---------  ------------  ------------  ------------  -----------  ----------------------------------------------------------
+    24.5       58,415,138        100     584,151.4       582,522       585,817        543.0  void add_kernel<double>(const T1 *, const T1 *, T1 *)     
+    24.4       58,080,329        100     580,803.3       579,802       582,586        520.5  void triad_kernel<double>(T1 *, const T1 *, const T1 *)   
+    18.3       43,602,345        100     436,023.5       430,555       445,979      2,619.5  void dot_kernel<double>(const T1 *, const T1 *, T1 *, int)
+    16.5       39,402,677        100     394,026.8       392,444       395,708        611.5  void mul_kernel<double>(T1 *, const T1 *)                 
+    16.1       38,393,119        100     383,931.2       382,556       396,892      1,434.1  void copy_kernel<double>(const T1 *, T1 *)                
+     0.2          523,355          1     523,355.0       523,355       523,355          0.0  void init_kernel<double>(T1 *, T1 *, T1 *, T1, T1, T1)    
+
+
+
+CUDA Memory Operation Statistics (by time):
+
+ Time(%)  Total Time (ns)  Count  Average (ns)  Minimum (ns)  Maximum (ns)  StdDev (ns)      Operation     
+ -------  ---------------  -----  ------------  ------------  ------------  -----------  ------------------
+   100.0       61,323,171    103     595,370.6         2,399    20,470,146  3,439,982.0  [CUDA memcpy DtoH]
+
+
+
+CUDA Memory Operation Statistics (by size):
+
+ Total (MB)  Count  Average (MB)  Minimum (MB)  Maximum (MB)  StdDev (MB)      Operation     
+ ----------  -----  ------------  ------------  ------------  -----------  ------------------
+    805.511    103         7.820         0.002       268.435       45.361  [CUDA memcpy DtoH]
+
+
+
+Operating System Runtime API Statistics:
+
+ Time(%)  Total Time (ns)  Num Calls  Average (ns)  Minimum (ns)  Maximum (ns)  StdDev (ns)        Name     
+ -------  ---------------  ---------  ------------  ------------  ------------  ------------  --------------
+    85.9      600,896,697         20  30,044,834.9         3,477   100,141,768  42,475,064.1  poll          
+    13.5       94,610,402      1,201      78,776.4         1,002    11,348,375     402,562.6  ioctl         
+     0.2        1,374,312         79      17,396.4         3,486       434,715      48,015.2  mmap64        
+     0.1          877,705         51      17,209.9         1,031       748,723     104,491.6  fopen         
+     0.1          741,969         12      61,830.8        17,272       256,852      64,706.5  sem_timedwait 
+     0.1          529,563        120       4,413.0         1,292        20,579       2,134.3  open64        
+     0.0          251,602          4      62,900.5        57,337        72,126       6,412.6  pthread_create
+     0.0           93,461         18       5,192.3         1,011        19,386       4,401.0  mmap          
+     0.0           37,621         11       3,420.1         1,302        11,672       2,867.6  munmap        
+     0.0           35,735          9       3,970.6         1,723         6,251       1,477.2  fgetc         
+     0.0           33,533          1      33,533.0        33,533        33,533           0.0  fgets         
+     0.0           26,832         13       2,064.0         1,452         3,366         542.6  write         
+     0.0           21,341          5       4,268.2         1,213         9,738       3,378.3  putc          
+     0.0           20,838          6       3,473.0         1,763         6,853       1,801.1  open          
+     0.0           17,016         10       1,701.6         1,523         1,834          96.9  read          
+     0.0           11,430          8       1,428.8         1,082         1,583         151.9  fclose        
+     0.0            6,202          1       6,202.0         6,202         6,202           0.0  pipe2         
+     0.0            5,961          2       2,980.5         2,254         3,707       1,027.4  socket        
+     0.0            5,670          2       2,835.0         2,795         2,875          56.6  fwrite        
+     0.0            5,481          1       5,481.0         5,481         5,481           0.0  connect       
+     0.0            5,279          2       2,639.5         1,743         3,536       1,267.8  fread         
+     0.0            1,082          1       1,082.0         1,082         1,082           0.0  bind          
+
+Report file moved to "/home/jkwack/BabelStream/build_polaris/JKreport-nsys-BableStream.qdrep"
+Report file moved to "/home/jkwack/BabelStream/build_polaris/JKreport-nsys-BableStream.sqlite"
+
+

Reviewing the Nsight Systems data via GUI

+

Nsys_screenshot

+

Nsight Compute

+

Running a stream benchmark with Nsight Compute for triad_kernel

+
jkwack@x3008c0s13b1n0:~/BabelStream/build_polaris> ncu --set detailed -k triad_kernel -o JKreport-ncu_detailed-triad_kernel-BableStream ./cuda-stream
+BabelStream
+Version: 4.0
+Implementation: CUDA
+Running kernels 100 times
+Precision: double
+Array size: 268.4 MB (=0.3 GB)
+Total size: 805.3 MB (=0.8 GB)
+==PROF== Connected to process 56600 (/home/jkwack/BabelStream/build_polaris/cuda-stream)
+Using CUDA device NVIDIA A100-SXM4-40GB
+Driver: 11040
+==PROF== Profiling "triad_kernel": 0%....50%....100% - 18 passes
+==PROF== Profiling "triad_kernel": 0%....50%....100% - 18 passes
+==PROF== Profiling "triad_kernel": 0%....50%....100% - 18 passes
+==PROF== Profiling "triad_kernel": 0%....50%....100% - 18 passes
+==PROF== Profiling "triad_kernel": 0%....50%....100% - 18 passes
+==PROF== Profiling "triad_kernel": 0%....50%....100% - 18 passes
+==PROF== Profiling "triad_kernel": 0%....50%....100% - 18 passes
+==PROF== Profiling "triad_kernel": 0%....50%....100% - 18 passes
+==PROF== Profiling "triad_kernel": 0%....50%....100% - 18 passes
+==PROF== Profiling "triad_kernel": 0%....50%....100% - 18 passes
+==PROF== Profiling "triad_kernel": 0%....50%....100% - 18 passes
+==PROF== Profiling "triad_kernel": 0%....50%....100% - 18 passes
+==PROF== Profiling "triad_kernel": 0%....50%....100% - 18 passes
+==PROF== Profiling "triad_kernel": 0%....50%....100% - 18 passes
+==PROF== Profiling "triad_kernel": 0%....50%....100% - 18 passes
+==PROF== Profiling "triad_kernel": 0%....50%....100% - 18 passes
+==PROF== Profiling "triad_kernel": 0%....50%....100% - 18 passes
+==PROF== Profiling "triad_kernel": 0%....50%....100% - 18 passes
+==PROF== Profiling "triad_kernel": 0%....50%....100% - 18 passes
+==PROF== Profiling "triad_kernel": 0%....50%....100% - 18 passes
+==PROF== Profiling "triad_kernel": 0%....50%....100% - 18 passes
+==PROF== Profiling "triad_kernel": 0%....50%....100% - 18 passes
+==PROF== Profiling "triad_kernel": 0%....50%....100% - 18 passes
+==PROF== Profiling "triad_kernel": 0%....50%....100% - 18 passes
+==PROF== Profiling "triad_kernel": 0%....50%....100% - 18 passes
+==PROF== Profiling "triad_kernel": 0%....50%....100% - 18 passes
+==PROF== Profiling "triad_kernel": 0%....50%....100% - 18 passes
+==PROF== Profiling "triad_kernel": 0%....50%....100% - 18 passes
+==PROF== Profiling "triad_kernel": 0%....50%....100% - 18 passes
+==PROF== Profiling "triad_kernel": 0%....50%....100% - 18 passes
+==PROF== Profiling "triad_kernel": 0%....50%....100% - 18 passes
+==PROF== Profiling "triad_kernel": 0%....50%....100% - 18 passes
+==PROF== Profiling "triad_kernel": 0%....50%....100% - 18 passes
+==PROF== Profiling "triad_kernel": 0%....50%....100% - 18 passes
+==PROF== Profiling "triad_kernel": 0%....50%....100% - 18 passes
+==PROF== Profiling "triad_kernel": 0%....50%....100% - 18 passes
+==PROF== Profiling "triad_kernel": 0%....50%....100% - 18 passes
+==PROF== Profiling "triad_kernel": 0%....50%....100% - 18 passes
+==PROF== Profiling "triad_kernel": 0%....50%....100% - 18 passes
+==PROF== Profiling "triad_kernel": 0%....50%....100% - 18 passes
+==PROF== Profiling "triad_kernel": 0%....50%....100% - 18 passes
+==PROF== Profiling "triad_kernel": 0%....50%....100% - 18 passes
+==PROF== Profiling "triad_kernel": 0%....50%....100% - 18 passes
+==PROF== Profiling "triad_kernel": 0%....50%....100% - 18 passes
+==PROF== Profiling "triad_kernel": 0%....50%....100% - 18 passes
+==PROF== Profiling "triad_kernel": 0%....50%....100% - 18 passes
+==PROF== Profiling "triad_kernel": 0%....50%....100% - 18 passes
+==PROF== Profiling "triad_kernel": 0%....50%....100% - 18 passes
+==PROF== Profiling "triad_kernel": 0%....50%....100% - 18 passes
+==PROF== Profiling "triad_kernel": 0%....50%....100% - 18 passes
+==PROF== Profiling "triad_kernel": 0%....50%....100% - 18 passes
+==PROF== Profiling "triad_kernel": 0%....50%....100% - 18 passes
+==PROF== Profiling "triad_kernel": 0%....50%....100% - 18 passes
+==PROF== Profiling "triad_kernel": 0%....50%....100% - 18 passes
+==PROF== Profiling "triad_kernel": 0%....50%....100% - 18 passes
+==PROF== Profiling "triad_kernel": 0%....50%....100% - 18 passes
+==PROF== Profiling "triad_kernel": 0%....50%....100% - 18 passes
+==PROF== Profiling "triad_kernel": 0%....50%....100% - 18 passes
+==PROF== Profiling "triad_kernel": 0%....50%....100% - 18 passes
+==PROF== Profiling "triad_kernel": 0%....50%....100% - 18 passes
+==PROF== Profiling "triad_kernel": 0%....50%....100% - 18 passes
+==PROF== Profiling "triad_kernel": 0%....50%....100% - 18 passes
+==PROF== Profiling "triad_kernel": 0%....50%....100% - 18 passes
+==PROF== Profiling "triad_kernel": 0%....50%....100% - 18 passes
+==PROF== Profiling "triad_kernel": 0%....50%....100% - 18 passes
+==PROF== Profiling "triad_kernel": 0%....50%....100% - 18 passes
+==PROF== Profiling "triad_kernel": 0%....50%....100% - 18 passes
+==PROF== Profiling "triad_kernel": 0%....50%....100% - 18 passes
+==PROF== Profiling "triad_kernel": 0%....50%....100% - 18 passes
+==PROF== Profiling "triad_kernel": 0%....50%....100% - 18 passes
+==PROF== Profiling "triad_kernel": 0%....50%....100% - 18 passes
+==PROF== Profiling "triad_kernel": 0%....50%....100% - 18 passes
+==PROF== Profiling "triad_kernel": 0%....50%....100% - 18 passes
+==PROF== Profiling "triad_kernel": 0%....50%....100% - 18 passes
+==PROF== Profiling "triad_kernel": 0%....50%....100% - 18 passes
+==PROF== Profiling "triad_kernel": 0%....50%....100% - 18 passes
+==PROF== Profiling "triad_kernel": 0%....50%....100% - 18 passes
+==PROF== Profiling "triad_kernel": 0%....50%....100% - 18 passes
+==PROF== Profiling "triad_kernel": 0%....50%....100% - 18 passes
+==PROF== Profiling "triad_kernel": 0%....50%....100% - 18 passes
+==PROF== Profiling "triad_kernel": 0%....50%....100% - 18 passes
+==PROF== Profiling "triad_kernel": 0%....50%....100% - 18 passes
+==PROF== Profiling "triad_kernel": 0%....50%....100% - 18 passes
+==PROF== Profiling "triad_kernel": 0%....50%....100% - 18 passes
+==PROF== Profiling "triad_kernel": 0%....50%....100% - 18 passes
+==PROF== Profiling "triad_kernel": 0%....50%....100% - 18 passes
+==PROF== Profiling "triad_kernel": 0%....50%....100% - 18 passes
+==PROF== Profiling "triad_kernel": 0%....50%....100% - 18 passes
+==PROF== Profiling "triad_kernel": 0%....50%....100% - 18 passes
+==PROF== Profiling "triad_kernel": 0%....50%....100% - 18 passes
+==PROF== Profiling "triad_kernel": 0%....50%....100% - 18 passes
+==PROF== Profiling "triad_kernel": 0%....50%....100% - 18 passes
+==PROF== Profiling "triad_kernel": 0%....50%....100% - 18 passes
+==PROF== Profiling "triad_kernel": 0%....50%....100% - 18 passes
+==PROF== Profiling "triad_kernel": 0%....50%....100% - 18 passes
+==PROF== Profiling "triad_kernel": 0%....50%....100% - 18 passes
+==PROF== Profiling "triad_kernel": 0%....50%....100% - 18 passes
+==PROF== Profiling "triad_kernel": 0%....50%....100% - 18 passes
+==PROF== Profiling "triad_kernel": 0%....50%....100% - 18 passes
+==PROF== Profiling "triad_kernel": 0%....50%....100% - 18 passes
+Function    MBytes/sec  Min (sec)   Max         Average     
+Copy        1331076.105 0.00040     0.00042     0.00041     
+Mul         1304696.608 0.00041     0.00043     0.00042     
+Add         1322600.587 0.00061     0.00062     0.00061     
+Triad       1327.700    0.60654     0.62352     0.61106     
+Dot         850376.762  0.00063     0.00070     0.00065     
+==PROF== Disconnected from process 56600
+==PROF== Report: /home/jkwack/BabelStream/build_polaris/JKreport-ncu_detailed-triad_kernel-BableStream.ncu-rep
+
+

Reviewing the Nsight Compute data via GUI

+

Ncu_Details +Ncu_SOL +Ncu_Roofline +Ncu_sources

diff --git a/polaris/programming-models/kokkos-polaris/index.html b/polaris/programming-models/kokkos-polaris/index.html

Kokkos

+

Kokkos

+

Kokkos Core implements a programming model in C++ for writing performance +portable applications targeting all major HPC platforms. For that purpose it +provides abstractions for both parallel execution of code and data +management. Kokkos is designed to target complex node architectures with +N-level memory hierarchies and multiple types of execution resources. It +currently can use Serial and OpenMP (threads) for CPU execution spaces +("backends") and CUDA, HIP, SYCL, and OpenMPTarget for GPU execution +spaces. By convention, Kokkos only allows one GPU backend at a time.

+

Kokkos Documentation

+ +

Kokkos on Polaris

+

The prebuilt Kokkos on Polaris includes 3 backends: Serial and OpenMP for CPU execution and CUDA for GPU execution. To use it, run

+
module use /soft/modulefiles
+module swap PrgEnv-nvhpc PrgEnv-gnu
+module swap gcc/12.2.0 gcc/11.2.0
+module load cudatoolkit-standalone/11.8.0
+module load kokkos
+
+

(Since the SlingShot 11 upgrade, you must use PrgEnv-gnu and the gcc and cudatoolkit version changes indicated, at least until some subsequent Polaris system updates have been completed.)

+

This sets the following environment variables, some of which are used by +cmake:

+
    +
  • KOKKOS_HOME - path to the lib64/, include/ files installed
  • +
  • LIBRARY_PATH - prepends $KOKKOS_HOME/lib64 to this variable used by cmake
  • +
  • CPATH - prepends $KOKKOS_HOME/include to this variable used by cmake
  • +
  • LD_LIBRARY_PATH - prepends $KOKKOS_HOME/lib64 to this variable
  • +
+

Building a Kokkos Application Using cmake

+

Add these lines to CMakeLists.txt:

+
find_package(Kokkos REQUIRED)
+target_link_libraries(myTarget Kokkos::kokkoscore)
+
+

Here is a simple example CMakeLists.txt to compile an example program:

+
cmake_minimum_required(VERSION 3.22)
+project(buildExample)
+find_package(Kokkos REQUIRED)
+
+set(buildExample_SOURCE_DIR ".")
+
+set(top_SRCS
+  ${buildExample_SOURCE_DIR}/example1.cpp)
+
+set(SOURCE_FILES ${top_SRCS})
+
+add_executable(example1 ${SOURCE_FILES})
+target_link_libraries(example1 Kokkos::kokkoscore)
+target_include_directories(example1 PUBLIC ${buildExample_SOURCE_DIR})
+
+
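
A minimal example1.cpp to go with it could look like the following (our own illustrative program, not shipped with Kokkos; any code that initializes Kokkos, runs parallel work, and finalizes will do):

// example1.cpp -- illustrative Kokkos program
#include <Kokkos_Core.hpp>
#include <cstdio>

int main(int argc, char* argv[]) {
  Kokkos::initialize(argc, argv);
  {
    const int n = 1 << 20;
    Kokkos::View<double*> x("x", n), y("y", n);
    // Fill the views on the default execution space (CUDA with this build)
    Kokkos::parallel_for("init", n, KOKKOS_LAMBDA(const int i) {
      x(i) = 1.0;
      y(i) = 2.0;
    });
    // Reduce a dot product back to the host
    double sum = 0.0;
    Kokkos::parallel_reduce("dot", n, KOKKOS_LAMBDA(const int i, double& lsum) {
      lsum += x(i) * y(i);
    }, sum);
    std::printf("dot = %g (expected %g)\n", sum, 2.0 * n);
  }
  Kokkos::finalize();
  return 0;
}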

Configure and build it like this:

+
mkdir build
+cd build
+cmake -DCMAKE_CXX_COMPILER=CC -DCMAKE_C_COMPILER=cc ..
+make
+
+

Building a Kokkos Application Using make

+

Here's an example Makefile:

+
# KOKKOS_HOME set via:
+#   module load kokkos
+
+# You can look at the first lines of $KOKKOS_HOME/KokkosConfigCommon.cmake to
+# see the flags used in cmake configuration of the kokkos library build. The
+# default Kokkos module on Polaris was built with PrgEnv-nvhpc and includes
+# Serial, OpenMP (threads) and CUDA backends. So you should have that
+# environment module loaded and include compiler flags for cuda and openmp:
+
+# Cray MPI wrapper for C++ and C compilers:
+CXX=CC
+CC=cc
+
+CPPFLAGS=-cuda -fopenmp
+LDFLAGS=
+
+LDFLAGS:=$(CPPFLAGS) $(LDFLAGS)
+LDLIBS=-L$(KOKKOS_HOME)/lib64 -lkokkoscore -lkokkossimd -lpthread
+
+SRCS=example1.cpp
+OBJS=$(subst .cpp,.o,$(SRCS))
+
+all: example1_polaris
+
+example1_polaris: $(OBJS)
+        $(CXX) $(LDFLAGS) -o example1_polaris $(OBJS) $(LDLIBS)
+
+example1.o: example1.cpp
+
+clean:
+        rm -f $(OBJS)
+
+distclean: clean
+        rm -f example1_polaris
+
+

Configuring Your Own Kokkos Build on Polaris

+

Here are recommended environment settings and configuration to build your own +kokkos libraries on Polaris:

+

Environment

+

To match what was done in the centrally-built kokkos associated with the +modules discussed above, use the programming environment +PrgEnv-gnu, and use the Cray wrapper CC as the C++ compiler. You'll also +need to back up from the default gcc compiler version and make a few other +module adjustments to work correctly on Polaris following the SlingShot 11 +upgrade (and prior to some planned system upgrades that will make some of this +environment tweaking unnecessary):

+
module load cmake
+module swap PrgEnv-nvhpc PrgEnv-gnu
+module swap gcc/12.2.0 gcc/11.2.0
+module load cudatoolkit-standalone/11.8.0
+
+

CMake Configuration

+

This example builds three backends: OpenMP, Serial, and Cuda.

+
git clone git@github.com:kokkos/kokkos.git
+cd kokkos
+mkdir build
+cd build
+
+cmake\
+ -DCMAKE_BUILD_TYPE=RelWithDebInfo\
+ -DCMAKE_INSTALL_PREFIX="./install"\
+ -DCMAKE_CXX_COMPILER=CC\
+ -DKokkos_ENABLE_OPENMP=ON\
+ -DKokkos_ENABLE_SERIAL=ON\
+ -DKokkos_ARCH_ZEN3=ON\
+ -DKokkos_ARCH_AMPERE80=ON\
+ -DKokkos_ENABLE_CUDA=ON\
+ -DKokkos_ENABLE_AGGRESSIVE_VECTORIZATION=ON\
+ -DKokkos_ENABLE_TESTS=OFF\
+ -DBUILD_TESTING=OFF\
+ -DKokkos_ENABLE_CUDA_LAMBDA=ON\
+ -DKokkos_ENABLE_IMPL_DESUL_ATOMICS=OFF\
+ -DCMAKE_CXX_STANDARD=17\
+ -DCMAKE_EXE_LINKER_FLAGS=-no-gcc-rpath\
+ ..
+
+make -j16 -l16 install
+
+

(The -no-gcc-rpath linker flag is to work around a bug in the +post-SlingShot11 compiler environment on Polaris.)

diff --git a/polaris/programming-models/openmp-polaris/index.html b/polaris/programming-models/openmp-polaris/index.html

OpenMP

+

Overview

+

The OpenMP API is an open standard for parallel programming. The specification document can be found here: https://www.openmp.org. The specification describes directives, runtime routines, and environment variables that allow an application developer to express parallelism (e.g. shared memory multiprocessing and device offloading). Many compiler vendors provide implementations of the OpenMP specification (https://www.openmp.org/specifications).

+

Setting the environment to use OpenMP on Polaris

+

Many of the programming environments available on Polaris have OpenMP support.

+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
| module | OpenMP CPU support? | OpenMP GPU support? |
| --- | --- | --- |
| PrgEnv-nvhpc | yes | yes |
| llvm | yes | yes |
| PrgEnv-gnu | yes | no |
| PrgEnv-cray | yes | yes* |
+

*Currently PrgEnv-cray is not recommended for OpenMP offload.

+

By default, the PrgEnv-nvhpc module is loaded. To switch to other modules, you can use module switch.

+

Using PrgEnv-nvhpc

+

This is loaded by default, so there's no need to load additional modules. You can confirm that it is loaded by running module list to check that PrgEnv-nvhpc is in the list.

+

Using LLVM

+

To use the LLVM module, load the following. +

module load mpiwrappers/cray-mpich-llvm
+module load cudatoolkit-standalone
+

+

See the the LLVM compiling page here for more information.

+

Using PrgEnv-gnu

+

To switch from PrgEnv-nvhpc to PrgEnv-gnu you can run:

+
module switch PrgEnv-nvhpc PrgEnv-gnu
+
+

The gcc/gfortran on Polaris was not built with GPU support. To use OpenMP on the CPU, you need to unload craype-accel-nvidia80:

+
module unload craype-accel-nvidia80
+
+

Using PrgEnv-cray

+

To switch from PrgEnv-nvhpc to PrgEnv-cray you can run:

+
module switch PrgEnv-nvhpc PrgEnv-cray
+
+

To use OpenMP on the CPU only, also unload craype-accel-nvidia80:

+
module unload craype-accel-nvidia80
+
+

To use OpenMP on the GPU, load cudatoolkit-standalone, although this is not recommended at the moment. +

module load cudatoolkit-standalone
+

+

Building on Polaris

+

The following table shows what compiler and flags to use with which PrgEnv:

+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
| module | compiler | flags |
| --- | --- | --- |
| PrgEnv-nvhpc | cc/CC/ftn (nvc/nvc++/nvfortran) | -mp=gpu -gpu=cc80 |
| llvm | mpicc/mpicxx (clang/clang++) | -fopenmp -fopenmp-targets=nvptx64-nvidia-cuda |
| PrgEnv-gnu | cc/CC/ftn (gcc/g++/gfortran) | -fopenmp |
| PrgEnv-cray | cc/CC/ftn | -fopenmp |
+

For example to compile a simple code hello.cpp:

+

For PrgEnv-nvhpc, after loading the modules as discussed above we would use:

+
CC -mp=gpu -gpu=cc80 hello.cpp
+ftn -mp=gpu -gpu=cc80 hello.F90
+
+

For LLVM, after loading the modules as discussed above:

+
mpicxx -fopenmp -fopenmp-targets=nvptx64-nvidia-cuda hello.cpp 
+
+

For PrgEnv-gnu, after loading the modules as discussed above we would use:

+
CC -fopenmp hello.cpp
+ftn -fopenmp hello.F90
+
+

For PrgEnv-cray, after loading the modules as discussed above we would use:

+
CC -fopenmp hello.cpp
+ftn -fopenmp hello.F90
+
+

Running on Polaris

+

To run, launch the produced executable with mpiexec in a job script and then submit the script to the Polaris queue, like this:

+
$ cat submit.sh
+#!/bin/sh
+#PBS -l select=1:system=polaris
+#PBS -l walltime=0:30:00
+#PBS -q debug 
+#PBS -A Catalyst
+#PBS -l filesystems=home:eagle
+
+cd ${PBS_O_WORKDIR}
+ mpiexec -n 1 ./executable
+$ # submit to the queue:
+$ qsub -l select=1:system=polaris -l walltime=0:30:00 -l filesystems=home:eagle -q debug -A Catalyst ./submit.sh
+
+

In the above, having the PBS options both in the script and on the command line is redundant, but both are shown here to illustrate the two ways of specifying them. This submits the script to one node in the debug queue on Polaris, requesting 30 minutes and the eagle and home filesystems. It will charge the Catalyst project for the time.

+

More details for setting up the job script are in Job Scheduling and Execution section.

+

Example

+
$ cat hello.cpp
+#include <stdio.h>
+#include <omp.h>
+
+int main( int argc, char** argv ) {
+
+  printf( "Number of devices: %d\n", omp_get_num_devices() );
+
+  #pragma omp target
+  {
+    if( !omp_is_initial_device() )
+      printf( "Hello world from accelerator.\n" );
+    else
+      printf( "Hello world from host.\n" );
+  }
+  return 0;
+}
+
+$ cat hello.F90
+program  main
+  use omp_lib
+  implicit none
+  integer flag
+
+  write(*,*) "Number of devices:", omp_get_num_devices()
+
+  !$omp target map(from:flag)
+    if( .not. omp_is_initial_device() ) then
+      flag = 1
+    else
+      flag = 0
+   endif
+  !$omp end target
+
+   if( flag == 1 ) then
+      print *, "Hello world from accelerator"
+   else
+      print *, "Hello world from host"
+   endif
+
+ end program main
+
+$ # To compile
+$ CC -mp=gpu -gpu=cc80 hello.cpp -o c_test
+$ ftn -mp=gpu -gpu=cc80 hello.F90 -o f_test
+
+$ # To run 
+$ mpiexec -n 1 ./c_test
+Number of devices: 4
+Hello world from accelerator.
+$ mpiexec -n 1 ./f_test
+ Number of devices:            4
+ Hello world from accelerator
+
+ + + + + + + + + + + + + +
+
+ + + +
+ +
+ + + + +
+
+ + + + + + + + + + + +
+
+ +
+
+
+
+ + + + + + + + + + + + \ No newline at end of file diff --git a/polaris/programming-models/sycl-polaris/index.html b/polaris/programming-models/sycl-polaris/index.html new file mode 100644 index 0000000000..43455aff5a --- /dev/null +++ b/polaris/programming-models/sycl-polaris/index.html @@ -0,0 +1,7103 @@ + + + + + + + + + + + + + + + + + + + + + + + + + SYCL - ALCF User Guides + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+ +
+ + + + + + + + + +
+ + +
+ + +
+ +
+

Argonne Leadership Computing Facility

+
+ + + + + + + + + +
+
+ + +
+ + + + + + + + + + + + + + + + + +
+ + + + +
+ + +
+ + + + + + +
+
+ + + +
+
+
+ + + + + +
+
+
+ + + +
+
+
+ + + +
+
+
+ + + +
+
+ + + + + + + +

SYCL

+
+

SYCL (pronounced ‘sickle’) is a royalty-free, cross-platform abstraction layer that enables code for heterogeneous processors to be written using standard ISO C++ with the host and kernel code for an application contained in the same source file.

+
+ +
module load oneapi/upstream
+
+
+

Note

+
+

This module (compilers, libraries) gets built periodically from the latest open-source code rather than from releases. For details on the corresponding compiler release version, please see here. As such, these compilers get new features and updates quickly, which may break things on occasion. Please submit any issues at the respective GitHub repositories for the compilers and libraries.

+

Components

+
    +
  • These are the components associated with this module:
  • +
+ + + + + + + + + + + + + + + + + + + + + + + + + +
User Application | Component
Compilers | DPC++
oneMKL Interfaces | oneMKL
oneDPL | oneDPL
SYCLomatic/DPCT | dpct
+

Dependencies

+
    +
  • The SYCL programming model is supported through oneAPI compilers that were built from source code
  • +
  • Loading this module switches the default programming environment to GNU, with the following dependencies:
  • +
  • PrgEnv-gnu
  • +
  • cudatoolkit-standalone
  • +
  • The following environment variable is set when loading the module: ONEAPI_DEVICE_SELECTOR=ext_oneapi_cuda:gpu (a small verification sketch follows this list)
  • +
+
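As a small verification sketch (assuming the sycl-ls utility shipped with the DPC++ toolchain is available once the module is loaded), the selected devices can be listed on a compute node:

module load oneapi/upstream
sycl-ls   # with ONEAPI_DEVICE_SELECTOR=ext_oneapi_cuda:gpu set, only the CUDA-backend GPUs should be reported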

Example (memory initialization)

+
#include <sycl/sycl.hpp>
+
+int main(){
+    const int N= 100;
+    sycl::queue Q;
+    float *A = sycl::malloc_shared<float>(N, Q);
+
+    std::cout << "Running on "
+              << Q.get_device().get_info<sycl::info::device::name>()
+              << "\n";
+
+    // Create a command_group to issue command to the group
+    Q.parallel_for(N, [=](sycl::item<1> id) { A[id] = 0.1 * id; }).wait();
+
+    for (size_t i = 0; i < N; i++)
+        std::cout << "A[ " << i << " ] = " << A[i] << std::endl;
+    return 0;
+}
+
+

Compile and Run +

$ clang++ -std=c++17 -sycl-std=2020 -fsycl -fsycl-targets=nvptx64-nvidia-cuda -Xsycl-target-backend --cuda-gpu-arch=sm_80 main.cpp
+$ ./a.out
+

+

Example (using GPU-aware MPI)

+
#include <stdlib.h>
+#include <stdio.h>
+#include <mpi.h>
+
+#include <sycl/sycl.hpp>
+
+// Modified from NERSC website:
+// https://docs.nersc.gov/development/programming-models/mpi
+int main(int argc, char *argv[]) {
+
+    int myrank, num_ranks;
+    double *val_device;
+    double *val_host;
+    char machine_name[MPI_MAX_PROCESSOR_NAME];
+    int name_len=0;
+
+    MPI_Init(&argc, &argv);
+    MPI_Comm_rank(MPI_COMM_WORLD, &myrank);
+    MPI_Comm_size(MPI_COMM_WORLD, &num_ranks);
+    MPI_Get_processor_name(machine_name, &name_len);
+
+    sycl::queue q{sycl::gpu_selector_v};
+
+    std::cout << "Rank #" << myrank << " runs on: " << machine_name
+              << ", uses device: "
+              << q.get_device().get_info<sycl::info::device::name>() << "\n";
+
+    MPI_Barrier(MPI_COMM_WORLD);
+    int one=1;
+    val_host = (double *)malloc(one*sizeof(double));
+    val_device = sycl::malloc_device<double>(one,q);
+
+    const size_t size_of_double = sizeof(double);
+    *val_host = -1.0;
+    if (myrank != 0) {
+        std::cout << "I am rank " << myrank
+                  << " and my initial value is: " << *val_host << "\n";
+    }
+
+    if (myrank == 0) {
+        *val_host = 42.0;
+        q.memcpy(val_device,val_host,size_of_double).wait();
+        std::cout << "I am rank " << myrank
+                  << " and will broadcast value: " << *val_host << "\n";
+    }
+
+    MPI_Bcast(val_device, 1, MPI_DOUBLE, 0, MPI_COMM_WORLD);
+
+    double check = 42.0;
+    if (myrank != 0) {
+        //Device to Host
+        q.memcpy(val_host,val_device,size_of_double).wait();
+        assert(*val_host == check);
+        std::cout << "I am rank " << myrank
+                  << " and received broadcast value: " << *val_host << "\n";
+    }
+
+    sycl::free(val_device,q);
+    free(val_host);
+
+    MPI_Finalize();
+
+    return 0;
+}
+
+

Load Modules

+
module load oneapi
+module load mpiwrappers/cray-mpich-oneapi
+export MPICH_GPU_SUPPORT_ENABLED=1
+
+

Compile and Run

+

$ mpicxx -L/opt/cray/pe/mpich/8.1.16/gtl/lib -lmpi_gtl_cuda -std=c++17 -fsycl -fsycl-targets=nvptx64-nvidia-cuda -Xsycl-target-backend --cuda-gpu-arch=sm_80 main.cpp
+$ mpiexec -n 2 --ppn 2 --depth=1 --cpu-bind depth ./set_affinity_gpu_polaris.sh ./a.out
+
+For further details regarding the arguments passed to the mpiexec command shown above, please visit the Job Scheduling and Execution section. A simple example describing the details and execution of the set_affinity_gpu_polaris.sh file can be found here.

+

Note: By default, GPU-aware MPI library linking is not enabled. The example above shows how the user can enable it by adding the path to the GTL (GPU Transport Layer) library (libmpi_gtl_cuda) to the link line.
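As a hedged sketch, one way to confirm the GTL library location before linking is to list the installed cray-mpich directories (the 8.1.16 version shown above may differ on the current system):

ls /opt/cray/pe/mpich/*/gtl/lib/ | grep mpi_gtl_cuda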

+

oneAPI Math Kernel Library (oneMKL) Interfaces

+

oneMKL Interfaces is an open-source implementation of the oneMKL Data Parallel C++ (DPC++) interface according to the oneMKL specification. It works with multiple devices (backends) using device-specific libraries underneath.

+

oneMKL is part of oneAPI. The various backends supported are shown below. More information here.

+ + + + + + + + + + + + + + + + + + + + + +
User Application | Third-Party Library
 | cuBLAS
oneMKL interface | cuSOLVER
 | cuRAND
+

Example (using onemkl::gemm)

+

The following snippet shows how to compile and run a SYCL code with the oneMKL library. For instance, a GPU-based GEMM is performed using the mkl::gemm API and the results are compared to a CPU-based GEMM performed using a traditional BLAS (e.g., AOCL-BLIS) library. +

#include <limits>
+#include <random>
+
+#include <sycl/sycl.hpp>
+
+#include <oneapi/mkl.hpp>  // ONEMKL GPU header
+#include <cblas.h>         // BLIS   CPU header
+
+// Matrix size constants
+#define SIZE 4800 // Must be a multiple of 8.
+#define M SIZE / 8
+#define N SIZE / 4
+#define P SIZE / 2
+
+//////////////////////////////////////////////////////////////////////////////////////////
+
+bool ValueSame(double a, double b) { return std::fabs(a - b) < 1.0e-08; }
+int VerifyResult(double *c_A, double *c_B) {
+  bool MismatchFound = false;
+
+  for (size_t i = 0; i < M; i++) {
+    for (size_t j = 0; j < P; j++) {
+      if (!ValueSame(c_A[i * P + j], c_B[i * P + j])) {
+        std::cout << "fail - The result is incorrect for element: [" << i << ", " << j
+                  << "], expected: " << c_A[i * P + j] << " , but got: " << c_B[i * P + j]
+                  << std::endl;
+        MismatchFound = true;
+      }
+    }
+  }
+
+  if (!MismatchFound) {
+    std::cout << "SUCCESS - The results are correct!" << std::endl;
+    return 0;
+  } else {
+    std::cout << "FAIL - The results mis-match!" << std::endl;
+    return -1;
+  }
+}
+
+//////////////////////////////////////////////////////////////////////////////////////////
+
+int main() {
+  std::random_device rd;  // Will be used to obtain a seed for the random number engine
+  std::mt19937 gen(rd()); // Standard mersenne_twister_engine seeded with rd()
+  std::uniform_real_distribution<> dis(1.0, 2.0);
+
+  // C = alpha * op(A) * op(B)  + beta * C
+  oneapi::mkl::transpose transA = oneapi::mkl::transpose::nontrans;
+  oneapi::mkl::transpose transB = oneapi::mkl::transpose::nontrans;
+
+  // matrix data sizes
+  int m = M;
+  int n = P;
+  int k = N;
+
+  // leading dimensions of data
+  int ldA = k;
+  int ldB = n;
+  int ldC = n;
+
+  // set scalar fp values
+  double alpha = 1.0;
+  double beta = 0.0;
+
+  // 1D arrays on host side
+  double *A;
+  double *B;
+  double *C_host_onemkl, *C_cblas;
+
+  A = new double[M * N]{};
+  B = new double[N * P]{};
+  C_cblas = new double[M * P]{};
+  C_host_onemkl = new double[M * P]{};
+
+  // prepare matrix data with ROW-major style
+  // A(M, N)
+  for (size_t i = 0; i < M; i++)
+    for (size_t j = 0; j < N; j++)
+      A[i * N + j] = dis(gen);
+  // B(N, P)
+  for (size_t i = 0; i < N; i++)
+    for (size_t j = 0; j < P; j++)
+      B[i * P + j] = dis(gen);
+
+  std::cout << "Problem size: c(" << M << "," << P << ") = a(" << M << "," << N << ") * b(" << N
+            << "," << P << ")" << std::endl;
+
+  // Resultant matrix: C_cblas
+  cblas_dgemm(CblasRowMajor, CblasNoTrans, CblasNoTrans, m, n, k, alpha, A, ldA, B, ldB, beta,
+              C_cblas, ldC);
+
+  // Resultant matrix: C_onemkl
+  sycl::queue q(sycl::property_list{sycl::property::queue::in_order{}});
+  std::cout << "Device: " << q.get_device().get_info<sycl::info::device::name>() << std::endl << std::endl;
+
+  double* A_dev        = sycl::malloc_device<double>(M*N, q);
+  double* B_dev        = sycl::malloc_device<double>(N*P, q);
+  double* C_dev_onemkl = sycl::malloc_device<double>(M*P, q);
+
+  q.memcpy(A_dev, A, (M*N) * sizeof(double));
+  q.memcpy(B_dev, B, (N*P) * sizeof(double));
+
+  auto gemm_event = oneapi::mkl::blas::column_major::gemm(q, transB, transA, n, m, k, alpha, B_dev, ldB, A_dev, ldA, beta, C_dev_onemkl, ldC);
+
+  q.memcpy(C_host_onemkl, C_dev_onemkl, (M*P) * sizeof(double));
+
+  q.wait();
+  std::cout << "Verify results between OneMKL & CBLAS: ";
+  int result_cblas = VerifyResult(C_cblas, C_host_onemkl);
+
+  delete[] A;
+  delete[] B;
+  delete[] C_cblas;
+  delete[] C_host_onemkl;
+  sycl::free(A_dev, q);
+  sycl::free(B_dev, q);
+  sycl::free(C_dev_onemkl, q);
+  return result_cblas;
+}
+

+

Compile and Run

+

The user needs to provide paths to the math libraries as shown below. Also, please provide the AOCL library for the CPU GEMM by loading module load aocl.
The environment variable MKLROOT is defined by the oneapi module and AOCL_ROOT is defined by the aocl module.
Note: Please pay attention to the linker options for the AOCL & oneMKL libraries. +

$ clang++ -std=c++17 -sycl-std=2020 -O3 -fsycl -fsycl-targets=nvptx64-nvidia-cuda -Xsycl-target-backend --cuda-gpu-arch=sm_80 -L$AOCL_ROOT/lib -lblis -L$MKLROOT/lib -lonemkl sycl_onemkl_gemm.cpp -o sycl_onemkl_gemm.out
+

+ + + + + + + + + + + + + +
+
+ + + +
+ +
+ + + + +
+
+ + + + + + + + + + + +
+
+ +
+
+
+
+ + + + + + + + + + + + \ No newline at end of file diff --git a/polaris/running-jobs/index.html b/polaris/running-jobs/index.html new file mode 100644 index 0000000000..67022b6837 --- /dev/null +++ b/polaris/running-jobs/index.html @@ -0,0 +1,7206 @@ + + + + + + + + + + + + + + + + + + + + + + + + + Running Jobs - ALCF User Guides + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+ +
+ + + + + + + + + +
+ + +
+ + +
+ +
+

Argonne Leadership Computing Facility

+
+ + + + + + + + + +
+
+ + +
+ + + + + + + + + + + + + + + + + +
+ + + + +
+ + +
+ + + + + + +
+
+ + + +
+
+
+ + + + + +
+
+
+ + + + + + + +
+
+ + + + + + + +

Running Jobs on Polaris

+

Queues

+
+

There are five production queues you can target in your qsub (-q <queue name>):

+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
Queue Name | Node Min | Node Max | Time Min | Time Max | Notes
debug | 1 | 2 | 5 min | 1 hr | max 8 nodes in use by this queue at any given time
debug-scaling | 1 | 10 | 5 min | 1 hr | max 1 job running/accruing/queued per-user
prod | 10 | 496 | 5 min | 24 hrs | Routing queue; see below
preemptable | 1 | 10 | 5 min | 72 hrs | max 20 jobs running/accruing/queued per-project; see note below
demand | 1 | 56 | 5 min | 1 hr | By request only; max 100 jobs running/accruing/queued per-project
+
+

Note: Jobs in the demand queue take priority over jobs in the preemptable queue. +This means jobs in the preemptable queue may be preempted (killed without any warning) if there are jobs in the demand queue. +Please use the following command to view details of a queue: qstat -Qf <queuename>
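For example, to inspect the limits and settings of the debug queue from a login node:

qstat -Qf debug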

+

prod is a routing queue and routes your job to one of the following six execution queues:

+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
Queue Name | Node Min | Node Max | Time Min | Time Max | Notes
small | 10 | 24 | 5 min | 3 hrs |
medium | 25 | 99 | 5 min | 6 hrs |
large | 100 | 496 | 5 min | 24 hrs |
backfill-small | 10 | 24 | 5 min | 3 hrs | low priority, negative project balance
backfill-medium | 25 | 99 | 5 min | 6 hrs | low priority, negative project balance
backfill-large | 100 | 496 | 5 min | 24 hrs | low priority, negative project balance
+
    +
  • Note 1: You cannot submit to these queues directly; you can only submit to the routing queue "prod".
  • +
  • Note 2: All of these queues have a limit of ten (10) jobs running/accruing per-project
  • +
  • Note 3: All of these queues have a limit of one hundred (100) jobs queued (not accruing score) per-project
  • +
  • Note 4: As of January 2023, it is recommended to submit jobs with a maximum node count of 476-486 nodes given current rates of downed nodes (larger jobs may sit in the queue indefinitely).
  • +
+

Running MPI+OpenMP Applications

+

Once a submitted job is running, calculations can be launched on the compute nodes using mpiexec to start an MPI application. Documentation is accessible via man mpiexec, and some helpful options follow.

+
    +
  • -n total number of MPI ranks
  • +
  • -ppn number of MPI ranks per node
  • +
  • --cpu-bind CPU binding for application
  • +
  • --depth number of cpus per rank (useful with --cpu-bind)
  • +
  • --env set environment variables (--env OMP_NUM_THREADS=2)
  • +
  • --hostfile indicate file with hostnames (the default is --hostfile $PBS_NODEFILE)
  • +
+

A sample submission script with directives is below for a 4-node job with 32 MPI ranks on each node and 8 OpenMP threads per rank (1 per CPU).

+
#!/bin/bash -l
+#PBS -N AFFINITY
+#PBS -l select=4:ncpus=256
+#PBS -l walltime=0:10:00
+#PBS -q debug-scaling
+#PBS -A Catalyst
+
+NNODES=`wc -l < $PBS_NODEFILE`
+NRANKS=32 # Number of MPI ranks to spawn per node
+NDEPTH=8 # Number of hardware threads per rank (i.e. spacing between MPI ranks)
+NTHREADS=8 # Number of software threads per rank to launch (i.e. OMP_NUM_THREADS)
+
+NTOTRANKS=$(( NNODES * NRANKS ))
+
+echo "NUM_OF_NODES= ${NNODES} TOTAL_NUM_RANKS= ${NTOTRANKS} RANKS_PER_NODE= ${NRANKS} THREADS_PER_RANK= ${NTHREADS}"
+
+cd /home/knight/affinity
+mpiexec --np ${NTOTRANKS} -ppn ${NRANKS} -d ${NDEPTH} --cpu-bind depth -env OMP_NUM_THREADS=${NTHREADS} ./hello_affinity
+
+

Running GPU-enabled Applications

+

GPU-enabled applications will similarly run on the compute nodes using the above example script.
- The environment variable MPICH_GPU_SUPPORT_ENABLED=1 needs to be set if your application requires MPI-GPU support, whereby the MPI library sends and receives data directly from GPU buffers. In this case, it will be important to have the craype-accel-nvidia80 module loaded both when compiling your application and during runtime to correctly link against a GPU Transport Layer (GTL) MPI library. Otherwise, you'll likely see GPU_SUPPORT_ENABLED is requested, but GTL library is not linked errors during runtime.
- If running on a specific GPU or subset of GPUs is desired, then the CUDA_VISIBLE_DEVICES environment variable can be used. For example, if one only wanted an application to access the first two GPUs on a node, then setting CUDA_VISIBLE_DEVICES=0,1 could be used (a short sketch of these settings follows).
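In a job script, these settings might look like the following (the executable name and GPU subset are only placeholders):

module load craype-accel-nvidia80     # also needed at compile time to link the GTL library
export MPICH_GPU_SUPPORT_ENABLED=1    # let MPI send/receive directly from GPU buffers
export CUDA_VISIBLE_DEVICES=0,1       # optional: restrict the application to the first two GPUs
mpiexec -n 2 --ppn 2 ./my_gpu_app     # my_gpu_app is a hypothetical GPU-enabled executable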

+

Binding MPI ranks to GPUs

+

The Cray MPI on Polaris does not currently support binding MPI ranks to GPUs. For applications that need this support, it can instead be handled by a small helper script that appropriately sets CUDA_VISIBLE_DEVICES for each MPI rank. One example is available here, where each MPI rank is bound to a single GPU with round-robin assignment.

+

An example set_affinity_gpu_polaris.sh script follows, where GPUs are assigned round-robin to MPI ranks.

+

#!/bin/bash -l
+num_gpus=4
+# need to assign GPUs in reverse order due to topology
+# See Polaris Device Affinity Information:
+# https://www.alcf.anl.gov/support/user-guides/polaris/hardware-overview/machine-overview/index.html
+gpu=$((${num_gpus} - 1 - ${PMI_LOCAL_RANK} % ${num_gpus}))
+export CUDA_VISIBLE_DEVICES=$gpu
+echo "RANK= ${PMI_RANK} LOCAL_RANK= ${PMI_LOCAL_RANK} gpu= ${gpu}"
+exec "$@"
+
+This script can be placed just before the executable in the mpiexec command like so. +
mpiexec -n ${NTOTRANKS} --ppn ${NRANKS_PER_NODE} --depth=${NDEPTH} --cpu-bind depth ./set_affinity_gpu_polaris.sh ./hello_affinity
+
+Users with different requirements, such as assigning multiple GPUs per MPI rank, can modify the above script to suit their needs.

+

Interactive Jobs on Compute Nodes

+

Here is how to submit an interactive job to, for example, edit/build/test an application on Polaris compute nodes: +

qsub -I -l select=1 -l filesystems=home:eagle -l walltime=1:00:00 -q debug
+

+

This command requests 1 node for a period of 1 hour in the debug queue, requiring access to the /home and eagle filesystems. After waiting in the queue for a node to become available, a shell prompt on a compute node will appear. You may then start building applications and testing gpu affinity scripts on the compute node.
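Once the prompt appears on the compute node, a couple of quick checks can help confirm the environment before building (a sketch; neither command is required):

nvidia-smi    # confirm the node's GPUs are visible
module list   # confirm the expected programming environment is loaded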

+

NOTE: If you want to ssh or scp to one of your assigned compute nodes you will need to make sure your $HOME directory and your $HOME/.ssh directory permissions are both set to 700.
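For example, the permissions can be set as follows:

chmod 700 $HOME
chmod 700 $HOME/.ssh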

+

Running Multiple MPI Applications on a node

+

Multiple applications can be run simultaneously on a node by launching several mpiexec commands and backgrounding them. For performance, it will likely be necessary to ensure that each application runs on a distinct set of CPU resources and/or targets specific GPUs. One can provide a list of CPUs using the --cpu-bind option, which, when combined with CUDA_VISIBLE_DEVICES, lets a user specify exactly which CPU and GPU resources to run each application on. In the example below, four instances of the application are simultaneously running on a single node. In the first instance, the application is spawning MPI ranks 0-7 on CPUs 24-31 and using GPU 0. This mapping is based on output from the nvidia-smi topo -m command and pairs CPUs with the closest GPU.

+
export CUDA_VISIBLE_DEVICES=0
+mpiexec -n 8 --ppn 8 --cpu-bind list:24:25:26:27:28:29:30:31 ./hello_affinity &
+
+export CUDA_VISIBLE_DEVICES=1
+mpiexec -n 8 --ppn 8 --cpu-bind list:16:17:18:19:20:21:22:23 ./hello_affinity &
+
+export CUDA_VISIBLE_DEVICES=2
+mpiexec -n 8 --ppn 8 --cpu-bind list:8:9:10:11:12:13:14:15 ./hello_affinity &
+
+export CUDA_VISIBLE_DEVICES=3
+mpiexec -n 8 --ppn 8 --cpu-bind list:0:1:2:3:4:5:6:7 ./hello_affinity &
+
+wait
+
+

Compute Node Access to the Internet

+

Currently, the only access to the internet is via a proxy. Here are the proxy environment variables for Polaris:

+
export http_proxy="http://proxy-01.pub.alcf.anl.gov:3128"
+export https_proxy="http://proxy-01.pub.alcf.anl.gov:3128"
+export ftp_proxy="http://proxy-01.pub.alcf.anl.gov:3128"
+
+

In the future, though we don't have a timeline on this because it depends on future features in Slingshot and internal software development, we intend to have public IP addresses be a schedulable resource. For instance, if only your head node needed public access, your select statement might look something like: -l select=1:pubnet=True+63.

+

Controlling Where Your Job Runs

+

If you wish to have your job run on specific nodes, form your select like this: -l select=1:vnode=<node name1>+1:vnode=<node name2>... . Obviously, that gets tedious for large jobs.

+

If you want to control the location of a few nodes, for example 2 out of 64, but the rest don't matter, you can do something like this: -l select=1:vnode=<node name1>+1:vnode=<node name2>+62:system=foo

+

Every node has a PBS resource called tier0 with a rack identifier and tier1 with a dragonfly group identifier. If you want all your nodes grouped in a rack, you can add the group specifier -l select=8:system=foo,place=scatter:group=tier0. If you wanted everything in the same dragonfly group, replace tier0 with tier1. Note that you have to also explicitly specify the place when you use group. If you wanted a specific rack or dragonfly group instead of any of them, you are back to the select: -l select=10:tier0=x3001-g0. A full qsub example is sketched below.
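As a sketch, a complete submission that asks for all nodes to land in the same rack might look like the following (the project name, job script, and other resources are placeholders):

qsub -l select=8:system=polaris,place=scatter:group=tier0 \
     -l walltime=1:00:00 -l filesystems=home:eagle \
     -q prod -A MyProject ./submit.sh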

+

Network: Rack and Dragonfly Group Mappings

+
    +
  • Racks contain (7) 6U chassis; each chassis has 2 nodes for 14 nodes per rack
  • +
  • The hostnames are of the form xRRPPc0sUUb[0|1]n0 where:
      +
    • RR is the row {30, 31, 32}
    • +
    • PP is the position in the row {30 goes 1-16, 31 and 32 go 1-12}
    • +
    • c is chassis and is always 0
    • +
    • s stands for slot, but in this case is the RU in the rack and values are {1,7,13,19,25,31,37}
    • +
    • b is BMC controller and is 0 or 1 (each node has its own BMC)
    • +
    • n is node, but is always 0 since there is only one node per BMC
    • +
    +
  • +
  • So, 16+12+12 = 40 racks * 14 nodes per rack = 560 nodes.
  • +
  • Note that in production group 9 (the last 4 racks) will be the designated on-demand racks
  • +
  • The management racks are x3000 and x3100 and are dragonfly group 10
  • +
  • The TDS rack is x3200 and is dragonfly group 11
  • +
  • Each compute node will have a PBS resource named tier0 which will be equal to the values in the table below. This allows you to group your jobs within a rack if you wish. There is also a resource called tier1 which will be equal to the column headings. This allows you to group your jobs within a dragonfly group if you wish.
  • +
+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
g0 | g1 | g2 | g3 | g4 | g5 | g6 | g7 | g8 | g9
x3001-g0 | x3005-g1 | x3009-g2 | x3013-g3 | x3101-g4 | x3105-g5 | x3109-g6 | x3201-g7 | x3205-g8 | x3209-g9
x3002-g0 | x3006-g1 | x3010-g2 | x3014-g3 | x3102-g4 | x3106-g5 | x3110-g6 | x3202-g7 | x3206-g8 | x3210-g9
x3003-g0 | x3007-g1 | x3011-g2 | x3015-g3 | x3103-g4 | x3107-g5 | x3111-g6 | x3203-g7 | x3207-g8 | x3211-g9
x3004-g0 | x3008-g1 | x3012-g2 | x3016-g3 | x3104-g4 | x3108-g5 | x3112-g6 | x3204-g7 | x3208-g8 | x3212-g9
+ + + + + + + + + + + + + +
+
+ + + +
+ +
+ + + + +
+
+ + + + + + + + + + + +
+
+ +
+
+
+
+ + + + + + + + + + + + \ No newline at end of file diff --git a/polaris/visualization/ffmpeg/index.html b/polaris/visualization/ffmpeg/index.html new file mode 100644 index 0000000000..d379d30b93 --- /dev/null +++ b/polaris/visualization/ffmpeg/index.html @@ -0,0 +1,6699 @@ + + + + + + + + + + + + + + + + + + + + + + + + + FFmpeg - ALCF User Guides + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+ +
+ + + + + + + + + +
+ + +
+ + +
+ +
+

Argonne Leadership Computing Facility

+
+ + + + + + + + + +
+
+ + +
+ + + + + + + + + + + + + + + + + +
+ + + + +
+ + +
+ + + + + + +
+
+ + + +
+
+
+ + + + + +
+
+
+ + + +
+
+
+ + + +
+
+
+ + + +
+
+ + + + + + + +

FFmpeg on Polaris

+

To use FFmpeg on Polaris first load the corresponding module:

+
module load ffmpeg
+
+

This is a typical command line to create a movie from a series of snapshots in PNG format:

+
ffmpeg -r 15 -i frames.%03d.png -r 25 -pix_fmt yuv420p movie.mp4
+
+

where:

+
-r 15 is the input frame rate. Experiment with values smaller than the output frame rate for longer movies.
+-r 25 is the output frame rate (use this value for standard 25 frames per second)
+-i frames.%03d.png reads the input frames in sequence
+-pix_fmt yuv420p is needed for movies to play in browsers
+movie.mp4 is the resulting movie
+
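For instance, to make a longer (slower) movie from the same frames, one could lower the input frame rate while keeping the standard output rate (file names are placeholders):

ffmpeg -r 10 -i frames.%03d.png -r 25 -pix_fmt yuv420p movie_slow.mp4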
+ + + + + + + + + + + + + +
+
+ + + +
+ +
+ + + + +
+
+ + + + + + + + + + + +
+
+ +
+
+
+
+ + + + + + + + + + + + \ No newline at end of file diff --git a/polaris/visualization/imagemagick/index.html b/polaris/visualization/imagemagick/index.html new file mode 100644 index 0000000000..8f5a118ebb --- /dev/null +++ b/polaris/visualization/imagemagick/index.html @@ -0,0 +1,6689 @@ + + + + + + + + + + + + + + + + + + + + + + + + + ImageMagick - ALCF User Guides + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+ +
+ + + + + + + + + +
+ + +
+ + +
+ +
+

Argonne Leadership Computing Facility

+
+ + + + + + + + + +
+
+ + +
+ + + + + + + + + + + + + + + + + +
+ + + + +
+ + +
+ + + + + + +
+
+ + + +
+
+
+ + + + + +
+
+
+ + + +
+
+
+ + + +
+
+
+ + + +
+
+ + + + + + + +

ImageMagick on Polaris

+

To use ImageMagick on Polaris first load the corresponding module:

+
module load imagemagick
+
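Once the module is loaded, the standard ImageMagick command-line tools are available. As a small illustrative sketch (file names are placeholders), an image can be resized and converted to another format with convert:

convert snapshot.png -resize 50% snapshot_small.jpg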
+ + + + + + + + + + + + + +
+
+ + + +
+ +
+ + + + +
+
+ + + + + + + + + + + +
+
+ +
+
+
+
+ + + + + + + + + + + + \ No newline at end of file diff --git a/polaris/visualization/images/ParaviewChooseServerConfig.png b/polaris/visualization/images/ParaviewChooseServerConfig.png new file mode 100644 index 0000000000..22f52543c3 Binary files /dev/null and b/polaris/visualization/images/ParaviewChooseServerConfig.png differ diff --git a/polaris/visualization/images/ParaviewConnectMenu.png b/polaris/visualization/images/ParaviewConnectMenu.png new file mode 100644 index 0000000000..bffbe0f59b Binary files /dev/null and b/polaris/visualization/images/ParaviewConnectMenu.png differ diff --git a/polaris/visualization/images/ParaviewConnected.png b/polaris/visualization/images/ParaviewConnected.png new file mode 100644 index 0000000000..6c25ae015b Binary files /dev/null and b/polaris/visualization/images/ParaviewConnected.png differ diff --git a/polaris/visualization/images/ParaviewConnectionOptions.png b/polaris/visualization/images/ParaviewConnectionOptions.png new file mode 100644 index 0000000000..b5ec08778b Binary files /dev/null and b/polaris/visualization/images/ParaviewConnectionOptions.png differ diff --git a/polaris/visualization/images/ParaviewFetchServers.png b/polaris/visualization/images/ParaviewFetchServers.png new file mode 100644 index 0000000000..f32c08d5cd Binary files /dev/null and b/polaris/visualization/images/ParaviewFetchServers.png differ diff --git a/polaris/visualization/images/ParaviewWaitForServer.png b/polaris/visualization/images/ParaviewWaitForServer.png new file mode 100644 index 0000000000..efd6a3bfec Binary files /dev/null and b/polaris/visualization/images/ParaviewWaitForServer.png differ diff --git a/polaris/visualization/images/connect-icon.png b/polaris/visualization/images/connect-icon.png new file mode 100644 index 0000000000..f73fea9ff7 Binary files /dev/null and b/polaris/visualization/images/connect-icon.png differ diff --git a/polaris/visualization/paraview/index.html b/polaris/visualization/paraview/index.html new file mode 100644 index 0000000000..2d3b25d083 --- /dev/null +++ b/polaris/visualization/paraview/index.html @@ -0,0 +1,6917 @@ + + + + + + + + + + + + + + + + + + + + + + + + + Paraview - ALCF User Guides + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+ +
+ + + + + + + + + +
+ + +
+ + +
+ +
+

Argonne Leadership Computing Facility

+
+ + + + + + + + + +
+
+ + +
+ + + + + + + + + + + + + + + + + +
+ + + + +
+ + +
+ + + + + + +
+
+ + + +
+
+
+ + + + + +
+
+
+ + + +
+ +
+ + + +
+
+ + + + + + + +

Paraview on Polaris

+

The recommended way of running ParaView on Polaris is in client/server mode. This consists of running the ParaView client on your local resource, and the ParaView server on the Polaris compute nodes. The ParaView client needs to first be installed on your local resource, and needs to match the version that you run on Polaris.

+

There are multiple versions of Paraview installed on Polaris. To find the versions of Paraview currently available on Polaris run the following command on a login node: +

module avail paraview
+

+

Binary and source packages of the Paraview client for Linux, MacOS, and Windows are available from the ParaView Download Page.

+

Connecting to the Paraview server on Polaris

+

This section describes how to launch the Paraview server on Polaris from a local ParaView client.

+

Start ParaView Client

+

First, launch the ParaView client on your local resource. You will need to configure some server settings in the client. This initial set up should only need to be done once, and can be reused each time you want to run ParaView on Polaris.

+

Server Configuration

+

1. Select Connect

+

From the ParaView client choose to connect to a server by either clicking on the "Connect" icon in the menu bar

+

Connect icon

+

or selecting File->Connect from the main menu

+
+

Select connect

+
+

2. Set Up Servers (first time only)

+

The first time you want to run a server on Polaris and have it connect to your local ParaView client, you will need to set up a Server. Once this server is set up, you can reuse it each time you run the ParaView client with the Paraview server on Polaris.

+

Kitware, the developers of ParaView, maintain a database of server configurations which you can retrieve through the ParaView client. In the File->Connect menu press the button named "Fetch Servers" and select POLARIS@ANL. Windows users should select "windows to POLARIS@ANL". Press "Import Selected"

+
+

Load servers

+
+

3. Use Paraview

+

After the previous step, you can now select POLARIS@ANL in the File->Connect menu and press Connect

+
+

Load servers

+
+

At this point a new window will pop up

+
+

Load servers

+
+

There are a number of parameters that you must enter manually here:

+

Xterm executable: the path of a terminal on your system. The figure shows the case of a Mac with XQuartz. You may need to change these values for Windows or Linux.

+

SSH executable: the name of your ssh command. It may be different on Windows depending on the ssh client installed (e.g., PuTTY)

+

Remote machine: leave this value at polaris.alcf.anl.gov

+

Username: your ALCF user name

+

ParaView version: the version of Paraview that you want to use. Verify first that this version is installed on the system (as described at the top of this document). You will also need to add a -mesa suffix.

+

Example: +

5.11.2-mesa
+

+

Client port: it is safe to use the default value

+

Server port: it is safe to use the default value

+

Number of nodes to reserve: enter the number of Polaris compute nodes you want to use for your job

+

Number of ranks per node: enter the number of ranks per node

+

Number of minutes to reserve: the duration of your job in minutes

+

Account: enter here the name of your ALCF allocation

+

Queue: the name of the Polaris queue you would like to use (e.g., debug for small, quick jobs, prod, preemptable)

+

File Systems: enter here the file systems you need for your job, separated with colons, no spaces. Keep in mind that your job may not run if one of these file systems is not available at that time, so enter these values carefully

+

Job name: safe to use default value. The PBS scheduler will assign this name to your job

+

Now you can press OK to establish the connection with a Paraview server on Polaris.

+

An ssh connection will be established with a Polaris login node and a password will be requested in a terminal, similar to the process you normally use to connect and work on the system.

+

After you enter your password, a job will be queued and you will see a window like this:

+
+

Load servers

+
+

When the job is launched on the compute nodes, the previous window will go away and Paraview will show it is connected to Polaris in its Pipeline Browser:

+
+

Load servers

+
+

At this point you can open datasets stored on the ALCF file systems and use Paraview normally.

+ + + + + + + + + + + + + +
+
+ + + +
+ +
+ + + + +
+
+ + + + + + + + + + + +
+
+ +
+
+
+
+ + + + + + + + + + + + \ No newline at end of file diff --git a/polaris/visualization/scripts/host_anl_polaris.xml b/polaris/visualization/scripts/host_anl_polaris.xml new file mode 100644 index 0000000000..f91e081279 --- /dev/null +++ b/polaris/visualization/scripts/host_anl_polaris.xml @@ -0,0 +1,103 @@ + + + "ANL Polaris" + polaris.alcf.anl.gov + notset + polaris-login-0# + /soft/visualization/visit + false + false + 22 + false + "ssh" + false + + MachineName + + true + false + 1 + false + 1 + + 480 + 4 + true + 1 + true + debug + true + visualization + true + 1:00:00 + true + qsub/aprun + true + false + false + + true + true + + false + + false + + false + + true + $PBS_NODEFILE + true + false + 1 + + false + :%l + 0 + false + + + parallel + + + 480 + 1 + false + 1 + false + + false + + false + + false + + true + false + false + + false + false + + false + + false + + false + + true + + false + false + 1 + + false + :%l + 0 + false + + + serial + + 1 + diff --git a/polaris/visualization/scripts/server_polaris.pvsc b/polaris/visualization/scripts/server_polaris.pvsc new file mode 100644 index 0000000000..e94f5af7ca --- /dev/null +++ b/polaris/visualization/scripts/server_polaris.pvsc @@ -0,0 +1,73 @@ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + \ No newline at end of file diff --git a/polaris/visualization/scripts/server_polaris_windows.pvsc b/polaris/visualization/scripts/server_polaris_windows.pvsc new file mode 100644 index 0000000000..635dedbc03 --- /dev/null +++ b/polaris/visualization/scripts/server_polaris_windows.pvsc @@ -0,0 +1,73 @@ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + \ No newline at end of file diff --git a/polaris/visualization/visit/index.html b/polaris/visualization/visit/index.html new file mode 100644 index 0000000000..917d3551de --- /dev/null +++ b/polaris/visualization/visit/index.html @@ -0,0 +1,6810 @@ + + + + + + + + + + + + + + + + + + + + + + + + + Visit - ALCF User Guides + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+ +
+ + + + + + + + + +
+ + +
+ + +
+ +
+

Argonne Leadership Computing Facility

+
+ + + + + + + + + +
+
+ + +
+ + + + + + + + + + + + + + + + + +
+ + + + +
+ + +
+ + + + + + +
+
+ + + +
+
+
+ + + + + +
+
+
+ + + +
+
+
+ + + +
+
+
+ + + +
+
+ + + + + + + +

Visit on Polaris

+

Getting Started

+

The latest Visit versions installed on Polaris are 3.3.3 and 3.4.0.

+

Please note that at the time of this writing Visit version 3.4.0 does not yet have a client for Mac available.

+

Follow these steps to install Visit on your local machine:

+
    +
  • Download and install Visit for your local platform (MacOS, Windows, Linux). The version you download must match the server version installed on Polaris. Use this page
  • +
  • Download the Polaris host profile for VisIt (you may need to right-click and choose "Save link as..." or "Save target as...")
  • +
  • Copy this file to a file called ~/.visit/hosts/host_anl_polaris.xml on Mac or Linux. [We also need to specify this path for Windows]
  • +
+

Note: Visit allows the user to download host profiles for ANL, but all these settings are outdated. We are working with the Visit developers to update the ANL host list.

+

Additional information for using VisIt in client/server mode here

+

Running VisIt

+
    +
  • Start up VisIt on your local machine
  • +
  • Click File -> Open File and choose "ANL Polaris" from the "Host" dropdown
  • +
  • You'll be prompted for your password; enter your ALCF authenticator app response
  • +
  • When you open a selected file, it will launch a job on Polaris
      +
    • You will need to specify the "Bank" (Project) to use when VisIt submits jobs to the queue on Polaris. Specify a project in the Options box.
    • +
    • If your environment doesn't get sourced correctly with non-interactive SSH, you can set the default project to use under Options -> Host profiles
    • +
    • Note: Don't change the contents of the "Machine file" field (it should be $PBS_NODEFILE)
    • +
    • Note: The default Launch Profile is set to serial. We recommend leaving this setting in its default value, but using the parallel method to launch jobs on Polaris.
    • +
    • Note: Don't change the contents of "launchMethod". It must be qsub/aprun even though Polaris does not use aprun.
    • +
    • If you'd like to change other job parameters (like the number of processes, nodes, and walltime), you can do so. Please enter time in the format required by the PBS scheduler (e.g., 1:00:00 for one hour)
    • +
    • If you'd like these changes to be used as your default, be sure to save them using Save Settings under the Options menu.
    • +
    +
  • +
+

Additional Information

+ + + + + + + + + + + + + + +
+
+ + + +
+ +
+ + + + +
+
+ + + + + + + + + + + +
+
+ +
+
+
+
+ + + + + + + + + + + + \ No newline at end of file diff --git a/polaris/visualization/visualization/index.html b/polaris/visualization/visualization/index.html new file mode 100644 index 0000000000..f901caf60c --- /dev/null +++ b/polaris/visualization/visualization/index.html @@ -0,0 +1,6692 @@ + + + + + + + + + + + + + + + + + + + + + + + + + Visualization on Polaris - ALCF User Guides + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+ +
+ + + + + + + + + +
+ + +
+ + +
+ +
+

Argonne Leadership Computing Facility

+
+ + + + + + + + + +
+
+ + +
+ + + + + + + + + + + + + + + + + +
+ + + + +
+ + +
+ + + + + + +
+
+ + + +
+
+
+ + + + + +
+
+
+ + + +
+
+
+ + + +
+
+
+ + + +
+
+ + + + + + + +

Visualization on Polaris

+

Starting in January 2024, Polaris will serve as the primary production resource for visualization and analysis.

+

Below is a list of the available visualization tools along with links to their corresponding documentation.

+

ParaView: ParaView is an open-source visualization engine that seamlessly integrates with your existing tools and workflows. It allows you to construct visualization pipelines for quick data analysis. Whether interactively exploring large datasets in 3D or performing batch processing programmatically, ParaView provides versatile capabilities. For additional information, visit the Kitware website.

+

VisIt: VisIt is an open-source, interactive, and scalable visualization, animation, and analysis tool. Users can rapidly generate visualizations, animate them over time, apply various operators and mathematical expressions, and save resulting images and animations for presentations. VisIt supports a diverse range of visualization features, enabling users to view data, including scalar and vector fields, on 2D and 3D structured, adaptive, and unstructured meshes. Thanks to its customizable plugin design, VisIt can visualize data from over 120 different scientific data formats. For more information, check the VisIt project GitHub page.

+

FFmpeg: FFmpeg is a complete solution to record, convert, and stream audio and video. For more information, visit the FFmpeg webpage

+

ImageMagick: ImageMagick is a free, open-source software suite, used for editing and manipulating digital images. It can be used to create, edit, compose, or convert bitmap images, and supports a wide range of file formats, including JPEG, PNG, GIF, TIFF, and PDF. More information in the ImageMagick webpage.

+ + + + + + + + + + + + + +
+
+ + + +
+ +
+ + + + +
+
+ + + + + + + + + + + +
+
+ +
+
+
+
+ + + + + + + + + + + + \ No newline at end of file diff --git a/polaris/workflows/balsam/index.html b/polaris/workflows/balsam/index.html new file mode 100644 index 0000000000..1807bfe4ea --- /dev/null +++ b/polaris/workflows/balsam/index.html @@ -0,0 +1,6703 @@ + + + + + + + + + + + + + + + + + + + + + + + + + Balsam - ALCF User Guides + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+ +
+ + + + + + + + + +
+ + +
+ + +
+ +
+

Argonne Leadership Computing Facility

+
+ + + + + + + + + +
+
+ + +
+ + + + + + + + + + + + + + + + + +
+ + + + +
+ + +
+ + + + + + +
+
+ + + +
+
+
+ + + + + +
+
+
+ + + +
+
+
+ + + +
+
+
+ + + +
+
+ + + + + + + +

Balsam

+

Balsam is a Python-based workflow manager that helps users execute large numbers of jobs, potentially with interjob dependencies, track job outcomes, and manage postprocessing analysis. A Balsam Site runs on a node with access to the job scheduler, where it can submit and monitor jobs. Overall job state is aggregated on the Balsam Server, making job data from all Sites accessible from any individual site (or the user's laptop), via the command-line interface or the Python API. To get information on how to use the command line tool, you can type balsam --help in your shell.

+

Full documentation for Balsam is available online.

+

Balsam requires Python 3.7+. To install Balsam on Polaris, first set up a virtual Python environment:

+
module load conda
+conda activate base
+python -m venv env
+source env/bin/activate
+pip install --upgrade pip
+pip install --pre balsam
+
+

To use Balsam, users need an account on the Balsam server. Users can get an account by contacting the ALCF Help Desk. Once a user has an account, they can log in and make a new site. A Balsam site is a project space for your workflow. You will be prompted to select what machine (Polaris) you are working on when creating a new site:

+
balsam login
+balsam site init -n new-site new-site
+cd new-site
+balsam site start
+
+

See the Balsam documentation for full details.

+ + + + + + + + + + + + + +
+
+ + + +
+ +
+ + + + +
+
+ + + + + + + + + + + +
+
+ +
+
+
+
+ + + + + + + + + + + + \ No newline at end of file diff --git a/polaris/workflows/libensemble/index.html b/polaris/workflows/libensemble/index.html new file mode 100644 index 0000000000..80af1c29a1 --- /dev/null +++ b/polaris/workflows/libensemble/index.html @@ -0,0 +1,6863 @@ + + + + + + + + + + + + + + + + + + + + + + + + + libEnsemble - ALCF User Guides + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+ +
+ + + + + + + + + +
+ + +
+ + +
+ +
+

Argonne Leadership Computing Facility

+
+ + + + + + + + + +
+
+ + +
+ + + + + + + + + + + + + + + + + +
+ + + + +
+ + +
+ + + + + + +
+
+ + + +
+
+
+ + + + + +
+
+
+ + + +
+
+
+ + + +
+
+
+ + + +
+
+ + + + + + + +

libEnsemble

+

libEnsemble is a Python toolkit for running dynamic ensembles of calculations.

+

Users provide generator and simulator functions to express their ensembles, where the generator can steer the ensemble based on previous results. These functions can portably submit external executables at any scale.

+

System details are detected, and dynamic resource management is provided. This includes automatically detecting, assigning, and reassigning +GPUs for ensemble members.

+

libEnsemble can be used in a consistent manner on laptops, clusters, and supercomputers with minimal required dependencies.

+

Getting libEnsemble on Polaris

+

libEnsemble is provided on Polaris in the conda module:

+
module load conda
+conda activate base
+
+ +

See the docs for more details on using python on Polaris.

+
+ Example: creating virtual environment and updating libEnsemble + + E.g., to create a virtual environment that allows installation of + further packages with pip: + +
python -m venv /path/to-venv --system-site-packages
+. /path/to-venv/bin/activate
+
+ + Where ``/path/to-venv`` can be anywhere you have write access. + For future uses just load the conda module and run the activate line. + + You can also ensure you are using the latest version of libEnsemble: + +
pip install libensemble
+
+
+ +

libEnsemble examples

+

For a very simple example of using libEnsemble see the Simple Introduction tutorial

+

For an example that runs a small ensemble using a C application (offloading work to the GPU), see +the GPU app tutorial. +The required files for this tutorial can be found +in this directory. +A video demo is also available.

+

Job Submission

+

libEnsemble runs on the compute nodes on Polaris using either Python's +multiprocessing or mpi4py. The user can set the number of workers for +maximum concurrency. libEnsemble will detect the nodes available +from the PBS environment and use these for running simulations. Polaris supports +running multiple concurrent simulations on each node if desired.

+

A simple example batch script for a libEnsemble use case that runs five workers on one node:

+
    #!/bin/bash -l
+    #PBS -l select=1:system=polaris
+    #PBS -l walltime=00:15:00
+    #PBS -l filesystems=home:grand
+    #PBS -q debug
+    #PBS -A <myproject>
+
+    export MPICH_GPU_SUPPORT_ENABLED=1
+    cd $PBS_O_WORKDIR
+    python run_libe_forces.py --comms local --nworkers 5
+
+

The script can be run with:

+
qsub submit_libe.sh
+
+ +

Or you can run an interactive session with:

+
qsub -A <myproject> -l select=1 -l walltime=15:00 -lfilesystems=home:grand -qdebug -I
+
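Once the interactive session starts, the same calling script used in the batch example above can be run directly on the compute node (a sketch, assuming the conda/virtual environment is set up as described earlier):

module load conda
conda activate base
export MPICH_GPU_SUPPORT_ENABLED=1
cd $PBS_O_WORKDIR
python run_libe_forces.py --comms local --nworkers 5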
+ + +

Docs: https://libensemble.readthedocs.io
+GitHub: https://github.com/Libensemble/libensemble

+ + + + + + + + + + + + + +
+
+ + + +
+ +
+ + + + +
+
+ + + + + + + + + + + +
+
+ +
+
+
+
+ + + + + + + + + + + + \ No newline at end of file diff --git a/polaris/workflows/mig-compute/index.html b/polaris/workflows/mig-compute/index.html new file mode 100644 index 0000000000..4ddff1434b --- /dev/null +++ b/polaris/workflows/mig-compute/index.html @@ -0,0 +1,6893 @@ + + + + + + + + + + + + + + + + + + + + + + + + + Multi-Instance GPU (MIG) mode - ALCF User Guides + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+ +
+ + + + + + + + + +
+ + +
+ + +
+ +
+

Argonne Leadership Computing Facility

+
+ + + + + + + + + +
+
+ + +
+ + + + + + + + + + + + + + + + + +
+ + + + +
+ + +
+ + + + + + +
+
+ + + +
+
+
+ + + + + +
+
+
+ + + +
+
+
+ + + +
+
+
+ + + +
+
+ + + + + + + +

Multi-Instance GPU (MIG) mode

+

MIG mode can be enabled and configured on Polaris by passing a valid configuration file to qsub:

+
+

qsub ... -l mig_config=/home/ME/path/to/mig_config.json ...

+
+

You can find a concise explanation of MIG concepts and terms at https://docs.nvidia.com/datacenter/tesla/mig-user-guide/index.html#concepts

+

Configuration

+

Please study the following example of a valid configuration file:

+
{
+  "group1": {
+    "gpus": [0,1],
+    "mig_enabled": true,
+    "instances": {"7g.40gb": ["4c.7g.40gb", "3c.7g.40gb"] }
+  },
+  "group2": {
+    "gpus": [2,3],
+    "mig_enabled": true,
+    "instances": {"3g.20gb": ["2c.3g.20gb", "1c.3g.20gb"], "2g.10gb": ["2g.10gb"], "1g.5gb": ["1g.5gb"], "1g.5gb": ["1g.5gb"]}
+  }
+}
+
+

Notes

+
    +
  • Group names are arbitrary, but must be unique
  • +
  • "gpus" must be an array of integers. If only one physical GPU is being configured in a group, it must still be contained within an array (e.g., "gpus": [0])
  • +
  • Only groups with mig_enabled set to true will be configured
  • +
  • instances denote the MIG gpu instances and the nested compute instances you wish to be configured
  • +
  • The syntax is {"gpu instance 1": ["compute instance 1", "compute instance 2"], ...}
  • +
  • Valid gpu instances are 1g.5gb, 1g.10gb, 2g.10gb, 3g.20gb, 4g.20gb, and 7g.40gb. The first number denotes the number of slots used out of 7 total, and the second number denotes memory in GB
  • +
  • The default compute instance for any gpu instance has the same identifier as the gpu instance (in which case it will be the only one configurable)
  • +
  • Other compute instances can be configured with the identifier syntax Xc.Y, where X is the number of slots available in that gpu instance, and Y is the gpu instance identifier string
  • +
  • Some gpu instances cannot be configured adjacently, despite there being sufficient slots/memory remaining (e.g., 3g.20gb and 4g.20gb). Please see the NVIDIA MIG documentation for further details
  • +
  • Currently, MIG configuration is only available in the debug, debug-scaling, and preemptable queues. Submissions to other queues will result in any MIG config files passed being silently ignored
  • +
  • Files which do not match the above syntax will be silently rejected, and any invalid configurations in properly formatted files will be silently ignored. Please test any changes to your configuration in an interactive job session before use
  • +
  • A basic validator script is available at /soft/pbs/mig_conf_validate.sh. It will check for simple errors in your config, and print the expected configuration. For example:
  • +
+
ascovel@polaris-login-02:~> /soft/pbs/mig_conf_validate.sh -h
+usage: mig_conf_validate.sh -c CONFIG_FILE
+ascovel@polaris-login-02:~> /soft/pbs/mig_conf_validate.sh -c ./polaris-mig/mig_config.json
+expected MIG configuration:
+GPU     GPU_INST   COMPUTE_INST
+-------------------------------
+0       7g.40gb    4c.7g.40gb
+0       7g.40gb    3c.7g.40gb
+1       7g.40gb    4c.7g.40gb
+1       7g.40gb    3c.7g.40gb
+2       2g.10gb    2g.10gb
+2       4g.20gb    2c.4g.20gb
+2       4g.20gb    2c.4g.20gb
+3       2g.10gb    2g.10gb
+3       4g.20gb    2c.4g.20gb
+3       4g.20gb    2c.4g.20gb
+ascovel@polaris-login-02:~>
+
+

Example use of MIG compute instances

+

The following example demonstrates the use of MIG compute instances via the CUDA_VISIBLE_DEVICES environment variable:

+
ascovel@polaris-login-02:~/polaris-mig> qsub -l mig_config=/home/ascovel/polaris-mig/mig_config.json -l select=1 -l walltime=60:00 -l filesystems=home:grand:swift -A Operations -q R639752 -k doe -I
+qsub: waiting for job 640002.polaris-pbs-01.hsn.cm.polaris.alcf.anl.gov to start
+qsub: job 640002.polaris-pbs-01.hsn.cm.polaris.alcf.anl.gov ready
+
+ascovel@x3209c0s19b0n0:~> cat ./polaris-mig/mig_config.json
+{
+  "group1": {
+    "gpus": [0,1],
+    "mig_enabled": true,
+    "instances": {"7g.40gb": ["4c.7g.40gb", "3c.7g.40gb"] }
+  },
+  "group2": {
+    "gpus": [2,3],
+    "mig_enabled": true,
+    "instances": {"4g.20gb": ["2c.4g.20gb", "2c.4g.20gb"], "2g.10gb": ["2g.10gb"] }
+  }
+}
+ascovel@x3209c0s19b0n0:~> nvidia-smi -L | grep -Po -e "MIG[0-9a-f\-]+"
+MIG-63aa1884-acb8-5880-a586-173f6506966c
+MIG-b86283ae-9953-514f-81df-99be7e0553a5
+MIG-79065f64-bdbb-53ff-89e3-9d35f270b208
+MIG-6dd56a9d-e362-567e-95b1-108afbcfc674
+MIG-76459138-79df-5d00-a11f-b0a2a747bd9e
+MIG-4d5c9fb3-b0e3-50e8-a60c-233104222611
+MIG-bdfeeb2d-7a50-5e39-b3c5-767838a0b7a3
+MIG-87a2c2f3-d008-51be-b64b-6adb56deb679
+MIG-3d4cdd8c-fc36-5ce9-9676-a6e46d4a6c86
+MIG-773e8e18-f62a-5250-af1e-9343c9286ce1
+ascovel@x3209c0s19b0n0:~> for mig in $( nvidia-smi -L | grep -Po -e "MIG[0-9a-f\-]+" ) ; do CUDA_VISIBLE_DEVICES=${mig} ./saxpy & done 2>/dev/null
+ascovel@x3209c0s19b0n0:~> nvidia-smi | tail -n 16
++-----------------------------------------------------------------------------+
+| Processes:                                                                  |
+|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
+|        ID   ID                                                   Usage      |
+|=============================================================================|
+|    0    0    0      17480      C   ./saxpy                          8413MiB |
+|    0    0    1      17481      C   ./saxpy                          8363MiB |
+|    1    0    0      17482      C   ./saxpy                          8413MiB |
+|    1    0    1      17483      C   ./saxpy                          8363MiB |
+|    2    1    0      17484      C   ./saxpy                          8313MiB |
+|    2    1    1      17485      C   ./saxpy                          8313MiB |
+|    2    5    0      17486      C   ./saxpy                          8313MiB |
+|    3    1    0      17487      C   ./saxpy                          8313MiB |
+|    3    1    1      17488      C   ./saxpy                          8313MiB |
+|    3    5    0      17489      C   ./saxpy                          8313MiB |
++-----------------------------------------------------------------------------+
+ascovel@x3209c0s19b0n0:~>
+
+ + + + + + + + + + + + + +
+
+ + + +
+ +
+ + + + +
+
+ + + + + + + + + + + +
+
+ +
+
+
+
+ + + + + + + + + + + + \ No newline at end of file diff --git a/polaris/workflows/parsl/index.html b/polaris/workflows/parsl/index.html new file mode 100644 index 0000000000..c035be1ccf --- /dev/null +++ b/polaris/workflows/parsl/index.html @@ -0,0 +1,6895 @@ + + + + + + + + + + + + + + + + + + + + + + + + + Parsl - ALCF User Guides + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+ +
+ + + + + + + + + +
+ + +
+ + +
+ +
+

Argonne Leadership Computing Facility

+
+ + + + + + + + + +
+
+ + +
+ + + + + + + + + + + + + + + + + +
+ + + + +
+ + +
+ + + + + + +
+
+ + + +
+
+
+ + + + + +
+
+
+ + + +
+
+
+ + + +
+
+
+ + + +
+
+ + + + + + + +

Parsl on Polaris

+
+

Parsl is a flexible and scalable parallel programming library for Python.

+

-- Parsl Documentation

+
+

For many applications, managing an ensemble of jobs into a workflow is a critical step that can easily become a performance bottleneck. Many tools exist to address this, of which parsl is just one. On this page, we'll highlight some of the key pieces of information about parsl that are relevant to Polaris. Parsl is also extensively documented, has a dedicated Slack Channel, and a large community of users and developers beyond ALCF. We encourage you to engage with the parsl community for support with parsl specific questions, and for Polaris-specific questions or problems, please contact support@alcf.anl.gov.

+

Getting Parsl on Polaris

+

You can install parsl by building off of the conda modules. You have some flexibility in how you extend the conda module to include parsl; here is an example way to do it:

+
# Load the Conda Module (needed everytime you use parsl)
+module load conda
+conda activate
+
+# Create a virtual env that uses the conda env as the system packages.
+# Only do the next line on initial set up:
+python -m venv --system-site-packages /path/to/your/virtualenv
+
+# Load the virtual env (every time):
+source /path/to/your/virtualenv/bin/activate
+
+# Install parsl (only once)
+pip install parsl
+
+

Using Parsl on Polaris

+

Parsl has a variety of possible configuration settings. As an example, we provide the configuration below that will run one task per GPU:

+
from parsl.config import Config
+
+# PBSPro is the right provider for Polaris:
+from parsl.providers import PBSProProvider
+# The high throughput executor is for scaling to HPC systems:
+from parsl.executors import HighThroughputExecutor
+# You can use the MPI launcher, but may want the Gnu Parallel launcher, see below
+from parsl.launchers import MpiExecLauncher, GnuParallelLauncher
+# address_by_interface is needed for the HighThroughputExecutor:
+from parsl.addresses import address_by_interface
+# For checkpointing:
+from parsl.utils import get_all_checkpoints
+
+# Adjust your user-specific options here:
+run_dir="/lus/grand/projects/yourproject/yourrundir/"
+
+user_opts = {
+    "worker_init":      f"source /path/to/your/virtualenv/bin/activate; cd {run_dir}", # load the environment where parsl is installed
+    "scheduler_options":"#PBS -l filesystems=home:eagle:grand" , # specify any PBS options here, like filesystems
+    "account":          "YOURPROJECT",
+    "queue":            "debug-scaling",
+    "walltime":         "1:00:00",
+    "nodes_per_block":  3, # think of a block as one job on polaris, so to run on the main queues, set this >= 10
+    "cpus_per_node":    32, # Up to 64 with multithreading
+    "available_accelerators": 4, # Each Polaris node has 4 GPUs, setting this ensures one worker per GPU
+    "cores_per_worker": 8, # this will set the number of cpu hardware threads per worker.  
+}
+
+checkpoints = get_all_checkpoints(run_dir)
+print("Found the following checkpoints: ", checkpoints)
+
+config = Config(
+        executors=[
+            HighThroughputExecutor(
+                label="htex",
+                heartbeat_period=15,
+                heartbeat_threshold=120,
+                worker_debug=True,
+                available_accelerators=user_opts["available_accelerators"], # if this is set, it will override other settings for max_workers if set
+                cores_per_worker=user_opts["cores_per_worker"],
+                address=address_by_interface("bond0"),
+                cpu_affinity="block-reverse",
+                prefetch_capacity=0,
+                start_method="spawn",  # Needed to avoid interactions between MPI and os.fork
+                provider=PBSProProvider(
+                    launcher=MpiExecLauncher(bind_cmd="--cpu-bind", overrides="--depth=64 --ppn 1"),
+                    # Which launcher to use?  Check out the note below for some details.  Try MPI first!
+                    # launcher=GnuParallelLauncher(),
+                    account=user_opts["account"],
+                    queue=user_opts["queue"],
+                    select_options="ngpus=4",
+                    # PBS directives (header lines): for array jobs pass '-J' option
+                    scheduler_options=user_opts["scheduler_options"],
+                    # Command to be run before starting a worker, such as:
+                    worker_init=user_opts["worker_init"],
+                    # number of compute nodes allocated for each block
+                    nodes_per_block=user_opts["nodes_per_block"],
+                    init_blocks=1,
+                    min_blocks=0,
+                    max_blocks=1, # Can increase more to have more parallel jobs
+                    cpus_per_node=user_opts["cpus_per_node"],
+                    walltime=user_opts["walltime"]
+                ),
+            ),
+        ],
+        checkpoint_files = checkpoints,
+        run_dir=run_dir,
+        checkpoint_mode = 'task_exit',
+        retries=2,
+        app_cache=True,
+)
+
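Once a configuration like the one above is defined, parsl apps can be submitted against it and will execute on the compute nodes acquired by the provider. The short sketch below is illustrative only: the hello app and the number of tasks are placeholders, and it assumes the config object defined above is in scope.

import parsl
from parsl import python_app

@python_app
def hello(i):
    # Runs on a worker; with available_accelerators=4, each worker is pinned to one GPU
    return f"hello from task {i}"

parsl.load(config)                      # start parsl with the config defined above
futures = [hello(i) for i in range(8)]  # submit tasks; they execute asynchronously
print([f.result() for f in futures])    # block until all tasks complete
parsl.clear()                           # shut down parsl when finished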
+

Special notes for Polaris

+

On Polaris, there is a known bug where Python applications that are launched with MPI and use fork to spawn processes can sometimes hang unexpectedly. For this reason, it is recommended to use start_method="spawn" on Polaris when using the MpiExecLauncher, as shown in the example config above. Alternatively, another solution is to use the GnuParallelLauncher, which uses GNU Parallel to spawn processes. GNU Parallel can be loaded in your environment with the command module load gnu-parallel. Both of these approaches circumvent the hang issue caused by using fork.

+

Updates

+

For parsl versions after July 2023, the address passed to the HighThroughputExecutor needs to be set to address = address_by_interface("bond0"). With parsl versions prior to July 2023, it was recommended to use address = address_by_hostname() on Polaris, but with later versions this will not work on Polaris (or any other machine).

+ + + + + + + + + + + + + +
+
+ + + +
+ +
+ + + + +
+
+ + + + + + + + + + + +
+
+ +
+
+
+
+ + + + + + + + + + + + \ No newline at end of file diff --git a/polaris/workflows/smartsim/index.html b/polaris/workflows/smartsim/index.html new file mode 100644 index 0000000000..ec31d9bf9f --- /dev/null +++ b/polaris/workflows/smartsim/index.html @@ -0,0 +1,6854 @@ + + + + + + + + + + + + + + + + + + + + + + + + + SmartSim - ALCF User Guides + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+ +
+ + + + + + + + + +
+ + +
+ + +
+ +
+

Argonne Leadership Computing Facility

+
+ + + + + + + + + +
+
+ + +
+ + + + + + + + + + + + + + + + + +
+ + + + +
+ + +
+ + + + + + +
+
+ + + +
+
+
+ + + + + +
+
+
+ + + +
+
+
+ + + +
+
+
+ + + +
+
+ + + + + + + +

SmartSim and SmartRedis

+

SmartSim is an open-source tool developed by Hewlett Packard Enterprise (HPE) designed to facilitate the integration of traditional HPC simulation applications with machine learning workflows. +There are two core components to SmartSim:

+
  • Infrastructure library (IL)
      • Provides an API to start, stop, and monitor HPC applications from Python
      • Interfaces with the scheduler to launch jobs (PBSPro on Polaris and Cobalt on Theta/ThetaGPU)
      • Deploys a distributed in-memory database called the Orchestrator
  • SmartRedis client library
      • Provides clients that connect to the Orchestrator from Fortran, C, C++, and Python code
      • The client API enables data transfer to/from the database and the ability to load and run JIT-traced Python and ML runtimes acting on stored data (see the SmartRedis sketch below)
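As a brief illustration of the client library, the Python sketch below stores and retrieves a tensor from a running Orchestrator. It is a minimal sketch under two assumptions: a SmartSim-launched database is reachable (the SmartRedis client reads its address from the SSDB environment variable that SmartSim sets), and the smartredis Python package from the installation steps below is available. The tensor name and contents are placeholders.

import numpy as np
from smartredis import Client

# Connect to the Orchestrator deployed by SmartSim (address taken from the SSDB env var)
client = Client(cluster=False)

# Send a tensor to the database and read it back
client.put_tensor("example_tensor", np.arange(10, dtype=np.float32))
retrieved = client.get_tensor("example_tensor")
print(retrieved)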

For more resources on SmartSim, follow the links below:

+ +

Installation

+

SmartSim on Polaris can be installed by creating a virtual environment based on the ML conda module:

module load conda/2023-10-04
+conda activate
+module load cmake
+module load gcc/11.2.0
+module load cudatoolkit-standalone/11.8.0
+python -m venv --clear /path/to/_ssim_env --system-site-packages
+source /path/to/_ssim_env/bin/activate
+pip install --upgrade pip
+
+Note that /path/to/ can either be a user's home or project directory.

+

To use SmartSim in the future, simply load the same modules and source the virtual environment.

+

Then set up the environment variables +

export SMARTSIM_REDISAI=1.2.7
+export CC=cc
+export CXX=CC
+export CUDA_DEPS_BASE=/soft/libraries
+export CUDA_VERSION_MAJOR=11
+export CUDNN_VERSION_MAJOR=8
+export CUDNN_VERSION_MINOR=6
+export CUDNN_VERSION_EXTRA=0.163
+export CUDNN_VERSION=$CUDNN_VERSION_MAJOR.$CUDNN_VERSION_MINOR.$CUDNN_VERSION_EXTRA
+export CUDNN_BASE=$CUDA_DEPS_BASE/cudnn/cudnn-$CUDA_VERSION_MAJOR-linux-x64-v$CUDNN_VERSION
+export CUDNN_LIBRARY=$CUDNN_BASE/lib/
+export CUDNN_INCLUDE_DIR=$CUDNN_BASE/include/
+export LD_LIBRARY_PATH=$CUDNN_LIBRARY:$LD_LIBRARY_PATH
+

+

Now, install SmartSim and the GPU backend +

git clone https://github.com/CrayLabs/SmartSim.git
+cd SmartSim
+pip install -e .
+export TORCH_PATH=$( python -c 'import torch;print(torch.utils.cmake_prefix_path)' )
+export TF_PATH=$( python -c 'import tensorflow;print("/".join(tensorflow.__file__.split("/")[:-1]))' )
+smart build -v --device gpu --torch_dir $TORCH_PATH --libtensorflow_dir $TF_PATH
+cd ..
+

+

Finally, install the SmartRedis library +

export LDFLAGS=-L/opt/cray/pe/gcc/11.2.0/snos/lib64/libstdc++.so.6
+git clone https://github.com/CrayLabs/SmartRedis.git
+cd SmartRedis
+pip install -e .
+make lib
+cd ..
+

+

Examples

+

You can find examples of in situ training and inference of ML models from an ongoing CFD simulation at the NekRS-ML repository. The smartredis and onlineGNN branches have instructions on how to build and run the examples on Polaris.

+

The Fall 2023 ALCF User Hands-On Workshop repository also contains information on how to use SmartSim and NekRS-ML on Polaris, but note the instructions are specific to the fall of 2023.

+

Notes

+
    +
  • SmartSim workflows, such as online training, often require launching multiple MPI applications on the same set of nodes. On Polaris, the environment variable MPICH_OFI_CXI_PID_BASE=0 must be exported before the first call to mpiexec, then incremented by 1 and re-exported before each successive call. With the SmartSim API, this is done by adding env_vars={'MPICH_OFI_CXI_PID_BASE':str(0)} to PalsMpiexecSettings(), as sketched below.
  • +
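The snippet below sketches this pattern. It assumes that SmartSim's PalsMpiexecSettings accepts exe_args and env_vars keyword arguments (as in recent releases); the executable names and arguments are placeholders rather than part of any particular example.

from smartsim.settings import PalsMpiexecSettings

# First mpiexec launch on the allocated nodes: PID_BASE starts at 0
sim_settings = PalsMpiexecSettings(
    "./nekrs",
    exe_args="--setup case.par",
    env_vars={"MPICH_OFI_CXI_PID_BASE": str(0)},
)

# Second launch sharing the same nodes (e.g., an online training script): increment to 1
train_settings = PalsMpiexecSettings(
    "python",
    exe_args="trainer.py",
    env_vars={"MPICH_OFI_CXI_PID_BASE": str(1)},
)

# These settings objects are then passed to Experiment.create_model() as usual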
+ + + + + + + + + + + + + +
+
+ + + +
+ +
+ + + + +
+
+ + + + + + + + + + + +
+
+ +
+
+
+
+ + + + + + + + + + + + \ No newline at end of file diff --git a/policies/accounts/IT_Access_Agreement_for_ALCF.pdf b/policies/accounts/IT_Access_Agreement_for_ALCF.pdf new file mode 100644 index 0000000000..3a8fb9bdc8 Binary files /dev/null and b/policies/accounts/IT_Access_Agreement_for_ALCF.pdf differ diff --git a/policies/accounts/IT_Access_Agreement_for_ALCF_Addendum.pdf b/policies/accounts/IT_Access_Agreement_for_ALCF_Addendum.pdf new file mode 100644 index 0000000000..823816d08a Binary files /dev/null and b/policies/accounts/IT_Access_Agreement_for_ALCF_Addendum.pdf differ diff --git a/policies/accounts/account-sponsorship-retention-policy/index.html b/policies/accounts/account-sponsorship-retention-policy/index.html new file mode 100644 index 0000000000..4b7964c848 --- /dev/null +++ b/policies/accounts/account-sponsorship-retention-policy/index.html @@ -0,0 +1,6797 @@ + + + + + + + + + + + + + + + + + + + + + + + + + Account Sponsorship & Retention Policy - ALCF User Guides + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+ +
+ + + + + + + + + +
+ + +
+ + +
+ +
+

Argonne Leadership Computing Facility

+
+ + + + + + + + + +
+
+ + +
+ + + + + + + + + + + + + + + + + +
+ + + + +
+ + +
+ + + + + + +
+
+ + + +
+
+
+ + + + + +
+
+
+ + + +
+
+
+ + + +
+
+
+ + + +
+
+ + + + + + + +

Account Sponsorship & Retention Policy

+

This page is designed to help you understand the different types of accounts that you will encounter at the ALCF. The policy outlined reviews the responsibilities of an account holder, an account sponsor, and those of a foreign national.

+

ALCF Account Types

+

Annual: This account applies to users who are not ALCF Regular Employees. The default renewal date (account deactivation date) for the account is a year from the day the account was requested. These accounts are renewed annually and must be approved by an ALCF Staff member or a Project PI (also known as the “approver”). Users are required to update their account information and agree to the Terms of Use each year. Users need to be a part of an active project for their account to be renewed.

+

Permanent: This account applies to individuals who are Regular Employees within the ALCF and CPS Divisions. If you hold this type of account, periodic renewal is not necessary.

+

Note: Foreign Nationals have a second date (apart from their account deactivation date) that controls their account access. Accounts held by foreign nationals require paperwork referred to as an ANL-593 (or just 593 for shorthand). This paperwork is also required for any on-site access, and also applies to computer accounts. DOE requirements state that the ALCF is to disable any account with expired 593 paperwork.

+

A notification system has been established that issues a warning notice to users when expiration approaches and requests action to ensure that accounts are not needlessly turned off. An approval from the project PI is required to renew ANL 593 for project members that are foreign nationals.

+

Your responsibilities as an account approver

+

If you approve any accounts, please take note of the following roles and responsibilities:

+

By approving someone for an account at the ALCF, you are accepting responsibility for the account applicant and confirming that this individual is who they claim to be and is thus entitled to work on our computers. Do not simply "rubber stamp" any account application that claims you as an account approver/project PI.

+

You are also responsible for approving account renewal requests. When an account is about to expire, we send a warning notification to the account holder. Among other things, the account holder is asked to contact the approver (the PI of any of the active projects the account holder is associated with) if they wish to renew their account. We cannot and will not extend someone's account without an approval. An important aspect of this process to note is that inaction will result in the account becoming deactivated on the expiration date.

+

You are also responsible for approving ANL 593 renewals requests. When an account’s 593 is about to expire, we send a warning notification to the account holder. Among other things, the account holder is asked to contact the approver (the PI of any of the active projects the account holder is associated with) if they wish to renew their 593. We cannot and will not extend someone's 593 without an approval. An important aspect of this process to note is that inaction will result in the account becoming deactivated at the expiration date.

+

Account Retention Policy

+

Accounts can exist in one of three states:

+
    +
  • Active: The active state is normal for an account.
  • +
  • Inactive: The inactive state occurs when an account expires, and the ability to use ALCF resources is removed by changing the active status of the account to inactive. All files continue to exist in the user's home directory. An account will remain in the inactive state for at least 90 days before moving to the next state.
  • +
  • Deleted: After 90 days, an inactive account will be deleted. This removes all references to the account from the system (except the accounts database), including any files and home directories.
  • +
+

Users with inactive or deleted accounts can request reactivation by visiting https://accounts.alcf.anl.gov and clicking on the “Reactivate An Account” link.

+ + + + + + + + + + + + + +
+
+ + + +
+ +
+ + + + +
+
+ + + + + + + + + + + +
+
+ +
+
+
+
+ + + + + + + + + + + + \ No newline at end of file diff --git a/policies/accounts/accounts-policy/index.html b/policies/accounts/accounts-policy/index.html new file mode 100644 index 0000000000..33ec4b4bdb --- /dev/null +++ b/policies/accounts/accounts-policy/index.html @@ -0,0 +1,6695 @@ + + + + + + + + + + + + + + + + + + + + + + + + + Accounts Policy - ALCF User Guides + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+ +
+ + + + + + + + + +
+ + +
+ + +
+ +
+

Argonne Leadership Computing Facility

+
+ + + + + + + + + +
+
+ + +
+ + + + + + + + + + + + + + + + + +
+ + + + +
+ + +
+ + + + + + +
+
+ + + +
+
+
+ + + + + +
+
+
+ + + +
+
+
+ + + +
+
+
+ + + +
+
+ + + + + + + +

Accounts Policy

+

All holders of user accounts must abide by all appropriate Argonne Leadership Computing Facility and Argonne National Laboratory computing usage policies. +The policy details are outlined in the following documents:

+ +

These are described at the time of the account request and include requirements such as using a sufficiently strong password, appropriate use of the system, and so on. Any user not following these requirements will have their account disabled.

+

Furthermore, ALCF resources are intended to be used as a computing resource for specific computational science or engineering work, not as a general-purpose computing system.

+

If someone is using the system extensively but not carrying out any computational activities, their account could be disabled.

+ + + + + + + + + + + + + +
+
+ + + +
+ +
+ + + + +
+
+ + + + + + + + + + + +
+
+ +
+
+
+
+ + + + + + + + + + + + \ No newline at end of file diff --git a/policies/accounts/user-authentication-policy/index.html b/policies/accounts/user-authentication-policy/index.html new file mode 100644 index 0000000000..129bc100bd --- /dev/null +++ b/policies/accounts/user-authentication-policy/index.html @@ -0,0 +1,6840 @@ + + + + + + + + + + + + + + + + + + + + + + + + + User Authentication Policy - ALCF User Guides + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+ +
+ + + + + + + + + +
+ + +
+ + +
+ +
+

Argonne Leadership Computing Facility

+
+ + + + + + + + + +
+
+ + +
+ + + + + + + + + + + + + + + + + +
+ + + + +
+ + +
+ + + + + + +
+
+ + + +
+
+
+ + + + + +
+
+
+ + + +
+
+ +
+
+ + + +
+
+ + + + + + + +

User Authentication Policy

+

Users of the ALCF systems are required to use the SafeNet token (physical or mobile) one-time password, multifactor authentication system.

+

This document explains the policies users must follow regarding SafeNet tokens for accessing the ALCF systems.

+

MultiFactor Authentication

+

"Authentication systems are frequently described by the authentication factors that they incorporate. The three factors often considered as the cornerstone of authentication are: Something you know (for example, a password); Something you have (for example, an ID badge or a cryptographic key); and Something you are (for example, a voice print or other biometric measurement)." -- NIST iTL Bulletin, Aug 2004

+

By the NIST guidelines for identification and authentication (NIST 800-53, Revision 3, Control IA-2), ALCF aims for a Moderate level of security controls. All production systems in ALCF require multifactor authentication for users with network and local (privileged and non-privileged accounts) using the SafeNet tokens.

+

Mobile and Physical Tokens

+

ALCF provides every user of the production resources a physical or mobile token called a SafeNet Token. This is named after the company that developed the key fob and mobile software (the organization is now called SafeNet). "Both tokens use AES-256 bit encryption to generate OTPs [One Time Passwords] comprised of digits, digits and letters or digits, letters and special characters..."

+

When you receive your physical token, it will be initialized, but it will have no access privileges until you have contacted us to verify your identity.

+

At the end of your account or project lifecycle, please return the token to the ALCF help desk:

+

ALCF Service Desk
Argonne National Laboratory
9700 South Cass Avenue
Building 240
Argonne, IL 60439

+

Protect Your Passcode token

+

Your passcode token should be protected by you as carefully as your credit cards or house keys. If your token is lost, stolen, or damaged, please contact us immediately so that we can deactivate the token and prevent unauthorized access. Sharing of tokens is strictly forbidden. Please do not mark on the token or alter it in any way.

+

More information

+

[New User Guide](http://www.alcf.anl.gov/user-guides/new-user-guide)

+

Using Passcode Tokens

+

References

+ + + + + + + + + + + + + + +
+
+ + + +
+ +
+ + + + +
+
+ + + + + + + + + + + +
+
+ +
+
+
+
+ + + + + + + + + + + + \ No newline at end of file diff --git a/policies/alcf-acknowledgement-policy/index.html b/policies/alcf-acknowledgement-policy/index.html new file mode 100644 index 0000000000..2f823eaba2 --- /dev/null +++ b/policies/alcf-acknowledgement-policy/index.html @@ -0,0 +1,6808 @@ + + + + + + + + + + + + + + + + + + + + + + + + + ALCF Acknowledgement Policy - ALCF User Guides + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+ +
+ + + + + + + + + +
+ + +
+ + +
+ +
+

Argonne Leadership Computing Facility

+
+ + + + + + + + + +
+
+ + +
+ + + + + + + + + + + + + + + + + +
+ + + + +
+ + +
+ + + + + + +
+
+ + + +
+
+
+ + + + + +
+
+
+ + + + + + + +
+
+ + + + + + + +

ALCF Acknowledgement Policy

+

As a U.S. Department of Energy user facility dedicated to the advancement of scientific discoveries, the Argonne Leadership Computing Facility (ALCF) provides unique computing resources and expertise to a user community that is bound by certain policies designed to acknowledge and promote the work of others as well as the resources used to accomplish this work.

+

The ALCF requests your continued compliance with the terms of your program or discretionary award, specifically with regard to acknowledgments in publications and presentations based on work done with ALCF resources. Also, please forward your accepted publication citations to pubs@alcf.anl.gov.

+

AI Testbeds Publication Guidance

+

To publish technical reports and research papers using the ALCF AI testbeds, we request that you provide us with a draft of your paper prior to submission by emailing a copy to us at support@alcf.anl.gov. We will work closely with the AI testbed vendors to provide feedback in a timely manner. We strongly recommend you engage us and the vendors early and often in this process to help us facilitate your research objectives.

+

For guidance on acknowledgements, please see the following sample policies:

+

ALCF Only Acknowledgement

+

Users, and ALCF staff scientists without direct project funding, should acknowledge the ALCF in all publications and presentations that speak to work performed on ALCF resources.

+

This research used resources of the Argonne Leadership Computing Facility, a U.S. Department of Energy (DOE) Office of Science user facility at Argonne National Laboratory and is based on research supported by the U.S. DOE Office of Science-Advanced Scientific Computing Research Program, under Contract No. DE-AC02-06CH11357.

+

INCITE/ALCF Acknowledgement

+

Users should acknowledge the ALCF in all publications and presentations that speak to INCITE work performed on ALCF resources.

+

An award for computer time was provided by the U.S. Department of Energy’s (DOE) Innovative and Novel Computational Impact on Theory and Experiment (INCITE) Program. This research used resources from the Argonne Leadership Computing Facility, a U.S. DOE Office of Science user facility at Argonne National Laboratory, which is supported by the Office of Science of the U.S. DOE under Contract No. DE-AC02-06CH11357.

+

INCITE/ALCF/OLCF Acknowledgement

+

Users should acknowledge the ALCF and OLCF in all publications and presentations that speak to INCITE work performed on ALCF and OLCF resources.

+

An award for computer time was provided by the U.S. Department of Energy’s (DOE) Innovative and Novel Computational Impact on Theory and Experiment (INCITE) Program. This research used supporting resources at the Argonne and the Oak Ridge Leadership Computing Facilities. The Argonne Leadership Computing Facility at Argonne National Laboratory is supported by the Office of Science of the U.S. DOE under Contract No. DE-AC02-06CH11357. The Oak Ridge Leadership Computing Facility at the Oak Ridge National Laboratory is supported by the Office of Science of the U.S. DOE under Contract No. DE-AC05-00OR22725.

+ + + + + + + + + + + + + +
+
+ + + +
+ +
+ + + + +
+
+ + + + + + + + + + + +
+
+ +
+
+
+
+ + + + + + + + + + + + \ No newline at end of file diff --git a/policies/data-and-software-policies/data-policy/index.html b/policies/data-and-software-policies/data-policy/index.html new file mode 100644 index 0000000000..ff7efcd1ef --- /dev/null +++ b/policies/data-and-software-policies/data-policy/index.html @@ -0,0 +1,7140 @@ + + + + + + + + + + + + + + + + + + + + + + + + + Data Policy - ALCF User Guides + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+ +
+ + + + + + + + + +
+ + +
+ + +
+ +
+

Argonne Leadership Computing Facility

+
+ + + + + + + + + +
+
+ + +
+ + + + + + + + + + + + + + + + + +
+ + + + +
+ + +
+ + + + + + +
+
+ + + +
+
+
+ + + + + +
+
+
+ + + + + + + +
+
+ + + + + + + +

Data Policy

+

ALCF Data Confidentiality

+

The Argonne Leadership Computing Facility (ALCF) network is an open-research network. Because our resources and networks are open to many users and cannot be protected at a partitioned level, we cannot guarantee complete security for any data that resides here. It is up to users to provide the security they need.

+

Data is not encrypted at rest. Data transferred via SSH (i.e., scp) is encrypted in transit using SSH's mechanisms (e.g., AES256). Data transferred via Globus (GridFTP) is not normally fully encrypted: the GridFTP control channel is encrypted, but the data channel by default is not (though the authentication processes for both channels are encrypted). If you need full encryption of the data stream, you must explicitly select "encrypt transfer" under "Transfer & Timer Options" in the Globus UI, or use the equivalent options in the CLI or transfer API if you are using those. More information: https://docs.globus.org/faq/security

+

The basic level of protection provided is UNIX file level permissions; it is the user's responsibility to ensure that file permissions and umasks are set to match their needs.

+

NOTE: The default permissions and umasks are group and world readable. For help determining or setting file permissions or umasks, or creating a UNIX group, contact support@alcf.anl.gov.
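As a minimal illustration of checking and tightening permissions, the commands below may be useful; the directory path is a placeholder, and the exact modes you choose should match your project's needs.

# Show the current default permission mask for new files
umask

# Remove world access (and group write) for newly created files
umask 027

# Remove world access from an existing project directory (path is illustrative)
chmod -R o-rwx /lus/grand/projects/YourProject/private_data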

+

ALCF Staff with Root Privileges

+

ALCF resource administrators with root privileges are not constrained by the file permissions, and they have the capability to open and/or copy all files on the system. They can also assume a user’s identity on the system. There is no audit trail for access, touching, or moving data; however, ALCF staff does not view or modify project data unless directed by a PI or project member to help debug a problem. Data may be touched or accessed by the filesystem itself if data needs to be repaired or verified for integrity after a filesystem event (e.g., a fsck).

+

The ALCF resources are Federal resources and are the property of the United States Government. Any or all uses of this system and all files on this system may be intercepted, monitored, recorded, copied, audited, inspected, and disclosed to authorized site, Department of Energy, and law enforcement personnel, as well as authorized officials of other agencies, both domestic and foreign.

+

Administrators use elevated privileges for maintenance and system management. Following are instances where ALCF staff might look at your files:
- We maintain copies of all .error, .output, and Cobalt log files and may review them to determine if a job failure was due to user error or a system failure.
- If you request our assistance via any mechanism (for example, support ticket, direct personal email, in person, etc.), be aware we may need to view your files using elevated privileges to aid us in resolving your issue.

+

Use of Proprietary/Licensed Software

+

All software used on ALCF computers must be appropriately acquired and used according to the appropriate licensing. Possession or use of illegally copied software is prohibited. Likewise, users shall not copy copyrighted software, except as permitted by the owner of the copyright. Currently, the use of export-controlled codes is prohibited.

+

Prohibited Data

+

The ALCF computer systems are operated as research systems and contain only data related to scientific research. Use of ALCF resources to store, manipulate, or remotely access any sensitive or national security information is prohibited unless documented and approved by the PI and ALCF leadership.

+

This includes, but is not limited to, personally identifiable information (data that falls under the Privacy Act of 1974, 5 U.S.C. 552a), controlled unclassified information (CUI) to include unclassified controlled nuclear information (UCNI), naval nuclear propulsion information (NNPI), International Traffic in Arms Relations (ITAR), the design or development of nuclear, biological, or chemical weapons, or any weapons of mass destruction. The use of ALCF resources for personal or non-work-related activities is also prohibited.

+

Export Control

+

All principal investigators using ALCF resources and ALCF staff members working with project teams are responsible for knowing whether their project generates any of these prohibited data types or information that falls under Export Control. For questions, contact ALCF Support at support@alcf.anl.gov.

+

Data Storage Systems

+

Data stored for any length of time on ALCF resources should only be data directly related to work done on any of the ALCF leadership computing systems. Specific policies apply to the three types of data storage systems maintained at ALCF. Read these policies carefully and plan accordingly in terms of space, usage, and data protection.

+

Home File System Space

+

swift-home

+

The home file system (/home) is intended to hold your executable files, configuration files, etc. It is NOT meant to hold the output from your application runs (use the data/parallel file system for that purpose). The home file system space is generally moderate in size and is the best protected. Because of its size, backups are practical to accomplish. The system performs tape backups, enabling the recovery of files more than seven days old or recovery from a catastrophic disk failure. Users should email support@alcf.anl.gov if they need assistance. The table below indicates the capabilities and characteristics of each file system.

+

AI Testbed home

+

/home, shared across the ALCF AI testbed systems, including the AI testbed's login and compute nodes, is different from mira-home. The default user quota on the AI testbed's home is 1 TB of storage and 1,000,000 files. This space is backed up.

+

Team Project or Campaign File System

+

theta-fs0 and Grand

+

The team project/campaign file system is intended primarily for results output from your computational runs on the ALCF computing systems. This space is accessible to the team members of your project that have an ALCF account. Default storage quota is 1 TB. Consider this space intermediate-term storage. Once any active production and/or analysis is complete and you no longer need regular access to the data, archive it within the ALCF (explained below) or transfer it to your home institution or move it to Eagle to share it with the broader community (explained below).

+

This space has redundancy in the servers and storage but is so large that replication, snapshots, and backups are not practical. Theta-fs0 and Grand are Lustre global parallel file systems. All new projects will be given storage allocations on either Grand or Eagle. Continuing projects (renewals) will have access to theta-fs0. More information on Lustre File Striping Basics: Lustre File Striping Basics
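As a quick illustration of striping on these Lustre file systems, the commands below show how to inspect and set a stripe layout on a project directory; the path and stripe parameters are placeholders, and suitable values depend on your file sizes and I/O pattern.

# Inspect the current stripe settings of a directory
lfs getstripe /lus/grand/projects/YourProject/output

# Stripe files created in this directory across 8 OSTs with a 1 MiB stripe size
lfs setstripe -c 8 -S 1m /lus/grand/projects/YourProject/output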

+

Pullback Policy: Projects that do not use a minimum of 50% of their allocated space after 6 months will be subject to a quota limit reduction.

+

AI Testbed projects file system

+

The team project/campaign file system /projects mounted on AI Testbed's login and compute nodes is intended to facilitate project collaboration and is accessible to the team members of your project that have an ALCF account. /projects on the AI Testbed is different from /projects on Theta, ThetaGPU, and Cooley. Default group storage quota is 2 TB and 2,000,000 files. Please note that this space isn't backed up. Our policy is that data will be purged from disk 6 months after project completion.

+

Shared Community Project or Campaign File System (Eagle)

+

The file system Eagle, a Lustre global parallel file system, has community sharing-abilities and is useful for sharing the project/campaign data with the broader research community via Globus. This space does not have redundancy in the servers or storage and is so large that replication, snapshots, and backups are not practical. The table below indicates the capabilities and characteristics of each file system. Default storage quota on Eagle is 1 TB and the default period is 1 year. More information on Lustre File Striping Basics: Lustre File Striping Basics

+

Eagle Data Pullback Policy: +Projects that do not use a minimum of 50% of their allocated space after 6 months will be subject to a quota limit reduction.

+

Eagle Access Termination Policy: +Project endpoints that have exhibited no activity* for a period of 6 months will be disabled and the storage space will be reclaimed. Notification will be sent to the PI and project members 30 days prior to and the day of the action.

+

Activity is defined as, but not limited to:

+
    +
  • Creation of the Globus endpoint
  • +
  • Globus transfers to and from the endpoint
  • +
  • atime audits of data files indicating access
  • +
  • Other factors may include DOIs and citations referring to the project
  • +
+

Archive Space

+

The archive space is intended for offline storage of results you wish to retain but either have no immediate need to access or no room in your parallel file system space. Archiving capabilities are available via HPSS. The primary HPSS access is via HSI. HTAR is available, but its path length and file size limitations often cause it to fail. Globus Online and GridFTP are clients that can also be used with HPSS. Due to the possibility of data corruption or loss due to a bad tape, users can request dual writes for particularly critical data. Such requests will be handled on a case-by-case basis.
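A minimal sketch of common archive commands is shown below; the file and directory names are placeholders, and man hsi / man htar on the login nodes describe the full interfaces.

# Store a results file in your HPSS archive space and list the contents
hsi put results.tar
hsi ls

# Bundle a directory into an archive on HPSS with htar (watch the path-length and file-size limits)
htar -cvf results_2024.tar ./results_dir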

+

Data Storage Policies

+

Disk Capacity and Retention Policies

+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
| | /home | /lus/theta-fs0 or /projects * | /lus/grand/projects or /grand | /lus/eagle/projects or /eagle |
| --- | --- | --- | --- | --- |
| Default Quota [1] | 50 GB | 1 TB / 1 million files | 1 TB / 1 million files | 1 TB / 1 million files |
| Quota Enforcement [2] | hard/soft | hard/soft | hard/soft | hard/soft |
| Disk Redundancy [3] | dual parity | dual parity | dual parity | dual parity |
| File Server Snapshots [6] (frequency/retained) | none | none | none | none |
| File Server Metadata Redundancy | yes | yes | yes | yes |
| File Server Metadata Replication [4] | yes | yes | yes | yes |
| File Server Data Replication [5] | yes | yes | no | no |
| Data Purged from Disk | n/a | 6 months after project completion [8] | 6 months after project completion [8] | After 6 months of inactivity (see the Eagle Access Termination Policy listed in the Eagle section above) [8] |
+

* /lus/theta-fs0 does not apply to Polaris

+

Tape Capacity and Retention Policies

+ + + + + + + + + + + + + + + + + + + + + + + + + + +
| | /home | /lus/theta-fs0 or /projects * | /lus/grand/projects or /grand | /lus/eagle/projects or /eagle |
| --- | --- | --- | --- | --- |
| Automatic Backup to Tape? [7] | yes | yes | no | no |
| Archived to Tape Before Deleted from Disk? [9] | yes | yes | no | no |
+
    +
  1. While quotas are subject to negotiation on a case-by-case basis, disk space is a finite resource and projects must exercise good data management practices for their own sake and the sake of other users of the facility. With Lustre, it has become necessary to enforce file quotas as well, which are also negotiable.
  2. “Hard quota enforcement” means a job will fail when writing output if you exceed the hard quota limit. "Soft quota enforcement" means you may exceed the soft quota limit (but never the higher hard quota value) for up to seven days. If you do not drop back below the soft quota limit within seven days, writes will begin to fail.
  3. Hard drives are in redundancy groups of 10 disks (8 data + 2 parity). In other words, three out of 10 drives would have to fail before data loss occurred.
  4. Metadata (i.e., information listing which blocks are part of which files) is written twice to two different storage arrays. Thus, even if an entire array were lost, the metadata would be preserved.
  5. Refers to the fact that data (user output) is written twice with each block on two different storage arrays, so that even if an entire array were lost, the data would be preserved.
  6. Snapshots are stored in your home directory (see Home File System Space for more info). If you accidentally delete the directory or need a previous version, use the cp command to copy the file back to your home directory.
  7. “Yes” denotes that ALCF does regular backups without intervention from the user. In case of project data, data is backed up to tape after a stipulated period (see point 8 below) and is retained for 2 years (subject to change). In all other cases, the user is responsible for archiving the data to HPSS or copying it to another facility as desired.
  8. The project directory is available on disk for the stipulated period, but project quotas are reduced immediately following the project end date (except Eagle). Access to the directory will be removed after 90 days. Requests to restore/extend access or reset the quota are reviewed on a case-by-case basis.
  9. Users who wish to retain data must archive or transfer their data elsewhere at the end of the project. Users need an active ALCF account to access archived data on HPSS. See Account Retention Policy for more information.
+ + + + + + + + + + + + + +
+
+ + + +
+ +
+ + + + +
+
+ + + + + + + + + + + +
+
+ +
+
+
+
+ + + + + + + + + + + + \ No newline at end of file diff --git a/policies/data-and-software-policies/software-policy/index.html b/policies/data-and-software-policies/software-policy/index.html new file mode 100644 index 0000000000..9cebbac5ac --- /dev/null +++ b/policies/data-and-software-policies/software-policy/index.html @@ -0,0 +1,6744 @@ + + + + + + + + + + + + + + + + + + + + + + + Software Policy - ALCF User Guides + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+ +
+ + + + + + + + + +
+ + +
+ + +
+ +
+

Argonne Leadership Computing Facility

+
+ + + + + + + + + +
+
+ + +
+ + + + + + + + + + + + + + + + + +
+ + + + +
+ + +
+ + + + + + +
+
+ + + +
+
+
+ + + + + +
+
+
+ + + +
+
+
+ + + +
+
+
+ + + +
+
+ + + + + + + +

ALCF Resource Software Use

+

All software used on ALCF computers must be appropriately acquired and used according to the appropriate licensing. Possession or use of illegally copied software is prohibited. Likewise, users shall not copy copyrighted software, except as permitted by the owner of the copyright. Currently, the use of export-controlled codes is prohibited.

+

Community Software Policy

+

ALCF supports the deployment of community software from active projects on production systems. A project may provide and support a code on ALCF systems for the ALCF user community as described in the [Community Software Service].

+

User deployments are system-specific, and their maintenance is the sole responsibility of the project deploying it. There shall be no expectation of additional support from ALCF, other than for the provisioning of space and integration with the module system. Projects will be provided with an initial module file from a template, with the expectation that they will update and maintain the module, providing paths and instructions so that user communities can access the software.

+ + + + + + + + + + + + + +
+
+ + + +
+ +
+ + + + +
+
+ + + + + + + + + + + +
+
+ +
+
+
+
+ + + + + + + + + + + + \ No newline at end of file diff --git a/policies/facility-policies/index.html b/policies/facility-policies/index.html new file mode 100644 index 0000000000..2ead4810cc --- /dev/null +++ b/policies/facility-policies/index.html @@ -0,0 +1,6842 @@ + + + + + + + + + + + + + + + + + + + + + + + + + Overview of Policies - ALCF User Guides + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+ +
+ + + + + + + + + +
+ + +
+ + +
+ +
+

Argonne Leadership Computing Facility

+
+ + + + + + + + + +
+
+ + +
+ + + + + + + + + + + + + + + + + +
+ + + + +
+ + +
+ + + + + + +
+
+ + + +
+
+
+ + + + + +
+
+
+ + + +
+
+
+ + + +
+
+
+ + + +
+
+ + + + + + + +

ALCF Facility Policies

+

Be sure to familiarize yourself with the various policies and procedures for ALCF users, categorized below.

+

Accounts

+

All holders of user accounts must comply with ALCF and Argonne National Laboratory computing usage policies, including meeting certain security requirements and executing specific science- or engineering-related computing jobs.

+ +

ALCF Acknowledgement Policy

+

As a U.S. Department of Energy Office of Science User Facility dedicated to the advancement of scientific discovery, the ALCF requests that its users acknowledge and promote the work of others and the resources with which this work was accomplished.

+ +

Data and Allocation

+

These policies detail data and software usage, as well as pullback and refunds of computing hours.

+ +

Quarterly Reports

+

The ALCF is required to report the progress and accomplishments of its allocation projects. Policies are detailed by award type.

+ +

Queue and Scheduling Policies

+ + + + + + + + + + + + + + +
+
+ + + +
+ +
+ + + + +
+
+ + + + + + + + + + + +
+
+ +
+
+
+
+ + + + + + + + + + + + \ No newline at end of file diff --git a/policies/queue-scheduling/pullback-policy/index.html b/policies/queue-scheduling/pullback-policy/index.html new file mode 100644 index 0000000000..99e4c8b788 --- /dev/null +++ b/policies/queue-scheduling/pullback-policy/index.html @@ -0,0 +1,6772 @@ + + + + + + + + + + + + + + + + + + + + + + + + + Pullback Policy - ALCF User Guides + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+ +
+ + + + + + + + + +
+ + +
+ + +
+ +
+

Argonne Leadership Computing Facility

+
+ + + + + + + + + +
+
+ + +
+ + + + + + + + + + + + + + + + + +
+ + + + +
+ + +
+ + + + + + +
+
+ + + +
+
+
+ + + + + +
+
+
+ + + +
+
+
+ + + +
+
+
+ + + +
+
+ + + + + + + +

Pullback Policy

+

In an effort to ensure that valuable ALCF computing resources are used judiciously, a pullback policy has been instituted. Projects granted allocations under the INCITE and ALCC programs that have not used a significant amount of their allocation will be evaluated and adjusted during the year following the policies outlined on this page.

+

The figures outlined below represent the maximum amount that will be pulled back from projects after specific dates during the allocation period. The decision to reduce allocations will be made on a case-by-case basis in discussion with the project's primary investigators (PIs).

+

INCITE Pullback Policy

+

On May 1 of the current INCITE calendar year:
- if usage is less than 15%, remove up to 15% of the unused balance
- if usage is less than 10%, remove up to 30% of the unused balance

+

On September 1 of the current INCITE calendar year:
- if usage is less than 50%, remove up to 33% of the unused balance
- if usage is less than 33%, remove up to 50% of the unused balance
- if usage is less than 10%, remove up to 75% of the unused balance

+

ALCC Pullback Policy

+

ALCC projects must use 50% of their allocation within the first seven months of the allocation cycle. Any unused time in excess of 50% will be deducted from the project allocation at the end of the seven month period.

+ + + + + + + + + + + + + +
+
+ + + +
+ +
+ + + + +
+
+ + + + + + + + + + + +
+
+ +
+
+
+
+ + + + + + + + + + + + \ No newline at end of file diff --git a/policies/queue-scheduling/queue-and-scheduling-policy/index.html b/policies/queue-scheduling/queue-and-scheduling-policy/index.html new file mode 100644 index 0000000000..ca08b74073 --- /dev/null +++ b/policies/queue-scheduling/queue-and-scheduling-policy/index.html @@ -0,0 +1,6851 @@ + + + + + + + + + + + + + + + + + + + + + + + + + Queue and Scheduling Policy - ALCF User Guides + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+ +
+ + + + + + + + + +
+ + +
+ + +
+ +
+

Argonne Leadership Computing Facility

+
+ + + + + + + + + +
+
+ + +
+ + + + + + + + + + + + + + + + + +
+ + + + +
+ + +
+ + + + + + +
+
+ + + +
+
+
+ + + + + +
+
+
+ + + +
+ +
+ + + +
+
+ + + + + + + +

Queue and Scheduling Policy

+

General Policy

+

We ask that all users follow good etiquette and be excellent to one another.

+

Priority

+

As with all Argonne Leadership Computing Facility production systems, job priority in the queue is based on several criteria:
- positive balance of your project
- size (in nodes) of the job: larger jobs receive higher priority
- the type of project (e.g., INCITE, ALCC, or discretionary)
- job duration: shorter jobs accumulate priority more quickly, so it is best to specify the job run time as accurately as possible

+

Reservations and Scheduling Policy

+

Some work on Theta will require deviation from the regular scheduling policy. On such occasions, normal reservation policy applies: please send the standard reservation request form no fewer than five (5) business days in advance.

+

Big Run Mondays

+

As part of our regular maintenance procedures on Mondays, we will promote to the highest priority any jobs in the queued state requesting 802 nodes or more. Promotion is subject to operational discretion.

+

We may also, at our discretion, take the opportunity to promote the priority of capability jobs if the system has been drained of jobs for any other reason.

+

Monday Maintenance

+

On Mondays when the ALCF is on a regular business schedule, the system may be expected to undergo maintenance from 9:00 am until 5:00 pm US Central Time. The showres command may be used to view pending and active maintenance reservations.

+

INCITE/ALCC Overburn Policy

+

If an INCITE or ALCC project exhausts its allocation within the first 11 months of its allocation year, it is eligible for overburn running. In that case, capability jobs submitted by the project will run in the default queue (instead of backfill) during the first 11 months of the allocation year, until 125% of the project allocation has been consumed.

+

INCITE and ALCC projects needing additional overburn hours should e-mail support@alcf.anl.gov with a short description of what they plan to do with the additional hours, highlighting specific goals or milestones and the time expected to accomplish them. This will be reviewed by the scheduling committee, allocations committee, and ALCF management. Requests should be submitted 15 days before the start of the next quarter of the allocation year for full consideration. Non-capability jobs from projects that have exhausted their allocation will continue to run in backfill.

+

To be clear, this policy does not constitute a guarantee of extra time, and we reserve the right to prioritize the scheduling of jobs submitted by projects that have not yet used 100% of their allocations, so the earlier that an INCITE or ALCC project exhausts its allocation, the more likely it is to be able to take full advantage of this policy.

+ + + + + + + + + + + + + +
+
+ + + +
+ +
+ + + + +
+
+ + + + + + + + + + + +
+
+ +
+
+
+
+ + + + + + + + + + + + \ No newline at end of file diff --git a/policies/queue-scheduling/refund-policy/index.html b/policies/queue-scheduling/refund-policy/index.html new file mode 100644 index 0000000000..55d02606b1 --- /dev/null +++ b/policies/queue-scheduling/refund-policy/index.html @@ -0,0 +1,6693 @@ + + + + + + + + + + + + + + + + + + + + + + + + + Refund Policy - ALCF User Guides + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+ +
+ + + + + + + + + +
+ + +
+ + +
+ +
+

Argonne Leadership Computing Facility

+
+ + + + + + + + + +
+
+ + +
+ + + + + + + + + + + + + + + + + +
+ + + + +
+ + +
+ + + + + + +
+
+ + + +
+
+
+ + + + + +
+
+
+ + + +
+
+
+ + + +
+
+
+ + + +
+
+ + + + + + + +

Refund Policy

+

If a system problem affects your run, ALCF will consider a refund of node hours. The ALCF expects all applications to regularly checkpoint, so refunds are typically capped at four hours of runtime for the affected job, unless the problem in question prevented checkpoints.

+

ALCF strongly advises against symlinking between filesystems or hard-coding paths to a different filesystem.

+

To request a refund, send the following information to support@alcf.anl.gov:
- Job ID
- Machine
- Reason for refund request

+

For more information, contact support@alcf.anl.gov.

+ + + + + + + + + + + + + +
+
+ + + +
+ +
+ + + + +
+
+ + + + + + + + + + + +
+
+ +
+
+
+
+ + + + + + + + + + + + \ No newline at end of file diff --git a/running-jobs/example-job-scripts/index.html b/running-jobs/example-job-scripts/index.html new file mode 100644 index 0000000000..deb81c1da6 --- /dev/null +++ b/running-jobs/example-job-scripts/index.html @@ -0,0 +1,7349 @@ + + + + + + + + + + + + + + + + + + + + + + + + + Example Job Scripts - ALCF User Guides + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+ +
+ + + + + + + + + +
+ + +
+ + +
+ +
+

Argonne Leadership Computing Facility

+
+ + + + + + + + + +
+
+ + +
+ + + + + + + + + + + + + + + + + +
+ + + + +
+ + +
+ + + + + + +
+
+ + + +
+
+
+ + + + + +
+
+
+ + + + + + + +
+
+ + + + + + + +

Example Job Scripts

+

This page contains a small collection of example job scripts users may find useful for submitting their jobs on Polaris. Additional information on PBS and how to submit these job scripts is available here.

+

A simple example using a similar script on Polaris is available in the +Getting Started Repo.

+

CPU MPI-OpenMP Examples

+

The following submit.sh example submits a 1-node job to Polaris with 16 MPI ranks per node and 2 OpenMP threads per rank. See Queues for details on practical limits to node counts and job times for different sizes of jobs.

+

The hello_affinity program is a compiled C++ code, which is built via make -f Makefile.nvhpc in the linked directory after cloning the Getting Started repository.

+
#!/bin/bash -l
+#PBS -l select=1:system=polaris
+#PBS -l place=scatter
+#PBS -l walltime=0:30:00
+#PBS -l filesystems=home:grand
+#PBS -q debug
+#PBS -A Catalyst
+
+# Change to working directory
+cd ${PBS_O_WORKDIR}
+
+# MPI and OpenMP settings
+NNODES=`wc -l < $PBS_NODEFILE`
+NRANKS_PER_NODE=16
+NDEPTH=2
+NTHREADS=2
+
+NTOTRANKS=$(( NNODES * NRANKS_PER_NODE ))
+echo "NUM_OF_NODES= ${NNODES} TOTAL_NUM_RANKS= ${NTOTRANKS} RANKS_PER_NODE= ${NRANKS_PER_NODE} THREADS_PER_RANK= ${NTHREADS}"
+
+mpiexec -n ${NTOTRANKS} --ppn ${NRANKS_PER_NODE} --depth=${NDEPTH} --cpu-bind depth --env OMP_NUM_THREADS=${NTHREADS} -env OMP_PLACES=threads ./hello_affinity
+
+

The following function in the hello_affinity source code is essential for uniquely identifying the CUDA device even when Multi-Instance GPU (MIG) is enabled, as each physical device will be partitioned into multiple virtual devices, each with unique UUIDs differentiated by the last few characters:

+ + + + +
+
+
+
+
+
//https://stackoverflow.com/questions/68823023/set-cuda-device-by-uuid
+void uuid_print(cudaUUID_t a){
+  std::cout << "GPU";
+  std::vector<std::tuple<int, int> > r = {{0,4}, {4,6}, {6,8}, {8,10}, {10,16}};
+  for (auto t : r){
+    std::cout << "-";
+    for (int i = std::get<0>(t); i < std::get<1>(t); i++)
+      std::cout << std::hex << std::setfill('0') << std::setw(2) << (unsigned)(unsigned char)a.bytes[i];
+  }
+  std::cout << std::endl;
+}
+
+ + + + +

NOTE: If you are a zsh user, you will need to ensure ALL submission and shell scripts include the -l flag following #!/bin/bash as seen in the example above to ensure your environment is being instantiated properly. zsh is NOT supported by HPE and support from ALCF will be best effort only.

+

Each Polaris compute node has 1 Milan CPU with a total of 32 physical cores, with each core supporting 2 hardware threads (for a total of 64 logical cores).

+

The process affinity in this example is setup to map each MPI rank to 2 physical cores. Each MPI rank spawns 2 OpenMP threads, so 1 thread per physical core. The OpenMP settings bind each OpenMP thread to a single hardware thread within a core, such that all 32 physical cores are utilized. CPU core IDs 32 to 63 are not mapped to any MPI rank, since they correspond to simultaneous multithreaded (SMT) sibling hardware threads that share the execution resources of the core ids 0 to 31, respectively.

+
    +
  • cd ${PBS_O_WORKDIR} : change into the working directory from where qsub was executed.
  • NNODES=`wc -l < $PBS_NODEFILE` : one method for determining the total number of nodes allocated to a job.
  • NRANKS_PER_NODE=16 : a helper variable setting the number of MPI ranks to start on each node to 16.
  • NDEPTH=2 : a helper variable to space MPI ranks 2 "slots" from each other. In this example, individual threads correspond to a slot. This is used together with the --cpu-bind option of mpiexec; additional binding options are available (e.g. numa, socket, core, etc.).
  • NTHREADS=2 : a helper variable setting the number of OpenMP threads per MPI rank.
  • NTOTRANKS=$(( NNODES * NRANKS_PER_NODE )) : a helper variable calculating the total number of MPI ranks spanning all nodes in the job.
+

Information on the use of mpiexec is available via man mpiexec. Some notes on the specific options used in the above example follow.

+
    +
  • -n ${NTOTRANKS} : specifies the total number of MPI ranks to start.
  • --ppn ${NRANKS_PER_NODE} : specifies the number of MPI ranks to start on each node.
  • --depth=${NDEPTH} : specifies how many cores/threads to space MPI ranks apart on each node.
  • --cpu-bind depth : indicates that each MPI rank will be bound to a number of cores/threads given by the depth argument.
  • --env OMP_NUM_THREADS=${NTHREADS} : sets the environment variable OMP_NUM_THREADS, which determines the number of OpenMP threads per MPI rank.
  • --env OMP_PLACES=threads : indicates how OpenMP should distribute threads across the resource, in this case across hardware threads.
+

Hardware threads

+

This example is similar to the previous one, but it exhausts all 64 logical cores available on each compute node CPU. We double the number of MPI ranks to 32, one per physical core. With --cpu-bind=core, the --depth flag value is interpreted by Cray MPICH as a spacing in physical cores, so NDEPTH=1 ensures that rank 0 is bound to CPU core IDs (0,32), the 2 SMT sibling hardware threads that share the first physical core.

+ + +

#!/bin/bash -l
+#PBS -l select=1:system=polaris
+#PBS -l place=scatter
+#PBS -l walltime=0:30:00
+#PBS -l filesystems=home:grand
+#PBS -q debug
+#PBS -A Catalyst
+
+# Change to working directory
+cd ${PBS_O_WORKDIR}
+
+# MPI and OpenMP settings
+NNODES=`wc -l < $PBS_NODEFILE`
+NRANKS_PER_NODE=32
+NDEPTH=1
+NTHREADS=2
+
+NTOTRANKS=$(( NNODES * NRANKS_PER_NODE ))
+echo "NUM_OF_NODES= ${NNODES} TOTAL_NUM_RANKS= ${NTOTRANKS} RANKS_PER_NODE= ${NRANKS_PER_NODE} THREADS_PER_RANK= ${NTHREADS}"
+
+mpiexec -n ${NTOTRANKS} --ppn ${NRANKS_PER_NODE} --depth=${NDEPTH} --cpu-bind core --env OMP_NUM_THREADS=${NTHREADS} -env OMP_PLACES=threads ./hello_affinity
+
+Many HPC applications do not benefit from utilizing the CPU's SMT2 capabilities, and such software may achieve better performance by using the previous script such that each of the 32 physical cores only runs a single OpenMP thread.

+

GPU MPI Examples

+

Using the CPU job submission examples above as a baseline, there are not many additional changes needed to enable an application to make use of the 4 NVIDIA A100 GPUs on each Polaris node. In the following 2-node example (because #PBS -l select=2 indicates the number of nodes requested), 4 MPI ranks will be started on each node assigning 1 MPI rank to each GPU in a round-robin fashion. A simple example using a similar job submission script on Polaris is available in the Getting Started Repo.

+
#!/bin/bash -l
+#PBS -l select=2:system=polaris
+#PBS -l place=scatter
+#PBS -l walltime=0:30:00
+#PBS -l filesystems=home:eagle
+#PBS -j oe
+#PBS -q debug
+#PBS -A Catalyst
+
+# Enable GPU-MPI (if supported by application)
+export MPICH_GPU_SUPPORT_ENABLED=1
+
+# Change to working directory
+cd ${PBS_O_WORKDIR}
+
+# MPI and OpenMP settings
+NNODES=`wc -l < $PBS_NODEFILE`
+NRANKS_PER_NODE=$(nvidia-smi -L | wc -l)
+NDEPTH=8
+NTHREADS=1
+
+NTOTRANKS=$(( NNODES * NRANKS_PER_NODE ))
+echo "NUM_OF_NODES= ${NNODES} TOTAL_NUM_RANKS= ${NTOTRANKS} RANKS_PER_NODE= ${NRANKS_PER_NODE} THREADS_PER_RANK= ${NTHREADS}"
+
+# For applications that internally handle binding MPI/OpenMP processes to GPUs
+mpiexec -n ${NTOTRANKS} --ppn ${NRANKS_PER_NODE} --depth=${NDEPTH} --cpu-bind depth --env OMP_NUM_THREADS=${NTHREADS} -env OMP_PLACES=threads ./hello_affinity
+
+# For applications that need mpiexec to bind MPI ranks to GPUs
+#mpiexec -n ${NTOTRANKS} --ppn ${NRANKS_PER_NODE} --depth=${NDEPTH} --cpu-bind depth --env OMP_NUM_THREADS=${NTHREADS} -env OMP_PLACES=threads ./set_affinity_gpu_polaris.sh ./hello_affinity
+
+

The affinity options NDEPTH=8 and --cpu-bind depth (or core) are set to ensure that each MPI rank is bound to a separate NUMA node. If OpenMP threading is desired, set NTHREADS=8 so that each MPI rank spawns 1 thread per physical core (all in the same NUMA domain that the rank is bound to). The OpenMP-related options are not needed if your application does not use OpenMP. Nothing additional is required on the mpiexec command for applications that internally manage GPU devices and handle the binding of MPI/OpenMP processes to GPUs. A small helper script is available for applications that rely on MPI to handle the binding of MPI ranks to GPUs. Some notes on this helper script and other key differences from the earlier CPU example follow.

+ + +
+

export MPICH_GPU_SUPPORT_ENABLED=1

+

For applications that support GPU-enabled MPI (i.e., use MPI to communicate data directly between GPUs), this environment variable is required to enable GPU support in Cray's MPICH. Omitting it will result in a segfault. This support also requires that the application was linked against the GPU Transport Layer library (e.g., -lmpi_gtl_cuda), which is automatically included for users by the craype-accel-nvidia80 module in the default environment on Polaris. If this gtl library is not properly linked, users will see an error message to that effect upon executing the first MPI call that uses a device pointer.
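A quick way to confirm that the GTL library actually made it into your binary (a sketch; the executable name is a placeholder) is to inspect its shared-library dependencies:

# Look for the CUDA GTL library among the executable's dependencies
ldd ./hello_affinity | grep -i gtl

If nothing is printed, the executable was likely linked without the craype-accel-nvidia80 module loaded.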

+
+
+

./set_affinity_gpu_polaris.sh

+

This script is useful for applications that rely on MPI to bind MPI ranks to GPUs on each node. Such a script is not necessary when the application handles process-GPU binding itself. The script simply sets the environment variable CUDA_VISIBLE_DEVICES to a restricted set of GPUs (e.g., each MPI rank sees only one GPU). Without it, all MPI ranks on a node would target the first GPU, likely with a negative impact on performance. An example of this script is available in the Getting Started repo and is copied below.

+
+

Hardware threads

+
#!/bin/bash -l
+#PBS -l select=2:system=polaris
+#PBS -l place=scatter
+#PBS -l walltime=0:30:00
+#PBS -l filesystems=home:eagle
+#PBS -q debug
+#PBS -A Catalyst
+
+# Enable GPU-MPI (if supported by application)
+export MPICH_GPU_SUPPORT_ENABLED=1
+
+# Change to working directory
+cd ${PBS_O_WORKDIR}
+
+# MPI and OpenMP settings
+NNODES=`wc -l < $PBS_NODEFILE`
+NRANKS_PER_NODE=$(nvidia-smi -L | wc -l)
+NDEPTH=16
+NTHREADS=16
+
+NTOTRANKS=$(( NNODES * NRANKS_PER_NODE ))
+echo "NUM_OF_NODES= ${NNODES} TOTAL_NUM_RANKS= ${NTOTRANKS} RANKS_PER_NODE= ${NRANKS_PER_NODE} THREADS_PER_RANK= ${NTHREADS}"
+
+# For applications that internally handle binding MPI/OpenMP processes to GPUs
+mpiexec -n ${NTOTRANKS} --ppn ${NRANKS_PER_NODE} --depth=${NDEPTH} --cpu-bind numa --env OMP_NUM_THREADS=${NTHREADS} -env OMP_PLACES=threads ./hello_affinity
+
+# For applications that need mpiexec to bind MPI ranks to GPUs
+#mpiexec -n ${NTOTRANKS} --ppn ${NRANKS_PER_NODE} --depth=${NDEPTH} --cpu-bind numa --env OMP_NUM_THREADS=${NTHREADS} -env OMP_PLACES=threads ./set_affinity_gpu_polaris.sh ./hello_affinity
+
+

As in the previous hardware threads example, the MPI ranks are spaced apart assuming the user wants to utilize all 64 logical cores (achieved here by setting both NDEPTH and NTHREADS to 16 and using --cpu-bind numa).

+

In this script, we have added -j oe to the list of PBS options; -j oe merges stdout and stderr into the same file, using the stdout filename (if one was provided). -j eo would do the same but use the stderr filename. Without this option, separate files containing the job's stdout and stderr are produced.

+

Here we compare two bare-bones PBS submission scripts for a CUDA example with and without MPI:

+
+
+
+
#!/bin/bash
+#PBS -l select=1
+#PBS -l walltime=00:10:00
+#PBS -q debug
+#PBS -l filesystems=home
+#PBS -A <project-name>
+#PBS -o logs/
+#PBS -e logs/
+
+module load cudatoolkit-standalone/11.8.0
+
+$HOME/ALCFBeginnersGuide/polaris/examples/01_example_cu
+
+
+
+
#!/bin/bash
+#PBS -l select=2
+#PBS -l walltime=00:10:00
+#PBS -q debug
+#PBS -l filesystems=home
+#PBS -A <project-name>
+#PBS -o logs/
+#PBS -e logs/
+
+module load cudatoolkit-standalone/11.8.0
+
+# Count number of nodes assigned
+NNODES=`wc -l < $PBS_NODEFILE`
+# set 1 MPI rank per GPU
+NRANKS_PER_NODE=4
+# calculate total ranks
+NTOTRANKS=$(( NNODES * NRANKS_PER_NODE ))
+echo NUM_OF_NODES= ${NNODES} TOTAL_NUM_RANKS= ${NTOTRANKS} RANKS_PER_NODE= ${NRANKS_PER_NODE}
+
+mpiexec -n ${NTOTRANKS} --ppn ${NRANKS_PER_NODE} $HOME/ALCFBeginnersGuide/polaris/examples/01_example_mpi
+
+
+
+
+

Setting GPU affinity for each MPI rank

+

The CUDA_VISIBLE_DEVICES environment variable is provided for users to set which GPUs on a node are accessible to an application or MPI ranks started on a node.
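As a quick illustration (a sketch; the device IDs and application name are arbitrary), restricting an application to two of the four GPUs on a node could look like the following:

# Only GPUs 0 and 1 will be visible to the application
export CUDA_VISIBLE_DEVICES=0,1
./hello_affinity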

+

A copy of the small helper script provided in the Getting Started repo is provided below for reference:

+
+
+
+
#!/bin/bash -l
+num_gpus=4
+# need to assign GPUs in reverse order due to topology
+# See Polaris Device Affinity Information https://www.alcf.anl.gov/support/user-guides/polaris/hardware-overview/machine-overview/index.html
+gpu=$((${num_gpus} - 1 - ${PMI_LOCAL_RANK} % ${num_gpus}))
+export CUDA_VISIBLE_DEVICES=$gpu
+echo "RANK= ${PMI_RANK} LOCAL_RANK= ${PMI_LOCAL_RANK} gpu= ${gpu}"
+exec "$@"
+
+
+
+
+
+

Note

+

The echo command prints a helpful message for the user to confirm the +desired mapping is achieved. Users are encouraged to edit this file as +necessary for their particular needs.

+
+
+

Warning

+

If planning large-scale runs with many thousands of MPI ranks, it is +advised to comment out the echo command above so as not to have thousands +of lines of output written to stdout.

+
+

Using MPS on the GPUs

+

Documentation for the NVIDIA Multi-Process Service (MPS) can be found here

+

In the script below, note that if you are going to run this as a multi-node job you will need to do this on every compute node, and you will need to ensure that the paths you specify for CUDA_MPS_PIPE_DIRECTORY and CUDA_MPS_LOG_DIRECTORY do not "collide" and end up with all the nodes writing to the same place.

+

An example is available in the Getting Started Repo and discussed below. +Using the local SSDs, using /dev/shm, or incorporating the node name into the path are all possible ways of dealing with that issue.
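If the MPS directories need to live on a shared filesystem, one possible way to keep them distinct on every node (a sketch; the base path is a placeholder) is to fold the node's hostname into the path:

# Give each node its own pipe/log directories to avoid collisions
export CUDA_MPS_PIPE_DIRECTORY=/path/writeable/by/you/nvidia-mps-$(hostname)
export CUDA_MPS_LOG_DIRECTORY=/path/writeable/by/you/nvidia-log-$(hostname)
mkdir -p ${CUDA_MPS_PIPE_DIRECTORY} ${CUDA_MPS_LOG_DIRECTORY}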

+
#!/bin/bash -l
+export CUDA_MPS_PIPE_DIRECTORY=</path/writeable/by/you>
+export CUDA_MPS_LOG_DIRECTORY=</path/writeable/by/you>
+CUDA_VISIBLE_DEVICES=0,1,2,3 nvidia-cuda-mps-control -d
+echo "start_server -uid $( id -u )" | nvidia-cuda-mps-control
+
+

to verify the control service is running:

+
$ nvidia-smi | grep -B1 -A15 Processes
+
+

and the output should look similar to this:

+
+-----------------------------------------------------------------------------+
+| Processes:                                                                  |
+|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
+|        ID   ID                                                   Usage      |
+|=============================================================================|
+|    0   N/A  N/A     58874      C   nvidia-cuda-mps-server             27MiB |
+|    1   N/A  N/A     58874      C   nvidia-cuda-mps-server             27MiB |
+|    2   N/A  N/A     58874      C   nvidia-cuda-mps-server             27MiB |
+|    3   N/A  N/A     58874      C   nvidia-cuda-mps-server             27MiB |
++-----------------------------------------------------------------------------+
+
+

to shut down the service:

+

echo "quit" | nvidia-cuda-mps-control

+

to verify the service shut down properly:

+

nvidia-smi | grep -B1 -A15 Processes

+

and the output should look like this:

+
+-----------------------------------------------------------------------------+
+| Processes:                                                                  |
+|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
+|        ID   ID                                                   Usage      |
+|=============================================================================|
+|  No running processes found                                                 |
++-----------------------------------------------------------------------------+
+
+

Using MPS in Multi-node Jobs

+

As stated earlier, it is important to start the MPS control service on each node in a job that requires it. An example is available in the Getting Started Repo. The helper script enable_mps_polaris.sh can be used to start the MPS on a node.

+

#!/bin/bash -l
+
+export CUDA_MPS_PIPE_DIRECTORY=/tmp/nvidia-mps
+export CUDA_MPS_LOG_DIRECTORY=/tmp/nvidia-log
+CUDA_VISIBLE_DEVICES=0,1,2,3 nvidia-cuda-mps-control -d
+echo "start_server -uid $( id -u )" | nvidia-cuda-mps-control
+
+The helper script disable_mps_polaris.sh can be used to disable MPS at appropriate points during a job script, if needed.

+

#!/bin/bash -l
+
+echo quit | nvidia-cuda-mps-control
+
+In the example job script submit.sh below, MPS is first enabled on all nodes in the job using mpiexec -n ${NNODES} --ppn 1 to launch the enablement script using a single MPI rank on each compute node. The application is then run as normal. If desired, a similar one-rank-per-node mpiexec command can be used to disable MPS on all the nodes in a job.

+
#!/bin/bash -l
+#PBS -l select=1:system=polaris
+#PBS -l place=scatter
+#PBS -l walltime=0:30:00
+#PBS -q debug
+#PBS -A Catalyst
+#PBS -l filesystems=home:grand:eagle
+
+cd ${PBS_O_WORKDIR}
+
+# MPI example w/ 8 MPI ranks per node spread evenly across cores
+NNODES=`wc -l < $PBS_NODEFILE`
+NRANKS_PER_NODE=8
+NDEPTH=8
+NTHREADS=1
+
+NTOTRANKS=$(( NNODES * NRANKS_PER_NODE ))
+echo "NUM_OF_NODES= ${NNODES} TOTAL_NUM_RANKS= ${NTOTRANKS} RANKS_PER_NODE= ${NRANKS_PER_NODE} THREADS_PER_RANK= ${NTHREADS}"
+
+# Enable MPS on each node allocated to job
+mpiexec -n ${NNODES} --ppn 1 ./enable_mps_polaris.sh
+
+mpiexec -n ${NTOTRANKS} --ppn ${NRANKS_PER_NODE} --depth=${NDEPTH} --cpu-bind depth ./hello_affinity
+
+mpiexec -n ${NTOTRANKS} --ppn ${NRANKS_PER_NODE} --depth=${NDEPTH} --cpu-bind depth ./set_affinity_gpu_polaris.sh ./hello_affinity
+
+# Disable MPS on each node allocated to job
+mpiexec -n ${NNODES} --ppn 1 ./disable_mps_polaris.sh
+
+

Single-node Ensemble Calculations Example

+

In the script below, a set of four applications are launched simultaneously on a single node. +Each application runs on 8 MPI ranks and targets a specific GPU using the CUDA_VISIBLE_DEVICES environment variable. +In the first instance, MPI ranks 0-7 will spawn on CPUs 24-31, and GPU 0 is used. +This pairing of CPUs and GPU is based on the output of the nvidia-smi topo -m command, which shows which CPUs share a NUMA domain with each GPU. +It is important to background processes using & and to wait for all runs to complete before exiting the script or continuing on with additional work. +Note, multiple applications can run on the same set of CPU resources, but it may not be optimal depending on the workload. +An example is available in the Getting Started Repo.

+
#!/bin/bash -l
+#PBS -l select=1:system=polaris
+#PBS -l place=scatter
+#PBS -l walltime=0:30:00
+#PBS -q debug
+#PBS -A Catalyst
+#PBS -l filesystems=home:grand:eagle
+
+#cd ${PBS_O_WORKDIR}
+
+# MPI example w/ 8 MPI ranks per node spread evenly across cores
+NNODES=`wc -l < $PBS_NODEFILE`
+NRANKS_PER_NODE=8
+NTHREADS=1
+
+nvidia-smi topo -m
+
+NTOTRANKS=$(( NNODES * NRANKS_PER_NODE ))
+echo "NUM_OF_NODES= ${NNODES} TOTAL_NUM_RANKS= ${NTOTRANKS} RANKS_PER_NODE= ${NRANKS_PER_NODE} THREADS_PER_RANK= ${NTHREADS}"
+
+export CUDA_VISIBLE_DEVICES=0
+mpiexec -n ${NTOTRANKS} --ppn ${NRANKS_PER_NODE} --cpu-bind list:24:25:26:27:28:29:30:31 ./hello_affinity &
+
+export CUDA_VISIBLE_DEVICES=1
+mpiexec -n ${NTOTRANKS} --ppn ${NRANKS_PER_NODE} --cpu-bind list:16:17:18:19:20:21:22:23 ./hello_affinity &
+
+export CUDA_VISIBLE_DEVICES=2
+mpiexec -n ${NTOTRANKS} --ppn ${NRANKS_PER_NODE} --cpu-bind list:8:9:10:11:12:13:14:15 ./hello_affinity &
+
+export CUDA_VISIBLE_DEVICES=3
+mpiexec -n ${NTOTRANKS} --ppn ${NRANKS_PER_NODE} --cpu-bind list:0:1:2:3:4:5:6:7 ./hello_affinity &
+
+wait
+
+

Multi-node Ensemble Calculations Example

+

To run multiple concurrent applications on distinct sets of nodes, one simply needs to provide appropriate hostfiles to the mpiexec command. The split Unix command is one convenient way to create several unique hostfiles, each containing a subset of nodes available to the job. In the 8-node example below, a total of four applications will be launched on separate sets of nodes. The $PBS_NODEFILE file will be split into several hostfiles, each containing two lines (nodes). These smaller hostfiles are then passed to the --hostfile argument of mpiexec to launch the applications. It is important to background processes using & and to wait for applications to finish running before leaving the script or continuing on with additional work. Note, multiple applications can run on the same set of CPU resources, but it may not be optimal depending on the workload. An example is available in the Getting Started Repo.

+
#!/bin/bash -l
+#PBS -l select=8:system=polaris
+#PBS -l place=scatter
+#PBS -l walltime=0:30:00
+#PBS -q debug-scaling
+#PBS -A Catalyst
+#PBS -l filesystems=home:grand:eagle
+
+cd ${PBS_O_WORKDIR}
+
+# MPI example w/ multiple runs per batch job
+NNODES=`wc -l < $PBS_NODEFILE`
+
+# Settings for each run: 2 nodes, 4 MPI ranks per node spread evenly across cores
+# User must ensure there are enough nodes in job to support all concurrent runs
+NUM_NODES_PER_MPI=2
+NRANKS_PER_NODE=4
+NDEPTH=8
+NTHREADS=1
+
+NTOTRANKS=$(( NUM_NODES_PER_MPI * NRANKS_PER_NODE ))
+echo "NUM_OF_NODES= ${NNODES} NUM_NODES_PER_MPI= ${NUM_NODES_PER_MPI} TOTAL_NUM_RANKS= ${NTOTRANKS} RANKS_PER_NODE= ${NRANKS_PER_NODE} THREADS_PER_RANK= ${NTHREADS}"
+
+# Increase value of suffix-length if more than 99 jobs
+split --lines=${NUM_NODES_PER_MPI} --numeric-suffixes=1 --suffix-length=2 $PBS_NODEFILE local_hostfile.
+
+for lh in local_hostfile*
+do
+  echo "Launching mpiexec w/ ${lh}"
+  mpiexec -n ${NTOTRANKS} --ppn ${NRANKS_PER_NODE} --hostfile ${lh} --depth=${NDEPTH} --cpu-bind depth ./hello_affinity &
+  sleep 1s
+done
+
+wait
+
+ + + + + + + + + + + + \ No newline at end of file diff --git a/running-jobs/job-and-queue-scheduling/index.html b/running-jobs/job-and-queue-scheduling/index.html new file mode 100644 index 0000000000..bb48313016 --- /dev/null +++ b/running-jobs/job-and-queue-scheduling/index.html @@ -0,0 +1,7653 @@ + + + + + + + + + + + + + + + + + + + + + + + + + Job Scheduling and Execution - ALCF User Guides + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +

Running Jobs using PBS

+

Documentation / Tools

+
    +
  • The PBS "BigBook": This is really excellent. We highly suggest you download it and search through it when you have questions. However, it is big at about 2000 pages / 40MB and contains a bunch of stuff you don't really need, so you can also download the guides separately here: +
  • +
  • Cobalt qsub options to PBS qsub options: shows how to map cobalt command line options to PBS command line options. Can be found at the link above.
  • +
  • qsub2pbs: Installed on Theta and Cooley. Pass it a Cobalt command line and it will convert it to a PBS command line. Add the --directives option, and it will output an executable script. Note that it outputs -l select=system=None. You would need to change the None to whatever system you wanted to target (polaris, aurora, etc.).
  • +
+

Introduction

+

At a high level, getting computational tasks run on an HPC system is a two-step process:

+
    +
  1. +

    You request and are allocated resources (we allocate at the node level; at some facilities you instead request a number of cores, an amount of RAM, etc.) on one or more of the systems. + This is accomplished by interacting with the job scheduler / workload manager. In the ALCF we use PBS Professional.

    +
  2. +
  3. +

    You execute your tasks on those resources. + This is accomplished in your job script by interacting with various system services (MPI, OpenMP, the HPE PALS task launch system, etc.)

    +
  4. +
+

Our documentation is organized in two sections aligned with the two steps described above.

+

Table of Contents

+ +

Obtaining and managing compute resources at ALCF

+

Definitions and Notes

+

chunk: A set of resources allocated as a unit to a job. Specified inside a selection directive. All parts of a chunk come from the same host. In a typical MPI (Message-Passing Interface) job, there is one chunk per MPI process.

+

vnode: A virtual node, or vnode, is an abstract object representing a host or a set of resources which form a usable part of an execution host. This could be an entire host, +or a nodeboard or a blade. A single host can be made up of multiple vnodes. Each vnode can be managed and scheduled independently. Each vnode in a complex must have a unique name. Vnodes on a host can share resources, such as node-locked licenses. PBS operates on vnodes. A vnode can, and in ALCF often will, represent an entire host, but it doesn't have to. For instance, there is a mode on Polaris where we could have each physical host look like four vnodes, each with 16 threads, 1/4 of the RAM and one A100.

+

ncpus: The number of processing units available to execute a program. At ALCF, given the way we configure PBS, this equates to hardware threads. For example, a single-socket node with a 32-core CPU, where each core has two hardware threads, would be reported as ncpus=64.

+

ngpus: The number of allocable GPUs on the vnode. For an NVIDIA A100 this could be one; however, if we enable Multi-Instance GPU (MIG) mode and use cgroups, it could be as high as 7.

+

job: A job equates to a qsub. A set of resources allocated to you for a period of time. You will execute one or more tasks on those resources during your job.

+

task: A single execution on the resources of your job, often an mpiexec invocation launched by PALS or PMIx. You may run one task or many tasks during your job. You may run tasks sequentially or divide your resources up and run several tasks concurrently. Also sometimes referred to as job steps.

+

Quick Start

+

If you are an ALCF user and are familiar with Cobalt, you will find the PBS commands very similar though the options to qsub are quite different. Here are the "Big Four" commands you will use:

+
    +
  1. qsub: request resources (generally compute nodes) to run your job and start your script/executable on the head node. Here is the minimal qsub allowed at the ALCF:
      +
    • qsub -A <project> -l select=<# of nodes>,walltime=HH:MM:SS,filesystems=fs1:fs2 <your job script>
    • +
    • The -A, walltime, and filesystems are mandatory. You will receive errors if they are not specified.
    • +
    • We automatically add -k doe for you. This streams your output back rather than spooling it and copying it back at the end of the job. It probably isn't a bad idea to specify it in your script, but we enforce that option, so if you try and change it, you will get an error.
    • +
    • It is highly likely you will also want to add -l place=scatter so that each of your chunks (<# of nodes>) gets its own vnode.
    • +
    • If you want to run an executable rather than a script, replace <your job script> in the example above with -- <your executable> (that is dash dash).
    • +
    • PBS Documentation: Users Guide, Chapter 2, page UG-11 and Reference Guide Chapter 2, section 2.57, page RG-216
    • +
    +
  2. +
  3. qstat: check on the status of your jobs or queues
      +
    • Try these variations and see which you like best: qstat, qstat -was, qstat -was1, qstat -wan, qstat -wan1. Add -x to see jobs that have completed. We keep two weeks of history.
    • +
    • qstat -Q will list all the queues in case you forget.
    • +
    • PBS Documentation: Users Guide Sec. 10.2, page UG-175; Reference Guide Sec. 2.55, page RG-200
    • +
    +
  4. +
  5. qalter: update your request for resources
      +
    • It takes the same options as qsub; just add a jobid at the end. It only works before the job starts.
    • +
    • If you want to change the walltime to 30 minutes: qalter -l walltime=30:00 <jobid>
    • +
    • PBS Documentation: Users Guide Sec. 9.2, page UG-168; Reference Guide Sec. 2.40, page RG-130
    • +
    +
  6. +
  7. qdel: cancel a job that you don't need. This will also kill a running job
      +
    • qdel <jobid>
    • +
    • PBS Documentation: Users Guide Sec. 9.3, page UG-170; Reference Guide Sec. 2.41, page RG-143
    • +
    +
  8. +
+

Note: The page numbers in the PBS guides are unique. If you search for the specified page number it will take you directly to the relevant page.

+

qsub: submit a job to run

+

Users Guide, Chapter 2, page UG-11 and Reference Guide Chapter 2, section 2.57, page RG-216

+

At the ALCF, your qsub will likely use the following parameters:

+

qsub -A <project> -k doe -l select=<#>:system=<name>,walltime=HH:MM:SS,filesystems=fs1:fs2,place=scatter <your job script>

+

Where:

+
    +
  • project is the project name associated with your allocation (what you check the balance of with the sbank command). This is a mandatory option at the ALCF. If you don't include it, you will get qsub: Account_Name is required to be set.
  • +
  • -k doe is telling pbs to stream your output rather than buffer it on the compute nodes and then scp it at the end of the job. Note we will automatically add this if you don't specify it. We enforce this option, so if you try and specify any other output handling you will get an error.
  • +
  • select=<#>:system=<name> specifies the number of chunks (typically nodes). Each of our systems has a PBS "resource" called system defined and set to the system name (polaris, sunspot, etc.).
  • +
  • walltime=HH:MM:SS specifying a wall time is mandatory at the ALCF. Valid wall times depend on the queue you are using. There is a table with the queues for each machine at the end of this section and in the machine specific documentation.
  • +
  • filesystems=fs1:fs2:... Specifying which filesystems your application uses is mandatory at ALCF. The reason for this is if a filesystem goes down, we have a way of making PBS aware of that and it won't run jobs that need that filesystem. If you don't specify filesystems you will receive the following error: qsub: Resource: filesystems is required to be set.
  • +
  • place=scatter is telling PBS you want each of your chunks on a separate vnode. By default, PBS will pack your chunks to get maximum utilization. If you requested ncpus=1 and chunks=64 without place=scatter on a system with ncpus=64, all your chunks would end up on one node.
  • +
  • Your job script: See Example Job Scripts for more information about how to build your job script. For options that won't change, you do have the option of taking things off the command line and putting them in your job script. For instance, the above command line could be simplified to qsub -l select=<#> <your job script> if you added the following to the top (the PBS directives have to be before any executable line) of your job script:
  • +
+
#PBS -A <project>
+#PBS -k doe
+#PBS -l walltime=HH:MM:SS
+#PBS -l filesystems=fs1:fs2
+
+

Also note that if you want to run an executable directly rather than a script you use two dashes and the executable name in place of your script name like this: -- /usr/bin/sleep 600
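For example, a complete submission that runs an executable directly (a sketch; the project name is a placeholder) might look like:

qsub -A MyProject -l select=1:system=polaris -l walltime=0:10:00 -l filesystems=home -l place=scatter -- /usr/bin/sleep 600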

+

More detail:

+

The single biggest difference between Cobalt and PBS is the way you select resources when submitting a job. In Cobalt, every system had its own Cobalt server and you just specified the number of nodes you wanted (-n). With PBS, we are planning on running a single "PBS Complex" which means there will be a single PBS server for all systems in the ALCF and you need to specify enough constraints to get your job to run on the resources you want/need. One advantage of this is that getting resources from two different systems or "co-scheduling" is trivially possible.

+

Resource Selection and Job Placement

+

Section 2.57.2.6 RG-219 Requesting Resources and Placing jobs in the Reference Guide.

+

Resources come in two flavors:

+
    +
  • Job Wide: Walltime is the most common example of a job wide resource. You use the -l option to specify job wide resources, i.e. -l walltime=06:00:00. All the resources in the job have the same walltime.
  • +
  • -l <resource name>=<value>[,<resource name>=<value> ...]
  • +
  • Chunks: (see the definition above) This is how you describe what your needs are to run your job. You do this with the -l select= syntax. In the ALCF, we do whole node scheduling and every node has a resource called system which is set to the system name it belongs to (Polaris, Aurora, etc). This means you can typically get away with the very simple -l select=128:system=foo which will give you 128 complete nodes on system foo.
  • +
  • -l select=[<N>:]<chunk>[+[<N>:]<chunk> ...] where N specifies how many of that chunk and a chunk is of the form:
  • +
  • <resource name>=<value>[:<resource name>=<value> ...]
  • +
  • Here is a hypothetical example that would select resources with A100s and other resources with A40 GPUs. PBS takes care of co-scheduling the nodes on the two systems for you transparently. Note that in this case since we did not specify system= if there were multiple systems that could satisfy a chunk you wouldn't know ahead of time which system you would get.
  • +
+

-l select=128:ncpus=64:ngpus=4:gputype=A100+32:ncpus=64:ngpus=2:gputype=A40

+

You also have to tell PBS how you want the chunks distributed across the physical hardware. You do that via the -l place option:

+
    +
  • -l place=[<arrangement>][: <sharing> ][: <grouping>] where
  • +
  • arrangement is one of free | pack | scatter | vscatter
      +
    • unless you have a specific reason to do otherwise, you probably want to set this to scatter; otherwise you may not get what you expect. For instance, on a host with ncpus=64, if you requested -l select=8:ncpus=8 you could end up with all of your chunks on one node.
    • +
    • free means PBS can distribute them as it sees fit
    • +
    • pack means all chunks from one host. Note that this is not the minimum number of hosts, it is one host. If the chunks can't fit on one host, the qsub will fail.
    • +
    • scatter means take only one chunk from any given host.
    • +
    • vscatter means take only one chunk from any given vnode. If a host has multiple vnodes, you could end up with more than one chunk on the host.
    • +
    +
  • +
  • sharing is one of excl | shared | exclhost where
      +
    • NOTE: Node configuration can override your requested sharing mode. For instance, in most cases ALCF sets the nodes to force_exclhost, so normally you don't have to specify this.
    • +
    • excl means this job gets the entire vnode
    • +
    • shared means the vnode could be shared with another job from another user.
    • +
    • exclhost means this job gets the entire host, even if it has multiple vnodes.
    • +
    +
  • +
  • group=<resource name>
      +
    • As an example, for machines that use a dragonfly network topology, we provide a PBS resource named tier1 indicating which dragonfly group a node is in. If you wanted to ensure that all the chunks came from a single dragonfly group, you could specify place=group=tier1 as part of your qsub. tier0 is rack granularity, so group=tier0 would ensure your nodes all came from one rack. Note that if you requested more nodes than were available in a rack your job would never run and you would see something like Not Running: Insufficient amount of resource: tier0.
    • +
    +
  • +
+

We have defined placement sets for the tier0 and tier1 resources. As a result, if you don't specify a grouping PBS will preferentially group your nodes in a placement set, but it won't drain or delay your job start to do so. For example, if you request 10 nodes and don't specify a grouping, if 10 nodes are available in the same rack, all your nodes will be in one rack. If not, but there are 10 nodes in a single dragonfly group, all your nodes will be in one dragonfly group. If you wish to specify a specific rack or dragonfly group, that is accomplished via the select syntax. For instance, qsub ... -l select=10:tier1=g0 would force your 10 nodes to be in dragonfly group 0.
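As an illustration (a sketch; node counts, group names, and the project name are placeholders), grouping by rack or pinning the job to a specific dragonfly group could look like the following:

# Ask the scheduler to keep all 10 nodes within a single rack (tier0)
qsub -A MyProject -l select=10:system=polaris -l walltime=1:00:00 -l filesystems=home:eagle -l place=scatter:group=tier0 my_job.sh

# Request 10 nodes explicitly from dragonfly group 0 (tier1=g0)
qsub -A MyProject -l select=10:system=polaris:tier1=g0 -l walltime=1:00:00 -l filesystems=home:eagle -l place=scatter my_job.sh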

+

Here is a heavily commented sample PBS submission script that shows some more of the options, but remember that the PBS manuals referenced at the top of this page are the ultimate resource.

+
#!/bin/bash -l
+# UG Section 2.5, page UG-24 Job Submission Options
+# Add another # at the beginning of the line to comment out a line
+# NOTE: adding a switch to the command line will override values in this file.
+
+# These options are MANDATORY at ALCF; Your qsub will fail if you don't provide them.
+#PBS -A <short project name>
+#PBS -l walltime=HH:MM:SS
+#file systems used by the job
+#PBS -l filesystems=home:eagle
+
+
+# Highly recommended
+# The first 15 characters of the job name are displayed in the qstat output:
+#PBS -N <name>
+
+# If you need a queue other than the default, which is prod (uncomment to use)
+##PBS -q <queue name>
+
+# Controlling the output of your application
+# UG Sec 3.3 page UG-42 Managing Output and Error Files
+# By default, PBS spools your output on the compute node and then uses scp to move it to the
+# destination directory after the job finishes.  Since we have globally mounted file systems
+# it is highly recommended that you use the -k option to write directly to the destination
+# the doe stands for direct, output, error
+#PBS -k doe
+#PBS -o <path for stdout>
+#PBS -e <path for stderr>
+
+# If you want to merge stdout and stderr, use the -j option
+# oe=merge stdout/stderr to stdout, eo=merge stderr/stdout to stderr, n=don't merge
+#PBS -j n
+
+# Controlling email notifications
+# UG Sec 2.5.1, page UG-25 Specifying Email Notification
+# When to send email b=job begin, e=job end, a=job abort, j=subjobs (job arrays), n=no mail
+#PBS -m be
+# By default, mail goes to the submitter; use this option to add others (uncomment to use)
+#PBS -M <email addresses>
+
+# Setting job dependencies
+# UG Section 6.2, page UG-109 Using Job Dependencies
+# There are many options for how to set up dependencies;  afterok will give behavior similar
+# to Cobalt (uncomment to use)
+##PBS depend=afterok:<jobid>:<jobid>
+
+# Environment variables (uncomment to use)
+# UG Section 6.12, page UG-126 Using Environment Variables
+# RG Sect 2.57.7, page RG-233 Environment variables PBS puts in the job environment
+##PBS -v <variable list>
+## -v a=10, "var2='A,B'", c=20, HOME=/home/zzz
+##PBS -V exports all the environment variables in your environment to the compute node
+
+
+# The rest is an example of how an MPI job might be set up
+echo Working directory is $PBS_O_WORKDIR
+cd $PBS_O_WORKDIR
+
+echo Jobid: $PBS_JOBID
+echo Running on host `hostname`
+echo Running on nodes `cat $PBS_NODEFILE`
+
+NNODES=`wc -l < $PBS_NODEFILE`
+NRANKS=1           # Number of MPI ranks per node
+NDEPTH=1           # Number of hardware threads per rank, spacing between MPI ranks on a node
+NTHREADS=1         # Number of OMP threads per rank, given to OMP_NUM_THREADS
+
+NTOTRANKS=$(( NNODES * NRANKS ))
+
+echo "NUM_OF_NODES=${NNODES}  TOTAL_NUM_RANKS=${NTOTRANKS}  RANKS_PER_NODE=${NRANKS}  THREADS_PER_RANK=${NTHREADS}"
+
+mpiexec --np ${NTOTRANKS} -ppn ${NRANKS} -d ${NDEPTH} -env OMP_NUM_THREADS=${NTHREADS} ./hello_mpi
+
+

Email Notifications

+

Users should add -M <email address> if they want notifications as a best practice.

+

Note: For users with '@alcf.anl.gov' email addresses, PBS will send out an email once the job has ended by default. If you do not want to receive these notifications, you will need to add #PBS -m n to your script.

+

Specifying Filesystems

+

Note: The filesystems attribute is mandatory. If you do not specify a filesystem(s) you will receive the following error message upon submission:

+

qsub: Resource: filesystems is required to be set.

+

Valid filesystems are home, eagle, and grand. For example, to request the home and eagle filesystems for your job you would add -l filesystems=home:eagle to your qsub command.

+

If a job is submitted while a filesystem it requested is marked down, the job will be queued but will not run, with a message in the comment field of the job as to why it is not running. Run qstat -f <jobid> to see the comment field. For example, if the job requested eagle and Eagle is unavailable, the comment field will contain Can Never Run: Insufficient amount of server resource: eagle_fs (True != False). Once the affected filesystem has been returned to normal operation, and the filesystem is marked as being available, the job will then be scheduled normally. The job cannot run until all filesystems requested by the job are available.

+

If a job requesting a filesystem that is marked down is already in the queue, the job will not run until all of its requested filesystems are available.

+

An example of a job requesting filesystems:

+

qsub -l select=10:ncpus=64,walltime=30:00,filesystems=grand:home -A ProjectX -q prod my_job.sh

+

To update the filesystems list for your job, use qalter.
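For example (the jobid is a placeholder), the filesystems list of a queued job could be changed with:

qalter -l filesystems=home:grand:eagle <jobid>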

+

qsub examples

+
    +
  • qsub -A my_allocation -l select=4:system=polaris -l filesystems=home:eagle -l walltime=30:00 -q debug-scaling -- a.out
      +
    • run a.out on 4 chunks on polaris with a walltime of 30 minutes in debug-scaling queue; charge my_allocation;
    • +
    • Since we allocate full nodes on Polaris, 4 chunks will be 4 nodes. If we shared nodes, that would be 4 threads.
    • +
    • use the -- (dash dash) syntax when directly running an executable.
    • +
    +
  • +
  • qsub -A my_allocation -l place=scatter -l filesystems=home:eagle -l select=32:ncpus=32 -q prod -l walltime=30:00 mpi_mm_64.sh
      +
    • 32 chunks on any system that meets the requirements. Each chunk must have 32 HW threads; place=scatter means use a different vnode for each chunk, even if you could fit more than one on a vnode. Use the queue named prod.
    • +
    +
  • +
+

qstat: Query the status of jobs/queues

+

Users Guide Sec. 10.2, page UG-175; Reference Guide Sec. 2.55, page RG-200

+

Jobs

+

At its most basic, you just type qstat and it will list all the jobs currently running, queued, or held on the system. If you are interested in a specific job or jobs, you can provide a space-separated list on the command line: qstat job1 job2....

+
Job id            Name             User              Time Use S Queue
+----------------  ---------------- ----------------  -------- - -----
+349726.polaris-p* PDE2             user1                    0 Q prod
+336987.polaris-p* inf_clDB         user2                    0 H large
+353205.polaris-p* 3d-2.sub         user3             2044:14* R large
+
+

One of the annoying things about qstat is that the output fields are fixed width and it will truncate the output. This is indicated by an asterisk as the last character. You can add -w for wide. It doesn't prevent truncation, but makes it less likely. A useful variant is qstat -was1. It shows the number of nodes, tasks, the requested walltime, and the comment, all on one line. qstat -wan will give you the node list you ran on, just remember that it can be long. If you want an estimate of when the job will start, add the -T option. Note that start time is not available for all jobs, just the next N jobs that are expected to run. If you want to know everything there is to know about the job, add the -f flag.

+

                                                            Req'd  Req'd   Elap
+Job ID          Username Queue    Jobname    SessID NDS TSK Memory Time  S Time
+--------------- -------- -------- ---------- ------ --- --- ------ ----- - -----
+353201.polaris* user1    large    3d-1.sub    34449  60 38*    --  24:00 R 08:25    Job run at Tue Nov 15 at 16:44 on (x3006c0s13b1n0:ngpus=4:ncpus=64)+(x...
+353289.polaris* user2    medium   run_mae_l*    --   32 20*    --  12:00 Q   --     Not Running: Job would conflict with reservation or top job
+353411.polaris* user3    large    1310W60       --   64  64    --  06:00 Q   --     Not Running: Not enough free nodes available
+336990.polaris* user4    large    inf_clDB      --  464 29*    --  01:00 H   --     Job held by user4 on Mon Oct  3 20:16:26 2022
+
+The comment field is your friend. Wondering why your job isn't running? Check the comment. Wondering about the fate of a finished job? Add the -x option to see finished jobs (our history retention is currently set at two weeks) and check the comment. This cannot be stressed enough. Often, when a user ticket comes in about PBS, we answer it by looking at the comment.

+

If you are familiar with jq or some other command line JSON tool, the -F JSON option can be quite handy. grep is great, but when you grep the -f output for something, you probably want to know which job the found lines belong to. With the JSON output that is trivial.

+
allcock@polaris-login-02:~/.ssh>  qstat -fF JSON | jq '.Jobs | map_values(select(.job_state == "R") | {Job_Name, Account_Name, qtime, stime})'
+{
+  "349710.polaris-pbs-01.hsn.cm.polaris.alcf.anl.gov": {
+    "Job_Name": "P38",
+    "Account_Name": "CompBioAffin",
+    "qtime": "Fri Nov  4 11:04:12 2022",
+    "stime": "Fri Nov 11 07:52:12 2022"
+  },
+  "352220.polaris-pbs-01.hsn.cm.polaris.alcf.anl.gov": {
+    "Job_Name": "mdsim_10000_run1.pbs",
+    "Account_Name": "RL-fold",
+    "qtime": "Thu Nov 10 22:41:55 2022",
+    "stime": "Fri Nov 11 09:00:12 2022"
+  },
+
+

Queues

+

qstat -Q will show you the names of all the queues and tell you their status. If they are enabled (Ena column), you can queue jobs into them. If they are started (Str column), then the scheduler will try to run jobs from them. There is a -f (full) option, but that is mostly for admins, though you can find the min and max node count (resources_[min|max].nodect) and min and max walltime (resources_[min|max].walltime) in the output. Those values are also available in this documentation.
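For instance (the queue name is a placeholder), one way to pull just those limits out of the full listing might be:

qstat -Qf debug | grep -E 'resources_(min|max)'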

+

qalter: Alter a queued job

+

Users Guide Sec. 9.2, page UG-168; Reference Guide Sec. 2.40, page RG-130

+

qalter basically takes the same options as qsub. Say you typoed and set the walltime to 300 minutes instead of 30 minutes. You could fix it (if the job had not started running) by doing qalter -l walltime=30:00 <jobid> [<jobid> <jobid>...] + The new value overwrites any previous value.

+

qdel: Delete a queued or running job:

+

Users Guide Sec. 9.3, page UG-170; Reference Guide Sec. 2.41, page RG-143

+

qdel <jobid> [<jobid> <jobid>...]

+

qmove: Move a job to a different queue

+

Users Guide Sec. 9.7, page UG-173; Reference Guide Sec. 2.46, page RG-175

+
    +
  • qmove <new queue> <jobid> [<jobid> <jobid>...]
  • +
  • Only works before a job starts running
  • +
+

qhold,qrls: Place / release a user hold on a job

+

Reference Guide Sec 2.44, page RG-150 and Sec 2.50, page RG-183

+
    +
  • [qhold | qrls] <jobid> [<jobid> <jobid>...]
  • +
+

qselect: Query jobids for use in commands

+

Users Guide Sec. 10.1, page UG-175; Reference Guide Sec. 2.52, page RG-189

+
    +
  • qdel `qselect -N test1` will delete all the jobs that had the job name set to test1.
  • +
+

qmsg: Write a message into a job's output file

+

Users Guide Sec. 9.4, page UG-171; Reference Guide Sec. 2.47, page RG-177

+
    +
  • qmsg -E -O "This is the message" <jobid> [<jobid> <jobid>...]
  • +
  • -E writes it to standard error, -O writes it to standard out
  • +
+

qsig Send a signal to a job

+

Users Guide Sec. 9.5, page UG-172; Reference Guide Sec. 2.53, page RG-195

+
    +
  • qsig -s <signal> <jobid> [<jobid> <jobid>...]
  • +
  • If you don't specify a signal, SIGTERM is sent.
  • +
+

pbsnodes Get information about the current state of nodes

+

Reference Guide Sec 2.7 page RG-36

+

This is more for admins, but it can tell you what nodes are free (state), how many "CPUs" which is actually the number of threads (ncpus), how many GPUs (ngpus) which with some GPUs like NVIDIA A100s can change depending on the MIG mode, and if the node is shared or not (sharing).

+

pbsnodes <node name>: Everything there is to know about a node

+
> pbsnodes x3002c0s7b1n0
+x3002c0s7b1n0
+     Mom = x3002c0s7b1n0.hsn.cm.polaris.alcf.anl.gov
+     Port = 15002
+     pbs_version = 2022.1.1.20220926110806
+     ntype = PBS
+     state = free
+     pcpus = 64
+     resources_available.arch = linux
+     resources_available.demand = False
+     resources_available.gputype = A100
+     resources_available.host = x3002c0s7b1n0
+     resources_available.mem = 527672492kb
+     resources_available.ncpus = 64
+     resources_available.ngpus = 4
+     resources_available.system = polaris
+     resources_available.tier0 = x3002-g0
+     resources_available.tier1 = g0
+     resources_available.vnode = x3002c0s7b1n0
+     resources_assigned.accelerator_memory = 0kb
+     resources_assigned.hbmem = 0kb
+     resources_assigned.mem = 0kb
+     resources_assigned.naccelerators = 0
+     resources_assigned.ncpus = 0
+     resources_assigned.ngpus = 0
+     resources_assigned.vmem = 0kb
+     resv_enable = True
+     sharing = force_exclhost
+     license = l
+     last_state_change_time = Tue Nov 15 19:26:39 2022
+     last_used_time = Tue Nov 15 19:26:39 2022
+     server_instance_id = polaris-pbs-01.hsn.cm.polaris.alcf.anl.gov:15001
+

pbsnodes -avSj: A nice table to see what is free and in use

+
+> pbsnodes -avSj
+                                                        mem       ncpus   nmics   ngpus
+vnode           state           njobs   run   susp      f/t        f/t     f/t     f/t   jobs
+--------------- --------------- ------ ----- ------ ------------ ------- ------- ------- -------
+x3014c0s19b0n0  job-exclusive        1     1      0  503gb/503gb   63/64     0/0     4/4 353394
+x3014c0s19b1n0  resv-exclusive       0     0      0  503gb/503gb    0/64     0/0     4/4 --
+x3014c0s1b0n0   offline              0     0      0  503gb/503gb   64/64     0/0     4/4 --
+
+

pbsnodes -avSj | grep free | wc -l: A quick way to see how many nodes are free

+
[20220217-21:09:30]> pbsnodes -avSj | grep free | wc -l
+38
+
+

pbsnodes -avSj | grep free | awk '{print $1}': Lists the free nodes

+
[20220217-21:09:30]> pbsnodes -avSj | grep free | awk '{print $1}'
+x3201c0s25b0n0
+x3209c0s13b0n0
+x3209c0s19b0n0
+x3209c0s1b1n0
+
+

pbsnodes -l: (lowercase l) see which nodes are down. The comment often indicates why it is down

+
[20220217-21:10:31]> pbsnodes -l
+x3014c0s19b0n0       offline,resv-exclusive Xid 74 -- GPUs need reseat
+x3014c0s25b0n0       offline,resv-exclusive Checking on ConnectX-5 firmware
+
+

Job Priority

+

In PBS it is not easy to see a priority order for which jobs will run next. The best way is to use the -T option on qstat and look at the estimated start times. ALCF runs a custom scheduler algorithm, but in general, the job priority in the queue is based on several criteria:

+
    +
  1. positive balance of your project
  2. +
  3. size (in nodes) of the job, larger jobs receive higher priority
  4. +
  5. the type of project (e.g. INCITE, ALCC, or discretionary)
  6. +
  7. job duration: shorter duration jobs will accumulate priority more quickly, so it is best to specify the job run time as accurately as possible
  8. +
+

Troubleshooting / Common Errors

+

If you receive a qsub: Job rejected by all possible destinations error, then check your submission parameters. +The issue is most likely that your walltime or node count do not fall within the ranges listed above for the production execution queues. +Please see the table above for limits on production queue job sizes.

+

NOTE: For batch submissions, if the parameters within your submission script do not meet the parameters of any of the above queues, you might not receive the "Job submission" error on the command line at all. +This can happen because your job is waiting in a routing queue and has not yet reached the execution queues. +In this case you will receive a jobid back and qsub will exit; however, when the proposed job is routed, it will be rejected from the execution queues. +In that case, the job will be deleted from the system and will not show up in the job history for that system. +If you run a qstat on the jobid, it will return qstat: Unknown Job Id <jobid>.

+

Using Fakeroot with Singularity

+

The fakeroot feature (commonly referred to as rootless mode) allows an unprivileged user to run a container as a "fake root" user by leveraging user namespace UID/GID mapping. To request that this feature be enabled for your job, add the following to your qsub command line:

+

-l singularity_fakeroot=true
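For example (the project name and script are placeholders), a full submission requesting the fakeroot feature might look like:

qsub -A MyProject -l select=1:system=polaris -l walltime=1:00:00 -l filesystems=home:eagle -l singularity_fakeroot=true my_container_job.sh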

+ + + + + + + + + + + + \ No newline at end of file diff --git a/running-jobs/machine-reservations-polaris/index.html b/running-jobs/machine-reservations-polaris/index.html new file mode 100644 index 0000000000..9e7162fb57 --- /dev/null +++ b/running-jobs/machine-reservations-polaris/index.html @@ -0,0 +1,6730 @@ + + + + + + + + + + + + + + + + + + + + + + + + + Machine Reservations - ALCF User Guides + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +

Machine Reservations on Polaris

+

To get a reservation, you must first demonstrate a need to run outside of the normal queueing policies. Reservations are available only to projects with a positive allocation. Lead time for approval is 5 business days. If approved, scheduling is contingent on machine availability.

+

Disclaimer: Approval of reservation requests is subject to their appropriateness and machine availability. Not all requests will be approved. It is particularly difficult to accommodate reservation requests during busy times of the year, e.g., Supercomputing and the end of the ALCC and INCITE allocation cycles.

+

To request a reservation, e-mail support@alcf.anl.gov with the requested information below.

+
RESERVATION REQUEST FOR ALL SYSTEMS (including vis clusters) AT ALCF

  1. Machine name:
  2. Project for reservation:
  3. ALCF account username(s) (NOT the user's legal name) for reservation:
     NOTE: We can only gate a reservation on an explicit list of users or a list of groups, we can't mix them both. So users must specify either a project/unixgroup name or a list of usernames, not both.
  4. Length of reservation:
  5. Earliest date you could start:
  6. Deadline for the run(s):
  7. Details on the run: Can it run anytime, day or night?
  8. Your local time zone (e.g., US/Central):
  9. Total number of jobs to be run:
  10. Total amount of data generated during reservation:
  11. For each job, indicate:
      1. Node count (Note: not processor count)
      2. Run time
      3. Whether this job depends on any other jobs to finish before it can start
  12. Briefly describe the goals for this run: (Example: We are doing a scaling run of code XXXX to determine YYYY)
  13. Please provide a detailed explanation of why this workload cannot be accomplished with the existing queues: (Requests omitting this response will not be processed)

After a reservation is granted, you will receive a reservation name by e-mail. Use the command pbs_rstat to verify the reservation attributes.
+

For example:

+
pbs_rstat
+Resv ID      Queue     User     State               Start / Duration / End             
+---------------------------------------------------------------------------
+A123456.po   A123456   smith@   CO       Mon Aug 18 09:00 / 43200 / Tue Aug 19 11:00
+
+qsub -q A123456 -l walltime=60:00 -l select=1024:system=polaris -l filesystems=eagle myprog.exe
+
+

Once the reservation is set up, jobs can be submitted to the reservation queue prior to the reservation start time.

+

For recurring reservations, the reserve_start and reserve_end are always the first instance. +reserve_index and reserve_count tell you where you are in the recurrence.
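To inspect those attributes for a given reservation (a sketch; the reservation ID is a placeholder), the full pbs_rstat listing can be filtered:

pbs_rstat -f <reservation id> | grep -i reserve_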

+

For jobs using 33 percent or more of a system, place your job in the queue at least 12 hours prior to the start of the reservation or your reservation may be canceled. The machine will start to drain for your reservation, and it is important that your job is ready to run.

+

You can also move jobs from the regular queue to the reservation queue at any time using the qmove command. +Keep in mind that a job won't start unless enough time is left in the reservation.

+

NOTE: There is NOT a 10-minute pad at the end of the reservation. +When the reservation ends all jobs are terminated, deleted, and the reservation queue is deleted. +If a routing queue is used for the reservation, then jobs may be preserved, but any running job(s) are still terminated.

+

If you have finished running your jobs before your reservation has ended, please reach out to the support team to have it released for other users. +At this time, there is no way for a user to release a reservation early.

+ + + + + + + + + + + + \ No newline at end of file diff --git a/running-jobs/pbs-qsub-options-table/index.html b/running-jobs/pbs-qsub-options-table/index.html new file mode 100644 index 0000000000..521b5800d0 --- /dev/null +++ b/running-jobs/pbs-qsub-options-table/index.html @@ -0,0 +1,7041 @@ + + + + + + + + + + + + + + + + + + + + + + + + + Cobalt to PBS option Comparison - ALCF User Guides + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +

PBS Pro qsub Options

+

Version 1.2 2021-04-28

+

-l select and similar options use a lower case "L", -I for interactive is an upper case "I"

| Cobalt CLI | PBS CLI | PBS Directive | Function and Page Reference |
| --- | --- | --- | --- |
| -A <account_string> | -A <account_string> | #PBS Account_Name=<accounting string> | "Specifying Accounting String" UG-29 |
| -n NODES / --nodecount NODES | -l select=NODES:system=<hostname> | One or more #PBS -l <resource name>=<value> directives | "Requesting Resources" UG-51 |
| -t / --walltime | -l walltime=H:MM:SS | One or more #PBS -l <resource name>=<value> directives | "Requesting Resources" UG-51 |
| --attrs filesystems=<resource> | -l filesystems=<resource> | One or more #PBS -l <resource name>=<value> directives | "Requesting Resources" UG-51 |
| -q | -q <destination> | #PBS -q <queue name> / #PBS -q @<server name> / #PBS -q <queue name>@<server name> | "Specifying Server and/or Queue" UG-29 |
| --env | -v <variable list> | | "Exporting Specific Environment Variables" UG-126 |
| --env | -V | #PBS -V | "Exporting All Environment Variables" UG-126 |
| --attrs | Done via custom resources and select statements | | "Setting Job Attributes" UG-16 |
| --dependencies=<list> | -W depend=afterok:<list> | #PBS depend=... | "Using Job Dependencies" UG-107 |
| -I / --interactive | -I | Deprecated for use in a script | "Running Your Job Interactively" UG-121 |
| --jobname | -N <name> | #PBS -N <job name> / #PBS -WJob_Name=<job name> | "Specifying Job Name" UG-27 |
| -e / --error= | -e <path> | #PBS -e <path> / #PBS Error_Path=<path> | "Paths for Output and Error Files" UG-42 |
| -o / --output= | -o <path> | #PBS -o <path> / #PBS Output_Path=<path> | "Paths for Output and Error Files" UG-42 |
| -M / --notify (see note #1) | -M <user list> / -m <mail options> (-m be is suggested) | #PBS -M <mail recipients> / #PBS -WMail_Users=<mail recipients> / #PBS -m <mail points> / #PBS -WMail_Points=<mail points> | "Setting Email Recipient List" UG-26 |
| -u / --umask | -W umask=<value> | #PBS umask=<value> | "Changing Linux Job umask" UG-45 |
| -h | -h | #PBS -h | "Holding and Releasing Jobs" UG-115 |
| --proccount (see note #2) | -l mpiprocs (not needed to get equivalent Cobalt functionality) | One or more #PBS -l <resource name>=<value> directives | "Requesting Resources" UG-51 |
+

PBS options that provide functionality above and beyond Cobalt

+

Depending on policy decisions not all of these options may be available.

| Cobalt CLI | PBS CLI | PBS Directive | Function and Page Reference |
| --- | --- | --- | --- |
| N/A | -a <date_time> | #PBS -a | "Deferring Execution" UG-119 |
| N/A | -C <directive prefix> | | "Changing the Directive Prefix" UG-16 |
| N/A | -c <interval> | #PBS -c | "Using Checkpointing" UG-113 |
| N/A | -G | | "Submitting Interactive GUI Jobs on Windows" UG-125 |
| N/A | -J X-Y[:Z] | #PBS -J | "Submitting a Job Array" UG-150 |
| N/A | -j <join> | #PBS Join_Path=<joining option> | "Merging Output and Error Files" UG-43 |
| N/A | -k <keep> | #PBS Keep_Files=<keep option> | "Keeping Output and Error Files on Execution Host" UG-44 |
| N/A | -p <priority> | #PBS -p | "Setting Priority for Your Job" UG-120 |
| N/A | -P <project> | #PBS project=<project name> | "Specifying a Project for a Job" UG-27 |
| N/A | -r <value> | #PBS -r | "Allowing Your Job to be Re-run" UG-118 |
| N/A | -R <remove options> | | "Avoiding Creation of stdout and/or stderr" UG-43 |
| N/A | -S <path list> | | "Specifying the Top Shell for Your Job" UG-19 |
| N/A (see note #3) | -u <user list> | #PBS User_List=<username list> | "Specifying Job Username" UG-28 |
| N/A | -W block=true | #PBS block=true | "Making qsub Wait Until Job Ends" UG-120 |
| N/A | -W group_list=<list> | #PBS group_list=<group list> | "Specifying Job Group ID" UG-28 |
| N/A | -W release_nodes_on_stageout=<value> | | "Releasing Unneeded Vnodes from Your Job" UG-127 |
| N/A | -W run_count=<value> | | "Controlling Number of Times Job is Re-run" UG-119 |
| N/A | -W sandbox=<value> | | "Staging and Execution Directory: User Home vs. Job-specific" UG-31 |
| N/A | -W stagein=<list> | #PBS -W stagein=<execution path>@<input file storage host>:<input file storage path>[,...] | "Input/Output File Staging" UG-31 |
| N/A | -W stageout=<list> | #PBS -W stageout=<execution path>@<output file storage host>:<output file storage path>[,...] | "Input/Output File Staging" UG-31 |
| N/A | -X | | "Receiving X Output from Interactive Linux Jobs" UG-124 |
| N/A | -z | #PBS -z | "Suppressing Printing Job Identifier to stdout" UG-30 |
+

Notes

+
    +
  1. To get the equivalent mail notifications from PBS, two parameters are required: -M just like Cobalt, but also -m be (the "be" stands for "beginning" and "end") to specify when the mails should go out. This will give you the same behavior as Cobalt.
  2. +
  3. --proccount, while available, only changed behavior on the Blue Gene machines. To get equivalent functionality just drop it from the CLI. In PBS it does influence the $PBS_NODEFILE. See Section 5.1.3 in the PBS Users Guide, page UG-78.
  4. +
  5. The following Cobalt options have no equivalent in PBS:
      +
    • --cwd: use a script and cd to the directory you want to run from.
    • +
    • --user_list: There is no way to do this. We will work on adding this functionality.
    • +
    • --debuglog: Are we going to try and generate the equivalent of a .cobalt file?
    • +
    +
  6. +
  7. The following Cobalt options were Blue Gene specific and no longer apply:
      +
    • --kernel
    • +
    • -K KERNELOPTIONS
    • +
    • --ion_kernel
    • +
    • --ion_kerneloption
    • +
    • --mode: see notes on running scripts, Python, and other executables
    • +
    • --geometry
    • +
    • --disable_preboot
    • +
    +
  8. +
+ + + + + + + + + + + + + +
+
+ + + +
+ +
+ + + + +
+
+ + + + + + + + + + + +
+
+ +
+
+
+
+ + + + + + + + + + + + \ No newline at end of file diff --git a/running-jobs/unused/pbs-admin-quick-start-guide/index.html b/running-jobs/unused/pbs-admin-quick-start-guide/index.html new file mode 100644 index 0000000000..6aa31d46bf --- /dev/null +++ b/running-jobs/unused/pbs-admin-quick-start-guide/index.html @@ -0,0 +1,7031 @@ + + + + + + + + + + + + + + + + + + + + + PBS Admin Quick Start Guide - ALCF User Guides + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+ +
+ + + + + + + + + +
+ + +
+ + +
+ +
+

Argonne Leadership Computing Facility

+
+ + + + + + + + + +
+
+ + +
+ + + + + + + + + + + + + + + + + +
+ + + + +
+ + +
+ + + + + + +
+
+ + + +
+
+
+ + + + + +
+
+
+ + + + + + + +
+
+ + + + + + + +

PBS Admin Quick Start Guide

+

The single most important thing I can tell you is where to get the PBS BigBook. It is very good and a search will usually get you what you need if it isn't in here.

+ +

Checking Server Status

+

You can check overall server status and settings with:
+qmgr -c "list server" or qstat -Bf (add -w to qstat if you want to remove wrapping)
+This will show current server parameters. If you have manager/operator permissions you will also see any hidden resources.
+You may also check parameters of the scheduler with qmgr -c "list sched", and by checking $PBS_HOME/sched_priv/sched_config.
+Hook information can be checked with qmgr -c "list hook" and qmgr -c "list pbshook". Due to permissions all hook operations require root.

+

Checking / Setting Node Status

+

The pbsnodes command is your friend.

+
    +
  • Check status:
      • pbsnodes -av gives you everything; grep will be useful here
      • pbsnodes -v <node> <node> ... will give you all information on the listed nodes
      • pbsnodes -avSj gives you a nice table summary
      • pbsnodes -l lists the nodes that are offline
  • Taking nodes offline and bringing them back online (see the example after this list):
      • pbsnodes -C <comment> -o <nodelist> will mark a node offline in PBS (unschedulable)
          • Adding the time and date and why you took it offline in the comment is helpful
          • <nodelist> is space separated
      • pbsnodes -r <node list> will attempt to bring a node back online
          • This only removes the "offline" state from a node; if the node is down for other reasons, that will not change.
          • Use -C "" to remove any comment that was set when the node was originally marked offline.
+
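For example, taking two nodes out of service and later returning them might look like this; the node names, date, and ticket number are placeholders:
pbsnodes -C "2023-06-01 suspect DIMM, ticket 12345" -o x3001c0s1b0n0 x3001c0s1b1n0
pbsnodes -r -C "" x3001c0s1b0n0 x3001c0s1b1n0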

Troubleshooting

+
    +
  • PBS_EXEC (where all the executables are): /opt/pbs/[bin|sbin]
  • PBS_HOME (where all the data is): /var/spool/pbs
  • Logs: /var/spool/pbs/[server|mom|sched|comm]_logs
  • Config: /var/spool/pbs/[server|mom|sched]_priv/
  • /etc/pbs.conf - Reference Guide Section 9.1, page RG-371
  • qstat -[x]f [jobid]
      • The -x shows jobs that have already completed. We are currently holding two weeks of history.
      • The comment field is particularly useful. It will tell you why the job failed, got held, couldn't run, etc.
      • The jobid is optional. Without it you get all jobs.
  • tracejob <jobid> (see the example after this list)
      • This will pull all of the logs related to the jobid on that node. Run it on the pbs.server host to get most of the job information.
      • If this is run on a compute node involved in the job, it will aggregate all logs from the MoM for that job on that node.
      • You may pass it the -n # option, where # is the number of days to look back in the logs. This defaults to 1 day.
      • This does a rudimentary aggregation and filtering of the logs for you.
  • qselect - Reference Guide Section 2.54, page RG-187
      • Allows you to query and return jobids that meet given criteria. For instance, the command below would delete all the jobs from Yankee Doodle Dandy, username yddandy:
      • qdel `qselect -u yddandy`
  • Error Code Table (Reference Guide Chapter 14, RG-391)
      • If a CLI command (qmgr, qsub, whatever) spits out an error code at you, go look it up in the table; you may well save yourself a good bit of time.
      • We are going to try to either get the error text to come with the code or write a utility to look it up and have that on all the systems.
+
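A typical first pass at a problem job might look like this; the jobid is a placeholder:
qstat -xf 123456      # full job record, including the comment field, even after the job has finished
tracejob -n 3 123456  # aggregate the last three days of logs for that job; run on the pbs.server host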

Starting, stopping, restarting, status of the daemons:

+
    +
  • Server: on pbs0, run systemctl [start | stop | restart | status] pbs
  • MoM:
      • If you only want to restart a single MoM, ssh to the host and issue the same commands as above for the server.
      • If you want to restart the MoM on every compute node, ssh to admin.polaris, then do: pdsh -g custom-compute "systemctl [start | stop | restart | status] pbs"
+

Starting, stopping scheduling across the entire complex

+

qmgr -c "set server scheduling = [True | False]"

+

IMPORTANT NOTE: If we are running a single PBS complex for all our systems (the same server handling Polaris, Aurora, Cooley2, etc.), this will stop scheduling on everything.

+

To check the current status you may do: qmgr -c "list server scheduling"

+

Starting, stopping queues:

+
    +
  • enabled: whether users can submit (queue) jobs to the queue
  • started: whether the scheduler will run jobs that are already in the queue
+

So if a queue is enabled but not started, users can issue qsubs and the jobs will get queued, but nothing will run until we start the queue again. Running jobs are unaffected.

+

qmgr -c "set queue <queue name> started = [True | False]"
+qmgr -c "set queue <queue name> enabled = [True | False]"

+

"Boosting" jobs (running them sooner)

+

There are three ways you can make a job run sooner (examples follow the list):

+
    +
  1. qmove run_next <jobid>
      • Because of the way policy is set for the acceptance testing period, any job in the run_next queue will run before jobs in the default workq, with the exception of jobs that are backfilled. So by moving the job into the run_next queue, you move it to the front of the line. There are no restrictions on this, so please do not abuse it.
  2. qorder <jobid> <jobid>
      • If you don't necessarily need it to run next, but just want to rearrange the order a bit, you can use qorder, which swaps the positions of the specified jobids. So, if one of them was 10th in line and one was 20th, they would switch positions.
  3. qalter -l score_boost=NNNNN <jobid> <jobid>
      • If the job_sort_formula is enabled and shows up when querying the server, you can add a numeric boost to the score of a job to push it further ahead in the queue. You have to be a manager or operator to alter this value.
+
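For instance, with placeholder jobids and an arbitrary boost value:
qmove run_next 123456
qalter -l score_boost=10000 123457 123458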

Reservations

+

Most of the reservation commands are similar to the job commands, but prefixed with pbs_r instead of q: pbs_rsub, pbs_rstat, pbs_ralter, pbs_rdel. You get the picture. In general, their behavior is reasonably similar to the equivalent jobs commands. Note that by default, users can set their own reservations. We have to use a hook, no_user_rsub, to prevent that. The hook does allow anyone with manager or operator permissions to set reservations.

+
    +
  • There are three types of reservations:
      • Advance and standing reservations - reservations for users. Note that you typically don't specify the nodes; you make a resource request just like with qsub and PBS will find the nodes for you.
      • Job-specific "now" reservations - we have not used these. Where they could come in handy is for debugging: a user gets a job through, we convert it to a job-specific reservation, and then if their job dies they don't have to wait through the queue again; they can keep iterating until the walltime runs out.
      • Maintenance reservations - you can explicitly set which hosts to include in the reservation.
  • Also note that reservations occur in two steps. The pbs_rsub will return with an ID but will say unconfirmed. That means it was syntactically correct, but PBS hasn't figured out if the resources are available yet. Once it has the resources, it will switch to confirmed. This normally happens as fast as you can run pbs_rstat. A reservation can only be confirmed if scheduling is enabled on the server.
  • +
  • -R (start) -E (end) are in "datetime" format: [[[[CC]YY]MM]DD]hhmm[.SS]
  • +
  • 1315, 171315, 12171315, 2112171315 and 202112171315 would all be Dec 17th, 2021 @ 13:15
      +
    • If that is in the future they are all equivalent and valid
    • +
    • If it were Dec 17th, 2021 @ 14:00, then 1315 would roll over to the next day @ 13:15; the rest would be errors because they are in the past.
    • +
    • Be careful or this will bite you. It will confirm the reservation and you will expect it to start in a few minutes, but it is actually for tomorrow.
    • +
    +
  • +
  • pbs_rsub -N rsub_test -R 2023 -D 05:00 -l select=4
  • +
  • probably not what you think: resv_nodes = (edtb-03[0]:ncpus=1)+(edtb-03[0]:ncpus=1)+(edtb-03[0]:ncpus=1)+(edtb-03[0]:ncpus=1) It gave me 4 cores on the same node.
  • +
  • pbs_rsub -N rsub_test -R 2023 -D 05:00 -l select=2 -l place=scatter
  • +
  • Getting closer: resv_nodes = (edtb-01[0]:ncpus=1)+(edtb-02[0]:ncpus=1)
  • +
  • The -l place=scatter got me two different nodes, but edtb allows sharing, so I got one thread on each node, but there were actually jobs running on those nodes at the time. On Polaris, since the nodes are force_exclhost that wouldn't have been an issue.
  • +
  • pbs_rsub -N rsub_test -R 2217 -D 05:00 -l select=2:ncpus=64 -l place=scatter:excl This gave me what I wanted:
      +
    • resv_nodes = (edtb-03[0]:ncpus=64)+(edtb-04[0]:ncpus=64)
    • +
    • Leaving it to default to ncpus=1 should work, but asking for them all isn't a bad idea.
    • +
    +
  • +
  • pbs_rsub -N rsub_test -R 1200 -D 05:00 --hosts x3004c0s1b0n0 x3003c0s25b0n0...
  • +
  • If you use --hosts it makes it a maintenance reservation. You can't / don't need to add -l select or -l place on a maintenance reservation. PBS will set it for you and will make it the entire host and exclusive access. Nodes don't have to be up. If jobs are running they will continue to run. This will override any other reservation.
  • +
  • pbs_ralter You can use this to change attributes of the reservation (start time, end time, how many nodes, which users can access it, etc). Works just like qalter for jobs.
  • +
  • pbs_rdel <reservation id> This will kill all running jobs, delete the queue, meaning you lose any jobs that were in the queue, and release all the resources.
  • +
  • NOTE: once the reservation queue is in place, you use all the normal jobs commands (qsub, qalter, qdel, etc.) to manipulate the jobs in the queue. On the qsub you have to add -q <reservation queue name>
  • +
+
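Putting the commands above together, one possible end-to-end sketch is below; the reservation name, start time, duration, node count, project, and the reservation queue name (R123456) returned by pbs_rsub are all placeholders:
pbs_rsub -N demo_resv -R 1300 -D 05:00 -l select=16:ncpus=64 -l place=scatter:excl
pbs_rstat      # wait for the reservation to go from UNCONFIRMED to CONFIRMED and note its queue name
qsub -q R123456 -l select=16 -l walltime=01:00:00 -A myproject job.sh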

Giving users access to the reservation

+

By default, only the person submitting the reservation will be able to submit jobs to the reservation queue. You change this with -U +username@*,+username@*,.... You can add this to the initial pbs_rsub or use pbs_ralter after the fact. The plus is basically ALLOW; we haven't tested it, but you can theoretically also use a minus for DENY. You may also gate on group membership by setting qmgr -c "set queue <reservation queue name> acl_group_enable=True" and then adding groups to acl_groups on the reservation queue, using the same sort of syntax as you use for acl_users. This is a bit of a hack, but if you want anyone to be able to run, you can do qmgr -c "set queue <reservation queue name> acl_user_enable=False" (see the example below).
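For example, with placeholder usernames, group, and reservation queue name:
pbs_ralter -U +alice@*,+bob@* R123456
qmgr -c "set queue R123456 acl_group_enable=True"
qmgr -c "set queue R123456 acl_groups += projectgroup"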

+

WARNING: if you have both acl_users and acl_groups enabled, then the submitting user must be in both the group ACL and the user ACL list, otherwise the job will be rejected! It is recommended that only one or the other be used on a queue.

+

MIG mode

+
    +
  • See the Nvidia Multi-Instance GPU User Guide for more details. A combined example follows this list.
  • sudo nvidia-smi mig -lgip lists the GPU Instance Profiles; this is how you find the magic numbers used to configure it below.
  • sudo nvidia-smi mig -lgipp lists all the possible placements; the syntax of the placement is {<index>}:<GPU Slice Count>
  • nvidia-smi --query-gpu=mig.mode.current --format=csv,noheader checks the status of all the GPUs on the node; add -i <GPU number> to check a specific GPU
  • systemctl stop nvidia-dcgm.service ; systemctl stop nvsm ; sleep 5 ; /usr/bin/nvidia-smi -mig 1 puts the node in MIG mode; -mig 0 will take it out of MIG mode.
  • nvidia-smi mig -i 3 -cgi 19,19,19,19,19,19,19 -C configures GPU #3 to have 7 instances.
  • nvidia-smi mig --destroy-compute-instance; nvidia-smi mig --destroy-gpu-instance will free up the resources; you have to do this before you can change the configuration.
+
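As a combined, illustrative sequence using only the commands above (GPU index 3 and profile 19 are just the example values from the bullets; adjust for your hardware):
systemctl stop nvidia-dcgm.service ; systemctl stop nvsm ; sleep 5 ; /usr/bin/nvidia-smi -mig 1   # put the node in MIG mode
nvidia-smi mig -i 3 -cgi 19,19,19,19,19,19,19 -C            # carve GPU 3 into 7 instances
nvidia-smi --query-gpu=mig.mode.current --format=csv,noheader   # verify MIG mode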

Polaris Rack and Dragonfly group mappings

+
    +
  • Racks contain (7) 6U chassis; Each chassis has 2 nodes for 14 nodes per rack
  • +
  • The hostnames are of the form xRRPPc0sUUb[0|1]n0 where:
      +
    • RR is the row {30, 31, 32}
    • +
    • PP is the position in the row {30 goes 01-16, 31 and 32 go 01-12}
    • +
    • c is chassis and is always 0 (I wish they would have counted up chassis, oh well)
    • +
    • s stands for slot, but in this case is the RU in the rack. Values are {1,7,13,19,25,31,37}
    • +
    • b is BMC controller and is 0 or 1 (each node has its own BMC)
    • +
    • n is node, but is always 0 since there is only one node per BMC
    • +
    +
  • +
  • So, 16+12+12 = 40 racks * 14 nodes per rack = 560 nodes.
  • +
  • Note that in production group 9 (the last 4 racks) will be the designated on-demand racks
  • +
  • The management racks are x3000 and x3100 and are dragonfly group 10
  • +
  • The TDS rack is x3200 and is dragonfly group 11
  • +
+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
Group 0Group 1Group 2Group 3Group 4Group 5Group 6Group 7Group 8Group 9
x3001-g0x3005-g1x3009-g2x3013-g3x3101-g4x3105-g5x3109-g6x3201-g7x3205-g8x3209-g9
x3002-g0x3006-g1x3010-g2x3014-g3x3102-g4x3106-g5x3110-g6x3202-g7x3206-g8x3210-g9
x3003-g0x3007-g1x3011-g2x3015-g3x3103-g4x3107-g5x3111-g6x3203-g7x3207-g8x3211-g9
x3004-g0x3008-g1x3012-g2x3016-g3x3104-g4x3108-g5x3112-g6x3204-g7x3208-g8x3212-g9
+

Restricting a Reservation to Vnodes With Specific Resources

+

You can restrict a reservation to particular resources in the select statement just like you can with job placement. For instance, to restrict placement to nodes that are not in the on-demand queue, you can use -l select=256:demand=False in your select statement for a regular or repeating reservation.
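For example, a hypothetical advance reservation restricted to non-on-demand nodes (name, start time, and duration are placeholders):
pbs_rsub -N no_demand_resv -R 0800 -D 08:00 -l select=256:demand=False -l place=scatter:excl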

+

Removing Blocking Resources

+

There is a current behavior in PBS where reservations may inherit server defaults as restrictions and may not check other server values. This may result in jobs running unexpectedly, or may cause a job to not be queued.

+

To fix jobs not being queued, some resources_max restrictions may have to be removed from the reservation queue, for example, you can clear filesystems and project_priority with the following:
+gmgr -c "unset queue <reservation queue name> resources_max.filesystems"
+gmgr -c "unset queue <reservation queue name> resources_max.project_priority"

+

If you need to add an additional restriction, you can likewise set a resource on the queue as a resources_max restriction; for instance, to forbid eagle_fs from being used, you can run:
+qmgr -c "set queue <reservation queue name> resources_max.eagle_fs=False"
+qmgr -c "set queue <reservation queue name> resources_mix.eagle_fs=False"

+

You can also set this as a part of the -l flag options at reservation creation.

+ + + + + + + + + + + + + +
+
+ + + +
+ +
+ + + + +
+
+ + + + + + + + + + + +
+
+ +
+
+
+
+ + + + + + + + + + + + \ No newline at end of file diff --git a/search/search_index.json b/search/search_index.json new file mode 100644 index 0000000000..19dd8d8abb --- /dev/null +++ b/search/search_index.json @@ -0,0 +1 @@ +{"config":{"lang":["en"],"separator":"[\\s\\-]+","pipeline":["stopWordFilter"]},"docs":[{"location":"","title":"ALCF User Guides","text":"

We are moving our ALCF documentation into GitHub to make it easier to contribute and collaborate to our user and machine guides.

Our user guides contain information for:

  • Account and Project Management: Information and instructions on how to manage your ALCF account and awarded project.
  • Data Management: Information on our file systems that are mounted globally across all of our production systems.
  • Polaris: Information on how to get started our newest supercomputer.
  • Theta: Information on how to use our Cray XC40/KNL supercomputer.
  • ThetaGPU: Information on how to use our NVIDIA DGX A100 supercomputer.
  • Cooley: Information on how to use our visualization cluster.
  • AI Testbed: Information on how to use our AI Accelerators.
  • Aurora/Sunspot: Information on getting your code ready for our upcoming exascale supercomputer.
  • Services: Information on how to use various services provided across clusters.
  • Facility Policies: Information on our policies and procedures.
"},{"location":"#how-to-get-access","title":"How to Get Access","text":"

Researchers interested in using the ALCF systems (including Polaris and the AI Testbed\u2019s Cerebras CS-2 and SambaNova DataScale platforms) can now submit project proposals via the ALCF\u2019s Director\u2019s Discretionary program. Calls for proposals for additional allocation programs will be open at a later date.

Submit your proposal requests at: Allocation Request Page

"},{"location":"#getting-started","title":"Getting Started","text":"

If you'd like to get started using our ALCF resources, our Getting Started webpage provides information on what you need to do in order to get time on our systems, get an account, and how to start running jobs.

If you have an account and an award for Polaris, we suggest visiting our Getting on Polaris webpage.

If you'd like to use ThetaGPU, visit our Getting Started on ThetaGPU webpage.

If you'd like to use our AI accelerators, visit our Getting Started on AI Testbed webpage.

Please send feedback to support@alcf.anl.gov

"},{"location":"account-project-management/accounts-and-access/accounts-and-access-faqs/","title":"Accounts and Access FAQ","text":""},{"location":"account-project-management/accounts-and-access/accounts-and-access-faqs/#how-do-i-request-a-new-projectallocation","title":"How do I request a new project/allocation?","text":"

There are 3 allocation opportunities at ALCF. Please see How to Get an Allocation on how to get time on our systems.

"},{"location":"account-project-management/accounts-and-access/accounts-and-access-faqs/#who-do-i-contact-if-my-discretionary-project-allocation-expires-or-if-i-need-to-request-additional-hours","title":"Who do I contact if my Discretionary Project Allocation expires or if I need to request additional hours?","text":"

To request an extension of your existing discretionary allocation or to request additional hours, please email support@alcf.anl.gov with answers to the following or fill out the form at request an extension/additional hours: - What you have accomplished with your original allocation? - Please include a brief description of any publications or major presentations that were (or will be) generated in full or in part because of this allocation. - What you will do with the extra time? - What you are requesting as your new expiration date? - How many additional hours you are requesting?

"},{"location":"account-project-management/accounts-and-access/accounts-and-access-faqs/#how-do-i-join-a-project","title":"How do I join a project?","text":"

To join a project, please go to https://accounts.alcf.anl.gov, then click \"join a project\". Once there, scroll down to the project you want to join and click on it. At the bottom of the next page, please click on the \"Request Membership\" button. Once we receive approval from the PI regarding your membership request, we will provide you with access to the necessary resources.

"},{"location":"account-project-management/accounts-and-access/accounts-and-access-faqs/#how-do-i-request-a-reservation","title":"How do I request a reservation?","text":"

Reservation requests must include information detailed here:

  • Machine Reservations: Please email the completed reservation request to support@alcf.anl.gov. We will contact you after your request is reviewed by our reservations committee.
"},{"location":"account-project-management/accounts-and-access/accounts-and-access-faqs/#how-do-i-apply-for-a-new-account","title":"How do I apply for a new account?","text":"

Note: All ALCF accounts must be associated with an allocated project.

  • Request a new account: https://www.alcf.anl.gov/support-center/get-started/request-account
  • ALCF Accounts: https://accounts.alcf.anl.gov/
"},{"location":"account-project-management/accounts-and-access/accounts-and-access-faqs/#what-do-i-do-when-my-alcf-account-expires","title":"What do I do when my ALCF account expires?","text":"

Please forward your account expiry email to your Sponsor. As soon as we receive an approval email from your Sponsor, we'll proceed with your account renewal process as needed.

"},{"location":"account-project-management/accounts-and-access/accounts-and-access-faqs/#what-do-i-do-when-i-receive-a-warning-that-my-593-has-expired-is-about-to-expire","title":"What do I do when I receive a warning that my 593 has expired / is about to expire?","text":"

If you are planning to extend this assignment/computer user account, please let us know, so a new 593 (Foreign Visit & Assignment Request form) will be filed for you using the information from before. In case any other documents are needed from your end, you'll be contacted as necessary. In order to allow sufficient time for an indices check, it is recommended that your response be submitted as soon as possible.

If you are not planning to extend your account, also let us know so that we may close out your records.

"},{"location":"account-project-management/accounts-and-access/alcf-passcode-tokens/","title":"ALCF Passcode Tokens","text":"

Please note: An account can be associated with a single token only (Mobile or Physical token). Please contact accounts@alcf.anl.gov to change your token preference.

"},{"location":"account-project-management/accounts-and-access/alcf-passcode-tokens/#mobile-token","title":"Mobile Token","text":"

The SafeNet MobilePass+ Mobile Token allows access to ALCF systems. This security mobile token uses one-time passwords combined with your PIN for controlled access to the login systems. The mobile token utilizes an app that is keyed to your user account and for which you are responsible on your Android, iPhone or Windows mobile device. Please safeguard your phone as you would your credit cards or house keys: Do not store username, PIN, or other account-related records with the token. Sharing of mobile tokens is strictly forbidden. A mobile token can be associated with a single device only.

"},{"location":"account-project-management/accounts-and-access/alcf-passcode-tokens/#step-1-download-the-safenet-mobilepass-app-for-your-device","title":"Step 1. Download the SafeNet MobilePass+ app for your device:","text":"

The SafeNet MobilePASS+ app turns your mobile phone into a two-factor authentication device, removing the need to carry an additional hardware token. As a SafeNet MobilePASS+ user, you can generate passcodes on your mobile device and use those passcodes to authenticate on ALCF computing resources. See supported OS and platforms for more information.

SafeNet MobilePass+ for Android can be found here: https://play.google.com/store/apps/details?id=com.gemalto.mpassplus

SafeNet MobilePass+ for iPhone can be found here: https://itunes.apple.com/us/app/safenet-mobilepass/id1056481326?mt=8

SafeNet MobilePass+ for Windows can be found here: https://www.microsoft.com/en-us/p/safenet mobilepass/9nblggh10pdq?activetab=pivot%3Aoverviewtab

"},{"location":"account-project-management/accounts-and-access/alcf-passcode-tokens/#step-2-enroll-your-mobilepass-mobile-token","title":"Step 2. Enroll your MobilePass+ mobile token:","text":"

After you\u2019ve been provisioned a mobile token, you will receive a notification email with the subject line \"ALCF Mobile Token Self-Enrollment\" which you must access from your mobile phone.

Auto-Enrollment (to enroll SafeNet MobilePass+ token automatically):

  1. Click on the http:// link in the email. The SafeNet Authentication Service Self-Enrollment will open.
  2. Click enroll your SafeNet MobilePass+ token.
  3. When prompted to open in MobilePass+ tap Open.
  4. You will now be prompted to enter a 6 digit all numeric PIN.
  5. Enter your PIN in the Token PIN field and repeat in the Confirm PIN field.
  6. You will be taken to the Enrollment Complete screen to name the token.
  7. Insert the desired name in the Token Name field or leave it as is. This name is not utilized by the server; it is for you only.
  8. The newly enrolled SafeNet MobilePass+ token is now displayed in the SafeNet MobilePass+ app.

Manual Enrollment:

  1. Copy the activation string from the SafeNet provision email.
  2. Open the SafeNet MobilePass+ app and tap the manual option.
  3. Paste the enrollment string into the field provided and tap the Enroll button.
  4. You will now be prompted to enter a 6-digit all-numeric PIN.
  5. Enter your PIN in the Token PIN field and repeat in the Confirm PIN field.
  6. You will be taken to the Enrollment Complete screen to name the token.
  7. Insert the desired name in the Token Name field or leave it as is. This name is not utilized by the server; it is for you only.
"},{"location":"account-project-management/accounts-and-access/alcf-passcode-tokens/#logging-in-to-an-alcf-system-using-a-mobile-token","title":"Logging in to an ALCF System using a Mobile Token","text":"
  1. Open the MobilePASS+ (MobilePASS for Windows) app on your device. Then initiate an SSH session and type the following:
ssh <ALCF username>@<system_name>.alcf.anl.gov\n
  1. When prompted for a password, click the SafeNet MobilePASS+ app on your phone. Click on the token name listed within the app, and enter your PIN.

  2. The app will display your passcode immediately. Enter the passcode as the login password for the system within the SSH session. Please Note: You do NOT have to enter the PIN on the SSH screen when logging into a resource. This only needs to be done to access the passcode within the SafeNet MobilePASS+ (MobilePASS for Windows) app.

  3. Each generated passcode is valid on the SafeNet MobilePass+ app window until your mobile device screen times out.

"},{"location":"account-project-management/accounts-and-access/alcf-passcode-tokens/#troubleshooting-your-mobile-token","title":"Troubleshooting your Mobile Token","text":"

Case 1: Forgotten PIN: If you enter a PIN for your mobile Token and you get an invalid PIN, you will be asked to re-enter your PIN. After 6 failed attempts your token will be deleted and you will need to call the ALCF help desk or send an email to ALCF support to have a new mobile token provisioned.

Case 2: Account Lockout: If you fail to enter the correct password 6 times, you will get a permission denied error on the SSH screen. Upon 4 more failed attempts, your IP will be blocked. You will need to call the ALCF help desk and submit a ticket to have the IP unblocked.

Case 3: PIN Change: While logged in to the mobile token, click on token settings then tap change PIN. Enter the current PIN followed by the new PIN and confirm.

Case 4: Re-Sync: If you are unable to log in to a resource after entering the correct PIN and passcode your token may be out of sync with the server. Please email ALCF Service Desk at support at alcf.anl.gov for assistance.

Case 5: New Mobile Device: If you have a new mobile device, please email the ALCF Service Desk at support at alcf.anl.gov to have a new mobile token provisioned.

"},{"location":"account-project-management/accounts-and-access/alcf-passcode-tokens/#physical-token","title":"Physical Token","text":"

The physical token allows access to the ALCF systems. This security token uses one-time passwords combined with your PIN for controlled access to the login systems. The physical token is a tracked asset for which you are responsible and is keyed to your use. Please safeguard your token as you would your credit cards or house keys: Do not store username, PIN, or other account-related records with the token. Sharing of tokens is strictly forbidden. Please do not mark on the token or alter it in any way.

"},{"location":"account-project-management/accounts-and-access/alcf-passcode-tokens/#enabling-your-alcf-physical-token","title":"Enabling Your ALCF Physical Token","text":"

Upon receipt of your CRYPTOCard token, contact support@alcf.anl.gov so we can verify your identity and activate the token. If this step is not performed, the CRYPTOCard token cannot be used to log on to ALCF resources.

ALCF Support Desk Info Hours: Monday-Friday 9 a.m. - 5 p.m. (Central time); Email: support@alcf.anl.gov

"},{"location":"account-project-management/accounts-and-access/alcf-passcode-tokens/#logging-in-to-an-alcf-system-using-a-physical-token","title":"Logging in to an ALCF System using a Physical Token","text":"

When the physical token is activated, an initial PIN will be provided. This will be a four-digit number that will prepend to the one-time password string generated by the token.

Upon INITIAL login (to one of the ALCF machines), a prompt to change the PIN will appear. PINs must be at least four characters long and must only contain numbers.

  1. Initiate an SSH session using:

    ssh <ALCF username>@<system_name>.alcf.anl.gov\n

  2. A password prompt will be received. At this point, push the button on the physical token once.

  3. An eight-character, one-time password made up of letters and numbers will appear on the token\u2019s display. This one-time password is case-sensitive.

  4. Type your PIN followed immediately by the one-time password at the SSH password prompt.

For example, if your PIN is 1234 and you received the one-time password string ABCD9876, you would type 1234ABCD9876 at the password prompt.

"},{"location":"account-project-management/accounts-and-access/alcf-passcode-tokens/#troubleshooting-your-physical-token","title":"Troubleshooting Your Physical Token","text":"

Case 1: It says \"locked\": The physical token may be locked due to too many failed attempts. Please contact the ALCF Help Desk to return the defective token and so a replacement can be sent.

Case 2: You have a PIN for your physical token: Once a PIN has been set for your physical token, you will need to prepend your PIN to the token password. Otherwise you will not be able to log in. If you do not remember your PIN, please email us so we can verify your identity and reset your Initial PIN.

Case 3: It does not say \"locked\" but still does not work: It is likely that your token has fallen out of sync with the server. If you have pushed the button on your physical token more than 10 times without successfully logging in, it will fail to authenticate because it has lost synchronization with the server. Please try connecting to Theta first. If it still fails, please follow the re-sync instructions below.

"},{"location":"account-project-management/accounts-and-access/alcf-passcode-tokens/#re-sync-instructions","title":"Re-Sync Instructions","text":"

If you have pushed the button on your physical token more than 10 times, it will fail to authenticate because it has lost synchronization with the server. You can re-synchronize your token using the following procedure:

1. Have your physical token ready.\n\n2. Obtain a challenge sequence:\n    - Initiate an SSH session to a host that allows token\n      authentication (such as theta.alcf.anl.gov). At the password\n      prompt, just hit 'Enter'. This will cause the Cryptocard service\n      to produce a challenge string consisting of 8 numbers.\n\n3. Hold down the button on your token for a few seconds until the\n    display says \"Init\", then let go.\n\n4. The token will scroll through a series of menu options. When it\n    displays \"ReSync\", hit the button again.\n\n5. The display will say\n\n     Resync?0\n\n6. The number at the end will start cycling from 0 to 9, over and over.\n\n7. Look at the numbers in your challenge string. When the number\n    displayed on your token changes to the first number of the challenge\n    string, press the button. The display will now show this number, and\n    the second digit will start cycling.\n\n8. Enter each of the numbers from your challenge string in the same\n    manner, until the display on your token matches the entire challenge string.\n    Choose the \"<\" to backspace and re-enter the previous number if\n    necessary.\n\n9. Once you've entered all 8 digits, re-check to make sure they're\n    accurate. Then, while all 8 digits are displayed on the token, press\n    the button to generate a new password.\n\n10. Enter your PIN followed by the new password, and hit 'Enter'. \n     If successful, you will be logged in to the resource. You're now back \n     in sync with the authentication server.\n\nIf you are unsuccessful, you will be presented with another challenge string. \nAt this point, you may need to perform the re-sync instructions again.\n

If there are still problems after completing the re-synchronization procedures, please email us at support@alcf.anl.gov so we can run a test on the physical token to determine if it is defective.

If it is found to be defective we will promptly replace it. Physical tokens are the property of Argonne National Laboratory.

Please return them to us at:

ALCF Help Desk\nArgonne National Laboratory\n9700 S. Cass Ave.\nBldg. 240, Rm. 2129\nLemont, IL 60439\n
"},{"location":"account-project-management/accounts-and-access/alcf-passcode-tokens/#resetting-the-physical-token-pin","title":"Resetting the Physical Token PIN","text":"

Please email us at support at alcf.anl.gov for PIN resets. Once your identity has been verified, we will provide you with a new PIN for your CRYPTOcard token.

"},{"location":"account-project-management/accounts-and-access/alcf-passcode-tokens/#returning-a-physical-token","title":"Returning a Physical Token","text":"

If you no longer need your physical token, please return it to this address:

ALCF Help Desk\nArgonne National Laboratory\n9700 S. Cass Ave.\nBldg. 240, Rm. 2129\nLemont, IL 60439\n
"},{"location":"account-project-management/accounts-and-access/user-account-overview/","title":"ALCF User Account Overview","text":"

All computing carried out on the ALCF systems is associated with a user \"account.\" This account is used to log onto the login servers and run jobs on the resources. If someone has a user account, then he or she has a login name that is recorded in the user database. This web page describes the process that users will need to understand to manage account details, including policies and procedures.

If you need an account, visit the Accounts and Project Management website: Request an account

If you want to learn how to get started, visit the Get Started Guide: Get Started Guide

"},{"location":"account-project-management/accounts-and-access/user-account-overview/#who-can-get-an-account","title":"Who Can Get an Account","text":"

Those who are interested in having an account on an ALCF resource must first request an allocation and provide a detailed description of the work, including computational requirements and coding capabilities for the Blue Gene platform. Another means of acquiring an allocation on the ALCF system is to be part of a project team that already has an active allocation. Once an allocation has been granted, new users should complete an account request. A project\u2019s Principal Investigator (PI) must sponsor these accounts\u2014if the PI is the user, an ALCF staff member must serve as sponsor. Sponsors are asked annually to evaluate the accounts they have sponsored to determine whether or not these accounts should be kept active.

"},{"location":"account-project-management/accounts-and-access/user-account-overview/#account-abilities","title":"Account Abilities","text":"

A user with an active account can log in to the ALCF login servers (e.g., theta.alcf.anl.gov or cooley.alcf.anl.gov). This account will have some home directory space, from which files can be transferred via the login nodes, and where development activities, such as editing and compiling, can also occur.

"},{"location":"account-project-management/accounts-and-access/user-account-overview/#account-states","title":"Account States","text":"

Accounts are classified in one of the following categories:

  • Pending: An account that has been requested but has not yet been created.
  • Active: An account that can be used to interact with the ALCF Login Servers. This is the normal state for all accounts.
  • Inactive: An account that still exists on the system (that is, the account continues to be registered in the database and the user's files exist on disk) but the user cannot interact with the ALCF Login Servers. An account might be disabled due to misuse, security concerns, or because it is no longer allocated.
  • Deleted: An account that existed on the system and is thus in the records and backups, but whose user no longer has access to the systems or files on disk.
"},{"location":"account-project-management/accounts-and-access/user-account-overview/#more-information","title":"More Information","text":"
  • Account Policy
  • User Authentication Policy
  • Account Sponsorship and Retention Policy
"},{"location":"account-project-management/allocation-management/allocation-management/","title":"Managing Your Allocations","text":"

Allocations require management \u2013 balance checks, resource allocation, requesting more time, etc.

"},{"location":"account-project-management/allocation-management/allocation-management/#checking-for-an-active-allocation","title":"Checking for an Active Allocation","text":"

To determine if there is an active allocation, check Job Submission.

For information on how to run the query, look at our documentation on our sbank Allocations Accounting System or email support@alcf.anl.gov and ask for all active allocations.

"},{"location":"account-project-management/allocation-management/allocation-management/#using-sbank-to-determine-the-balance-of-an-allocation","title":"Using sbank to Determine the Balance of an Allocation","text":"

To determine which platforms have an active balance, check our allocation accounting system sbank.

  • To obtain the allocation balance, check the sbank command sbank-list-allocations.
  • DD projects with a negative balance will not be able to run jobs until they have requested additional time, see Getting more time below.
  • INCITE and ALCC PIs automatically email a summary of project usage. If this is a DD project, please email support@alcf.anl.gov.
"},{"location":"account-project-management/allocation-management/allocation-management/#allocation-expiration","title":"Allocation Expiration","text":"

Projects and allocations at the ALCF are different. A particular project might have multiple allocations of time. For example, a discretionary project that has been approved 3 times will have 3 allocations (2 of which are probably expired) but just one project. Projects will not expire; allocations will. If allocations are expired, or have no hours left, jobs will not be able to run. Use the two bullets above (Checking for an active allocation and Determining the balance of an allocation) to determine active allocations.

"},{"location":"account-project-management/allocation-management/allocation-management/#getting-more-time","title":"Getting More Time","text":"

To request an extension of your existing discretionary allocation or to request additional hours, please email support@alcf.anl.gov with answers to the following:

  • What you have accomplished with your original allocation?
  • Please include a brief description of any publications or major presentations that were (or will be) generated in full or in part because of this allocation.
  • What you will do with the extra time?
  • What you are requesting as your new expiration date?
  • How many additional hours you are requesting?
"},{"location":"account-project-management/allocation-management/allocation-management/#sub-allocations","title":"Sub-allocations","text":"

Suballocations let PIs control who in their team can run jobs, how much they are allowed to consume (allocation amount), and when they are allowed to run jobs (start and end dates).

Step 1: Create Suballocations (Project PI):

PI creates suballocations

sbank new sub <allocationid> -name <nameofsuballoc>

Tip: see sbank new suballocation -h for all the options.

Step 2: Manage Suballocations (Project PI)

PI adds users to suballocations

sbank e sub <projectname>::<nameofsuballoc> --add-user=\"<username1> <username2> ...\"

PI can change the name of a suballocation

sbank e sub <suballocationID> --name=<new_name_of_suballocation>

By default, the primary suballocation (which is the default suballocation created when the allocation is created by ALCF) is unrestricted, i.e., enabled for all project members. That means all project members can submit jobs against the primary suballocation by default. All other suballocations are restricted by default and users have to be added for each of them.

To change the default for the primary suballocation to restrict usage, PI must first edit the suballocation:

sbank-edit-suballocation --restrict <primary suballocation id>

Then add users with this command:

sbank e sub <primary suballocation id> --add-user=\"<username1> <username2> ...\"

PI changes start and end dates for a suballocation:

sbank e sub <suballocationID> -S <start_date> -E <end_date>

PI adds hours to a suballocation:

sbank e sub <projectname>::<nameofsuballoc> --hours-to-move <hours> --to-suballocation <projectname>::<nameofsuballoc2>

Note: hourstomove must be greater than or equal to the available balance for the suballocation nameofsuballoc

Tip: see sbank e suballocation -h for all the options

Step 3: Submit Jobs (Project team)

Submit jobs to a suballocation. Note that the user should be on the suballocation\u2019s user list

E.g.: qsub -l select=10,walltime=30:00,filesystems=grand:home -A <suballocationID> -q demand test.sh

Note: Once submanagement is enabled for a project allocation, all job submissions must specify the suballocationID

Useful commands: List all suballocations for a project that shows number of jobs run, charges, allocation balance, suballocation name, and list of users

sbank-list-allocations -r polaris -p <projectname> -f \"+subname users_list\"

Tip: see sbank l a -h for all the options and sbank -f\\? for list of fields that can be displayed

"},{"location":"account-project-management/allocation-management/overview/","title":"Allocations on ALCF Computing Resources","text":""},{"location":"account-project-management/allocation-management/overview/#getting-an-allocation-award","title":"Getting an Allocation Award","text":""},{"location":"account-project-management/allocation-management/overview/#incite-alcc-and-adsp","title":"INCITE, ALCC, and ADSP","text":"

Researchers gain access to ALCF systems for computational science and engineering projects\u2014typically with awards of millions of core-hours\u2014through competitive, peer-reviewed allocation programs supported by the DOE and Argonne. Our peer-reviewed award programs consist of the INCITE, ALCC, and ADSP programs. More information about the programs, including dates for our CFPs, can be found on their web pages.

"},{"location":"account-project-management/allocation-management/overview/#directors-discretionary","title":"Director's Discretionary","text":"

Alternatively, ALCF offers a Director's Discretionary allocation award program to support leadership computing preparation, INCITE and ALCC scaling, and application performance work that maximizes scientific application efficiency and productivity on leadership computing platforms. See the Director's Discretionary (DD) Program page for more information.

"},{"location":"account-project-management/allocation-management/overview/#initializing-your-awarded-allocation","title":"Initializing Your Awarded Allocation","text":"

Projects with INCITE, ALCC, and ADSP awards will be contacted directly by the ALCF staff with information on creating accounts.

Director's Discretionary awards will receive information in the award confirmation email.

"},{"location":"account-project-management/allocation-management/overview/#allocation-resources","title":"Allocation Resources","text":"

While requesting an allocation, users can choose from:

Computes: - Polaris - Theta (KNL Node) - ThetaGPU (GPU Node) - Cooley

File System: - Grand - Eagle (Community Sharing)

"},{"location":"account-project-management/allocation-management/overview/#policy-information-related-to-allocations","title":"Policy Information Related to Allocations","text":"

Pullback Policy

"},{"location":"account-project-management/allocation-management/overview/#requesting-additional-allocation-hours","title":"Requesting Additional Allocation Hours","text":"

If you are a PI of a Director's Discretionary project that has an active allocation, you can request additional time or an extension using the allocation request form.

To request more hours, renew your project using the allocation request form."},{"location":"account-project-management/allocation-management/sbank-allocation-accounting-system/","title":"sbank Allocation Accounting System","text":"

sbank is the accounting system used within the ALCF. It tracks project allocations, usage charges, and refunds. sbank allows queries about the balance and expiration of project allocations, and has replaced the outdated cbank accounting system.

The sbank accounting system helps users manage their allocations and usage per job. It gives the PIs the ability to monitor their allocation usage by user, job, and machine. It also allows the user to monitor their usage per allocation and provides insight on how many hours are left on the project.

"},{"location":"account-project-management/allocation-management/sbank-allocation-accounting-system/#getting-started-with-sbank","title":"Getting Started with sbank","text":"

sbank Example Commands provides a set of example commands on how to use the most common commands.

"},{"location":"account-project-management/allocation-management/sbank-allocation-accounting-system/#sbank-man-pages","title":"sbank Man Pages","text":"

Use these sbank man pages to get information on how to use the commands.

  • sbank
  • sbank-detail
  • sbank-detail-allocations
  • sbank-detail-jobs
  • sbank-detail-projects
  • sbank-detail-transactions
  • sbank-detail-users
  • sbank-list
  • sbank-list-allocations
  • sbank-list-jobs
  • sbank-list-projects
  • sbank-list-transactions
  • sbank-list-users
"},{"location":"account-project-management/allocation-management/not_in_nav/sbank-detail-allocations/","title":"Manpage for sbank-detail-allocations","text":""},{"location":"account-project-management/allocation-management/not_in_nav/sbank-detail-allocations/#sbank-detail-allocations-options","title":"sbank-detail-allocations [options] [ ... ]

Detail allocation information.

NOTE: 1. The list of arguments are optional. 2. you can also enter list by using the -a option multiple times. 3. regardless, both are optional, and you can get detail allocation info using the option filters below.","text":""},{"location":"account-project-management/allocation-management/not_in_nav/sbank-detail-allocations/#options","title":"OPTIONS","text":""},{"location":"account-project-management/allocation-management/not_in_nav/sbank-detail-allocations/#-version","title":"--version","text":"

show program's version number and exit

"},{"location":"account-project-management/allocation-management/not_in_nav/sbank-detail-allocations/#-h-help","title":"-h, --help","text":"

show this help message and exit

"},{"location":"account-project-management/allocation-management/not_in_nav/sbank-detail-allocations/#-a-allocation_id-allocation-idallocation_id","title":"-a ALLOCATION_ID, --allocation-id=ALLOCATION_ID","text":"

filter on allocation id

"},{"location":"account-project-management/allocation-management/not_in_nav/sbank-detail-allocations/#-e-event_id-event-idevent_id","title":"-e EVENT_ID, --event-id=EVENT_ID","text":"

filter on event id

"},{"location":"account-project-management/allocation-management/not_in_nav/sbank-detail-allocations/#-f-field_info-field-to-displayfield_info","title":"-f FIELD_INFO, --field-to-display=FIELD_INFO","text":"

FIELD_INFO is [:], for available fields enter -f? or -f \"?\", to add fields enter -f \"+ [:] ...\""},{"location":"account-project-management/allocation-management/not_in_nav/sbank-detail-allocations/#-j-jobid-jobidjobid","title":"-j JOBID, --jobid=JOBID","text":"

filter on jobid

"},{"location":"account-project-management/allocation-management/not_in_nav/sbank-detail-allocations/#-n-num_fields_to_display-num-fields-to-displaynum_fields_to_display","title":"-n NUM_FIELDS_TO_DISPLAY, --num-fields-to-display=NUM_FIELDS_TO_DISPLAY","text":"

set number of fields to display

"},{"location":"account-project-management/allocation-management/not_in_nav/sbank-detail-allocations/#-p-project-projectproject","title":"-p PROJECT, --project=PROJECT","text":"

filter on name or id, DO NOT MIX, enter 'all' to get all, wild cards '*' is allowed but only on names

"},{"location":"account-project-management/allocation-management/not_in_nav/sbank-detail-allocations/#-r-resource-resourceresource","title":"-r RESOURCE, --resource=RESOURCE","text":"

filter on name or id, DO NOT MIX, enter 'all' to get all, wild cards '*' is allowed but only on names

"},{"location":"account-project-management/allocation-management/not_in_nav/sbank-detail-allocations/#-t-transaction_id-transaction-idtransaction_id","title":"-t TRANSACTION_ID, --transaction-id=TRANSACTION_ID","text":"

filter on transaction id

"},{"location":"account-project-management/allocation-management/not_in_nav/sbank-detail-allocations/#-u-user-useruser","title":"-u USER, --user=USER","text":"

filter on name or id, DO NOT MIX, enter 'all' to get all, wild cards '*' is allowed but only on names

"},{"location":"account-project-management/allocation-management/not_in_nav/sbank-detail-allocations/#-w-field_info-field-width","title":"-w \"FIELD_INFO\", --field-width=","text":"

\"FIELD_INFO\" FIELD_INFO is :, for available fields enter -w? or -w \"?\""},{"location":"account-project-management/allocation-management/not_in_nav/sbank-detail-allocations/#-e-end-endend","title":"-E END, --end=END","text":"

[OPER1][...[OPER2]], where the operators OPER1 and OPER2 can be one of the following:

  • ge, gt, le, lt, eq or >=, >, <=, <, ==.

Operator Defaults:

  • OPER1 is 'ge' for single date entry
  • OPER1 and OPER2 are 'ge' and 'lt', respectively, for range date entry.

**Date Parsing Precedence: **

  • YEAR then MONTH then DAY, i.e., 121101 is parsed as YYMMDD, hence Nov 1, 2012
"},{"location":"account-project-management/allocation-management/not_in_nav/sbank-detail-allocations/#-h-human-readable","title":"-H, --human-readable","text":"

abbreviate numbers and use unit suffixes: K (thousands), M (millions), G (billions), T (trillions), ...

"},{"location":"account-project-management/allocation-management/not_in_nav/sbank-detail-allocations/#-i-get-inactive","title":"-I, --get-inactive","text":"

also get inactive allocations

"},{"location":"account-project-management/allocation-management/not_in_nav/sbank-detail-allocations/#-o-get-only-inactive","title":"-O, --get-only-inactive","text":"

only inactive allocations

"},{"location":"account-project-management/allocation-management/not_in_nav/sbank-detail-allocations/#-s-start-startstart","title":"-S START, --start=START","text":"

[OPER1][...[OPER2]], where the operators OPER1 and OPER2 can be one of the following:

  • ge, gt, le, lt, eq or >=, >, <=, <, ==.

Operator Defaults:

  • OPER1 is 'ge' for single date entry
  • OPER1 and OPER2 are 'ge' and 'lt', respectively, for range date entry.

Date Parsing Precedence:

  • YEAR then MONTH then DAY, i.e., 121101 is parsed as YYMMDD, hence Nov 1, 2012
"},{"location":"account-project-management/allocation-management/not_in_nav/sbank-detail-allocations/#-t-transaction_type-transaction-typetransaction_type","title":"-T TRANSACTION_TYPE, --transaction-type=TRANSACTION_TYPE","text":"

transaction types: CHARGE, REFUND, PULLBACK, DEPOSIT, VOID

"},{"location":"account-project-management/allocation-management/not_in_nav/sbank-detail-allocations/#-award-type-nameaward_type_name","title":"--award-type-name=AWARD_TYPE_NAME","text":"

filter on award type name

"},{"location":"account-project-management/allocation-management/not_in_nav/sbank-detail-allocations/#-award-categoryaward_category","title":"--award-category=AWARD_CATEGORY","text":"

filter on award category

"},{"location":"account-project-management/allocation-management/not_in_nav/sbank-detail-allocations/#-cbank-refcbank_ref","title":"--cbank-ref=CBANK_REF","text":"

filter on Clusterbank reference id

"},{"location":"account-project-management/allocation-management/not_in_nav/sbank-detail-allocations/#-createdcreated_timestamp","title":"--created=CREATED_TIMESTAMP","text":"

[OPER1][...[OPER2]], where the operators OPER1 and OPER2 can be one of the following:

  • ge, gt, le, lt, eq or >=, >, <=, <, ==.

Operator Defaults:

  • OPER1 is 'ge' for single date entry
  • OPER1 and OPER2 are 'ge' and 'lt', respectively, for range date entry.

Date Parsing Precedence:

  • YEAR then MONTH then DAY, i.e., 121101 is parsed as YYMMDD, hence Nov 1, 2012
"},{"location":"account-project-management/allocation-management/not_in_nav/sbank-detail-allocations/#-debugdebug_level","title":"--debug=DEBUG_LEVEL","text":"

SILENT, MUCH_LESS, LESS, MORE, VERBOSE, DEBUG, DEBUG1, DEBUG2

"},{"location":"account-project-management/allocation-management/not_in_nav/sbank-detail-allocations/#-get-deleted","title":"--get-deleted","text":"

also get deleted objects

"},{"location":"account-project-management/allocation-management/not_in_nav/sbank-detail-allocations/#-get-only-deleted","title":"--get-only-deleted","text":"

only deleted objects

"},{"location":"account-project-management/allocation-management/not_in_nav/sbank-detail-allocations/#-all-charges","title":"--all-charges","text":"

only show list info that have charges regardless of project/user relationship

"},{"location":"account-project-management/allocation-management/not_in_nav/sbank-detail-allocations/#-history-date-rangeend","title":"--history-date-range=END","text":"

[OPER1][...[OPER2]], where the operators OPER1 and OPER2 can be one of the following:

  • ge, gt, le, lt, eq or >=, >, <=, <, ==.

Operator Defaults:

  • OPER1 is 'ge' for single date entry
  • OPER1 and OPER2 are 'ge' and 'lt', respectively, for range date entry.

Date Parsing Precedence:

  • YEAR then MONTH then DAY, i.e., 121101 is parsed as YYMMDD, hence Nov 1, 2012
"},{"location":"account-project-management/allocation-management/not_in_nav/sbank-detail-allocations/#-last-updatedlast_updated_timestamp","title":"--last-updated=LAST_UPDATED_TIMESTAMP","text":"

[OPER1][...[OPER2]], where the operators OPER1 and OPER2 can be one of the following:

  • ge, gt, le, lt, eq or >=, >, <=, <, ==.

Operator Defaults:

  • OPER1 is 'ge' for single date entry
  • OPER1 and OPER2 are 'ge' and 'lt', respectively, for range date entry.

Date Parsing Precedence:

  • YEAR then MONTH then DAY, i.e., 121101 is parsed as YYMMDD, hence Nov 1, 2012
"},{"location":"account-project-management/allocation-management/not_in_nav/sbank-detail-allocations/#-no-commas","title":"--no-commas","text":"

remove commas from comma separated thousands

"},{"location":"account-project-management/allocation-management/not_in_nav/sbank-detail-allocations/#-no-header","title":"--no-header","text":"

do not display the header

"},{"location":"account-project-management/allocation-management/not_in_nav/sbank-detail-allocations/#-no-history","title":"--no-history","text":"

do not show history information

"},{"location":"account-project-management/allocation-management/not_in_nav/sbank-detail-allocations/#-no-rows","title":"--no-rows","text":"

do not display the row data

"},{"location":"account-project-management/allocation-management/not_in_nav/sbank-detail-allocations/#-no-sys-msg","title":"--no-sys-msg","text":"

do not display system message

"},{"location":"account-project-management/allocation-management/not_in_nav/sbank-detail-allocations/#-no-totals","title":"--no-totals","text":"

do not display the totals

"},{"location":"account-project-management/allocation-management/not_in_nav/sbank-detail-jobs/","title":"sbank-detail-jobs","text":""},{"location":"account-project-management/allocation-management/not_in_nav/sbank-detail-jobs/#sbank-detail-jobs-options","title":"sbank-detail-jobs [options] [ | ... | ]

Detail job information. NOTE:

  1. The arguments or are NOT REQUIRED;
  2. event_id is the JOB DATABASE ID;
  3. is the SCHEDULER CREATED ID, such as Cobalt;
  4. can also be entered using option -j ;
  5. can also be entered using option -e ;
  6. can also be entered using option -r ;
  7. regardless, you can use options or arguments to get detail job information
  8. ","text":""},{"location":"account-project-management/allocation-management/not_in_nav/sbank-detail-jobs/#options","title":"OPTIONS","text":"

    --version

    show program's version number and exit

    -h, --help

    show this help message and exit

    -a ALLOCATION_ID, --allocation-id=ALLOCATION_ID

    filter on allocation id

    -e EVENT_ID, --event-id=EVENT_ID

    filter on event id

    -f FIELD_INFO, --field-to-display=FIELD_INFO

    FIELD_INFO is [:], for available fields enter -f? or -f \"?\", to add fields enter -f \"+ [:] ...\"

    -j JOBID, --jobid=JOBID

    filter on jobid

    -n NUM_FIELDS_TO_DISPLAY, --num-fields-to-display=NUM_FIELDS_TO_DISPLAY

    set number of fields to display

    -p PROJECT, --project=PROJECT

    filter on name or id, DO NOT MIX, enter 'all' to get all, wild cards '*' is allowed but only on names

    -r RESOURCE, --resource=RESOURCE

    filter on name or id, DO NOT MIX, enter 'all' to get all, wild cards '*' is allowed but only on names

    -t TRANSACTION_ID, --transaction-id=TRANSACTION_ID

    filter on transaction id

    -u USER, --user=USER

    filter on name or id, DO NOT MIX, enter 'all' to get all, wild cards '*' is allowed but only on names

    -w \"FIELD_INFO\", --field-width

    \"FIELD_INFO\" FIELD_INFO is :, for available fields enter -w? or -w \"?\"

    -E END, --end=END

    [OPER1][...[OPER2]], where the operators OPER1 and OPER2 can be one of the following: ge, gt, le, lt, eq or >=, >, <=, <, ==. Operator Defaults: OPER1 is 'lt' for single date entry, OPER1 and OPER2 are 'ge' and 'lt', respectively, for range date entry. Date Parsing Precedence: YEAR then MONTH then DAY, i.e., 121101 is parsed as YYMMDD, hence Nov. 1, 2012

    -H, --human-readable

    abbreviate numbers and use unit suffixes: K (thousands), M (millions), G (billions), T (trillions) ...

    -S START, --start=START

    [OPER1][...[OPER2]], where the operators OPER1 and OPER2 can be one of the following: ge, gt, le, lt, eq or >=, >, <=, <, ==. Operator Defaults: OPER1 is 'ge' for single date entry, OPER1 and OPER2 are 'ge' and 'lt', respectively, for range date entry. Date Parsing Precedence: YEAR then MONTH then DAY, i.e., 121101 is parsed as YYMMDD, hence Nov. 1, 2012

    -T TRANSACTION_TYPE, --transaction-type=TRANSACTION_TYPE

    transaction types: CHARGE, REFUND, PULLBACK, DEPOSIT, VOID
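    As an illustration (the project and resource names are placeholders), the type filter combines with the other filters above:

    > sbank-detail-jobs -p ProjectX -r theta -T REFUND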

    --created=CREATED_TIMESTAMP

    [OPER1][...[OPER2]], where the operators OPER1 and OPER2 can be one of the following: ge, gt, le, lt, eq or >=, >, <=, <, ==. Operator Defaults: OPER1 is 'ge' for single date entry, OPER1 and OPER2 are 'ge' and 'lt', respectively, for range date entry. Date Parsing Precedence: YEAR then MONTH then DAY, i.e., 121101 is parsed as YYMMDD, hence Nov. 1, 2012

    --debug=DEBUG_LEVEL

    SILENT, MUCH_LESS, LESS, MORE, VERBOSE, DEBUG, DEBUG1, DEBUG2

    --eligible=ELIGIBLE_TIMESTAMP

    [OPER1][...[OPER2]], where the operators OPER1 and OPER2 can be one of the following: ge, gt, le, lt, eq or >=, >, <=, <, ==. Operator Defaults: OPER1 is 'ge' for single date entry, OPER1 and OPER2 are 'ge' and 'lt', respectively, for range date entry. Date Parsing Precedence: YEAR then MONTH then DAY, i.e., 121101 is parsed as YYMMDD, hence Nov. 1, 2012

    --get-not-charged

    only un-charged jobs

    --history-date-range=END

    [OPER1][...[OPER2]], where the operators OPER1 and OPER2 can be one of the following: ge, gt, le, lt, eq or >=, >, <=, <, ==. Operator Defaults: OPER1 is 'ge' for single date entry, OPER1 and OPER2 are 'ge' and 'lt', respectively, for range date entry. Date Parsing Precedence: YEAR then MONTH then DAY, i.e., 121101 is parsed as YYMMDD, hence Nov. 1, 2012

    --last-updated=LAST_UPDATED_TIMESTAMP

    [OPER1][...[OPER2]], where the operators OPER1 and OPER2 can be one of the following: ge, gt, le, lt, eq or >=, >, <=, <, ==. Operator Defaults: OPER1 is 'gt' for single date entry, OPER1 and OPER2 are 'ge' and 'lt', respectively, for range date entry. Date Parsing Precedence: YEAR then MONTH then DAY, i.e., 121101 is parsed as YYMMDD, hence Nov. 1, 2012

    --no-commas

    remove commas from comma separated thousands

    --no-header

    do not display the header

    --no-history

    do not show history information

    --no-rows

    do not display the row data

    --no-sys-msg

    do not display system message

    --no-totals

    do not display the totals

    --queued=QUEUED_TIMESTAMP

    [OPER1][...[OPER2]], where the operators OPER1 and OPER2 can be one of the following: ge, gt, le,\u00a0lt,\u00a0eq or >=, >, <=, <, ==. Operator Defaults: OPER1 is 'ge' for single date entry, OPER1 and OPER2 are 'ge' and 'lt', respectively, for range date entry. Date Parsing Precedence: YEAR then MONTH then DAY, i.e., 121101 is parsed as YYMMDD, hence Nov. 1, 2012"},{"location":"account-project-management/allocation-management/not_in_nav/sbank-detail-projects/","title":"Manpage for sbank-detail-projects","text":""},{"location":"account-project-management/allocation-management/not_in_nav/sbank-detail-projects/#sbank-detail-projects-options","title":"sbank-detail-projects [options] [ ... ]

    Detail project information.

    NOTE: 1. The list of arguments are optional 2. you can also enter list by using the -p option multiple times 3. regardless, both are optional, and you can get detail project info using the option filters below","text":""},{"location":"account-project-management/allocation-management/not_in_nav/sbank-detail-projects/#options","title":"OPTIONS","text":""},{"location":"account-project-management/allocation-management/not_in_nav/sbank-detail-projects/#-version","title":"--version","text":"

    show program's version number and exit

    "},{"location":"account-project-management/allocation-management/not_in_nav/sbank-detail-projects/#-h-help","title":"-h, --help","text":"

    show this help message and exit

    "},{"location":"account-project-management/allocation-management/not_in_nav/sbank-detail-projects/#-a-allocation_id-allocation-idallocation_id","title":"-a ALLOCATION_ID, --allocation-id=ALLOCATION_ID","text":"

    filter on allocation id

    "},{"location":"account-project-management/allocation-management/not_in_nav/sbank-detail-projects/#-f-field_info-field-to-displayfield_info","title":"-f FIELD_INFO, --field-to-display=FIELD_INFO","text":"

    FIELD_INFO is [:], for available fields enter -f? or -f \"?\", to add fields enter -f \"+ [:] ...\""},{"location":"account-project-management/allocation-management/not_in_nav/sbank-detail-projects/#-n-num_fields_to_display-num-fields-to-displaynum_fields_to_display","title":"-n NUM_FIELDS_TO_DISPLAY, --num-fields-to-display=NUM_FIELDS_TO_DISPLAY","text":"

    set number of fields to display

    "},{"location":"account-project-management/allocation-management/not_in_nav/sbank-detail-projects/#-p-project-projectproject","title":"-p PROJECT, --project=PROJECT","text":"

    filter on name or id, DO NOT MIX, enter 'all' to get all, wild cards '*' is allowed but only on names

    "},{"location":"account-project-management/allocation-management/not_in_nav/sbank-detail-projects/#-r-resource-resourceresource","title":"-r RESOURCE, --resource=RESOURCE","text":"

    filter on name or id, DO NOT MIX, enter 'all' to get all, wild cards '*' is allowed but only on names

    "},{"location":"account-project-management/allocation-management/not_in_nav/sbank-detail-projects/#-u-user-useruser","title":"-u USER, --user=USER","text":"

    filter on name or id, DO NOT MIX, enter 'all' to get all, wild cards '*' is allowed but only on names

    "},{"location":"account-project-management/allocation-management/not_in_nav/sbank-detail-projects/#-w-field_info-field-width","title":"-w \"FIELD_INFO\", --field-width","text":"

    \"FIELD_INFO\" FIELD_INFO is :, for available fields enter -w? or -w \"?\""},{"location":"account-project-management/allocation-management/not_in_nav/sbank-detail-projects/#-e-end-endend","title":"-E END, --end=END","text":"

    [OPER1][...[OPER2]], where the operators OPER1 and OPER2 can be one of the following:

    • ge, gt, le, lt, eq or >=, >, <=, <, ==.

    Operator Defaults:

    • OPER1 is 'lt' for single date entry, OPER1 and OPER2 are 'ge' and 'lt', respectively, for range date entry.

    Date Parsing Precedence: - YEAR then MONTH then DAY, i.e., 121101 is parsed as YYMMDD, hence Nov. 1, 2012

    "},{"location":"account-project-management/allocation-management/not_in_nav/sbank-detail-projects/#-h-human-readable","title":"-H, --human-readable","text":"

    abbreviate numbers and use unit suffixes: K (thousands), M (millions), G (billions), T (trillions) ...

    "},{"location":"account-project-management/allocation-management/not_in_nav/sbank-detail-projects/#-i-get-inactive","title":"-I, --get-inactive","text":"

    get inactive allocations

    "},{"location":"account-project-management/allocation-management/not_in_nav/sbank-detail-projects/#-s-start-startstart","title":"-S START, --start=START","text":"

    [OPER1][...[OPER2]], where the operators OPER1 and OPER2 can be one of the following:

    • ge, gt, le, lt, eq or >=, >, <=, <, ==.

    Operator Defaults:

    • OPER1 is 'ge' for single date entry, OPER1 and OPER2 are 'ge' and 'lt', respectively, for range date entry.

    Date Parsing Precedence:

    • YEAR then MONTH then DAY, i.e., 121101 is parsed as YYMMDD, hence Nov. 1, 2012
    "},{"location":"account-project-management/allocation-management/not_in_nav/sbank-detail-projects/#-debugdebug_level","title":"--debug=DEBUG_LEVEL","text":"

    SILENT, MUCH_LESS, LESS, MORE, VERBOSE, DEBUG, DEBUG1, DEBUG2

    "},{"location":"account-project-management/allocation-management/not_in_nav/sbank-detail-projects/#-all-charges","title":"--all-charges","text":"

    only show list info that have charges regardless of project/user relationship

    "},{"location":"account-project-management/allocation-management/not_in_nav/sbank-detail-projects/#-no-commas","title":"--no-commas","text":"

    remove commas from comma separated thousands

    "},{"location":"account-project-management/allocation-management/not_in_nav/sbank-detail-projects/#-no-header","title":"--no-header","text":"

    do not display the header

    "},{"location":"account-project-management/allocation-management/not_in_nav/sbank-detail-projects/#-no-rows","title":"--no-rows","text":"

    do not display the row data

    "},{"location":"account-project-management/allocation-management/not_in_nav/sbank-detail-projects/#-no-sys-msg","title":"--no-sys-msg","text":"

    do not display system message

    "},{"location":"account-project-management/allocation-management/not_in_nav/sbank-detail-projects/#-no-totals","title":"--no-totals","text":"

    do not display the totals

    "},{"location":"account-project-management/allocation-management/not_in_nav/sbank-detail-transactions/","title":"Manpage for sbank-detail-transactions","text":""},{"location":"account-project-management/allocation-management/not_in_nav/sbank-detail-transactions/#sbank-detail-transactions-options","title":"sbank-detail-transactions [options] [ ... ]

    Detail transaction information.

    NOTE: 1. The list of arguments are optional 2. you can also enter list by using the -t option multiple times 3. regardless, both are optional, and you can get detail transaction info using the option filters below","text":""},{"location":"account-project-management/allocation-management/not_in_nav/sbank-detail-transactions/#options","title":"OPTIONS","text":""},{"location":"account-project-management/allocation-management/not_in_nav/sbank-detail-transactions/#-version","title":"--version","text":"

    show program's version number and exit

    "},{"location":"account-project-management/allocation-management/not_in_nav/sbank-detail-transactions/#-h-help","title":"-h, --help","text":"

    show this help message and exit

    "},{"location":"account-project-management/allocation-management/not_in_nav/sbank-detail-transactions/#-a-allocation_id-allocation-idallocation_id","title":"-a ALLOCATION_ID, --allocation-id=ALLOCATION_ID","text":"

    filter on allocation id

    "},{"location":"account-project-management/allocation-management/not_in_nav/sbank-detail-transactions/#-c-comment","title":"-c, --comment","text":"

    display comment

    "},{"location":"account-project-management/allocation-management/not_in_nav/sbank-detail-transactions/#-e-event_id-event-idevent_id","title":"-e EVENT_ID, --event-id=EVENT_ID","text":"

    filter on event id

    "},{"location":"account-project-management/allocation-management/not_in_nav/sbank-detail-transactions/#-f-field_info-field-to-displayfield_info","title":"-f FIELD_INFO, --field-to-display=FIELD_INFO","text":"

    FIELD_INFO is [:] for available fields enter -f? or -f \"?\", to add fields enter -f \"+ [:] ...\""},{"location":"account-project-management/allocation-management/not_in_nav/sbank-detail-transactions/#-j-jobid-jobidjobid","title":"-j JOBID, --jobid=JOBID","text":"

    filter on jobid

    "},{"location":"account-project-management/allocation-management/not_in_nav/sbank-detail-transactions/#-n-num_fields_to_display-num-fields-to-displaynum_fields_to_display","title":"-n NUM_FIELDS_TO_DISPLAY, --num-fields-to-display=NUM_FIELDS_TO_DISPLAY","text":"

    set number of fields to display

    "},{"location":"account-project-management/allocation-management/not_in_nav/sbank-detail-transactions/#-p-project-projectproject","title":"-p PROJECT, --project=PROJECT","text":"

    filter on name or id, DO NOT MIX, enter 'all' to get all, wild cards '*' is allowed but only on names

    "},{"location":"account-project-management/allocation-management/not_in_nav/sbank-detail-transactions/#-r-resource-resourceresource","title":"-r RESOURCE, --resource=RESOURCE","text":"

    filter on name or id, DO NOT MIX, enter 'all' to get all, wild cards '*' is allowed but only on names

    "},{"location":"account-project-management/allocation-management/not_in_nav/sbank-detail-transactions/#-t-transaction_id-transaction-idtransaction_id","title":"-t TRANSACTION_ID, --transaction-id=TRANSACTION_ID","text":"

    filter on transaction id

    "},{"location":"account-project-management/allocation-management/not_in_nav/sbank-detail-transactions/#-u-user-useruser","title":"-u USER, --user=USER","text":"

    filter on name or id, DO NOT MIX, enter 'all' to get all, wild cards '*' is allowed but only on names

    "},{"location":"account-project-management/allocation-management/not_in_nav/sbank-detail-transactions/#-w-field_info-field-width","title":"-w \"FIELD_INFO\", --field-width=","text":"

    \"FIELD_INFO\" FIELD_INFO is :, for available fields enter -w? or -w \"?\""},{"location":"account-project-management/allocation-management/not_in_nav/sbank-detail-transactions/#-e-job_end-endjob_end","title":"-E JOB_END, --end=JOB_END","text":"

    [OPER1][...[OPER2]], where the operators OPER1 and OPER2 can be one of the following:

    • ge, gt, le, lt, eq or >=, >, <=, <, ==.

    Operator Defaults:

    • OPER1 is 'ge' for single date entry
    • OPER1 and OPER2 are 'ge' and 'lt', respectively, for range date entry.

    Date Parsing Precedence:

    • YEAR then MONTH then DAY, i.e., 121101 is parsed as YYMMDD, hence Nov 1, 2012
    "},{"location":"account-project-management/allocation-management/not_in_nav/sbank-detail-transactions/#-h-human-readable","title":"-H, --human-readable","text":"

    abbreviate numbers and use unit suffixes: K (thousands), M (millions), G (billions), T (trillions) ...

    "},{"location":"account-project-management/allocation-management/not_in_nav/sbank-detail-transactions/#-s-job_start-startjob_start","title":"-S JOB_START, --start=JOB_START","text":"

    [OPER1][...[OPER2]], where the operators OPER1 and OPER2 can be one of the following:

    • ge, gt, le, lt, eq or >=, >, <=, <, ==.

    Operator Defaults:

    • OPER1 is 'ge' for single date entry
    • OPER1 and OPER2 are 'ge' and 'lt', respectively, for range date entry.

    Date Parsing Precedence:

    • YEAR then MONTH then DAY, i.e., 121101 is parsed as YYMMDD, hence Nov 1, 2012
    "},{"location":"account-project-management/allocation-management/not_in_nav/sbank-detail-transactions/#-t-transaction_type-transaction-typetransaction_type","title":"-T TRANSACTION_TYPE, --transaction-type=TRANSACTION_TYPE","text":"

    transaction types: CHARGE, REFUND, PULLBACK, DEPOSIT, VOID

    "},{"location":"account-project-management/allocation-management/not_in_nav/sbank-detail-transactions/#-attransaction_at_timestamp","title":"--at=TRANSACTION_AT_TIMESTAMP","text":"

    [OPER1][...[OPER2]], where the operators OPER1 and OPER2 can be one of the following:

    • ge, gt, le, lt, eq or >=, >, <=, <, ==.

    Operator Defaults:

    • OPER1 is 'ge' for single date entry
    • OPER1 and OPER2 are 'ge' and 'lt', respectively, for range date entry.

    Date Parsing Precedence:

    • YEAR then MONTH then DAY, i.e., 121101 is parsed as YYMMDD, hence Nov 1, 2012
    "},{"location":"account-project-management/allocation-management/not_in_nav/sbank-detail-transactions/#-cbank-refcbank_ref","title":"--cbank-ref=CBANK_REF","text":"

    filter on Clusterbank reference id

    "},{"location":"account-project-management/allocation-management/not_in_nav/sbank-detail-transactions/#-createdjob_created_timestamp","title":"--created=JOB_CREATED_TIMESTAMP","text":"

    [OPER1][...[OPER2]], where the operators OPER1 and OPER2 can be one of the following:

    • ge, gt, le, lt, eq or >=, >, <=, <, ==.

    Operator Defaults:

    • OPER1 is 'ge' for single date entry
    • OPER1 and OPER2 are 'ge' and 'lt', respectively, for range date entry.

    Date Parsing Precedence:

    • YEAR then MONTH then DAY, i.e., 121101 is parsed as YYMMDD, hence Nov 1, 2012
    "},{"location":"account-project-management/allocation-management/not_in_nav/sbank-detail-transactions/#-debugdebug_level","title":"--debug=DEBUG_LEVEL","text":"

    SILENT, MUCH_LESS, LESS, MORE, VERBOSE, DEBUG, DEBUG1, DEBUG2

    "},{"location":"account-project-management/allocation-management/not_in_nav/sbank-detail-transactions/#-no-commas","title":"--no-commas","text":"

    remove commas from comma separated thousands

    "},{"location":"account-project-management/allocation-management/not_in_nav/sbank-detail-transactions/#-no-header","title":"--no-header","text":"

    do not display the header

    "},{"location":"account-project-management/allocation-management/not_in_nav/sbank-detail-transactions/#-no-rows","title":"--no-rows","text":"

    do not display the row data

    "},{"location":"account-project-management/allocation-management/not_in_nav/sbank-detail-transactions/#-no-sys-msg","title":"--no-sys-msg","text":"

    do not display system message

    "},{"location":"account-project-management/allocation-management/not_in_nav/sbank-detail-transactions/#-no-totals","title":"--no-totals","text":"

    do not display the totals

    "},{"location":"account-project-management/allocation-management/not_in_nav/sbank-detail-transactions/#-queuedjob_queued_timestamp","title":"--queued=JOB_QUEUED_TIMESTAMP","text":"

    [OPER1][...[OPER2]], where the operators OPER1 and OPER2 can be one of the following:

    • ge, gt, le, lt, eq or >=, >, <=, <, ==.

    Operator Defaults:

    • OPER1 is 'ge' for single date entry
    • OPER1 and OPER2 are 'ge' and 'lt', respectively, for range date entry.

    Date Parsing Precedence:

    • YEAR then MONTH then DAY, i.e., 121101 is parsed as YYMMDD, hence Nov 1, 2012
    "},{"location":"account-project-management/allocation-management/not_in_nav/sbank-detail-users/","title":"Manpage for sbank-detail-users","text":""},{"location":"account-project-management/allocation-management/not_in_nav/sbank-detail-users/#sbank-detail-users-options","title":"sbank-detail-users [options] [ ... ]

    Detail user information.

    **NOTE: ** 1. Use -I to include inactive allocations 2. the list of arguments are optional 3. you can also enter list by using the -u option multiple times 4. regardless, both are optional, and you can get detail user info using the option filters below","text":""},{"location":"account-project-management/allocation-management/not_in_nav/sbank-detail-users/#options","title":"OPTIONS","text":""},{"location":"account-project-management/allocation-management/not_in_nav/sbank-detail-users/#-version","title":"--version","text":"

    show program's version number and exit

    "},{"location":"account-project-management/allocation-management/not_in_nav/sbank-detail-users/#-h-help","title":"-h, --help","text":"

    show this help message and exit

    "},{"location":"account-project-management/allocation-management/not_in_nav/sbank-detail-users/#-a-allocation_id-allocation-idallocation_id","title":"-a ALLOCATION_ID, --allocation-id=ALLOCATION_ID","text":"

    filter on allocation id

    "},{"location":"account-project-management/allocation-management/not_in_nav/sbank-detail-users/#-f-field_info-field-to-displayfield_info","title":"-f FIELD_INFO, --field-to-display=FIELD_INFO","text":"

    FIELD_INFO is [:], for available fields enter -f? or -f \"?\", to add fields enter -f \"+ [:] ...\""},{"location":"account-project-management/allocation-management/not_in_nav/sbank-detail-users/#-n-num_fields_to_display-num-fields-to-displaynum_fields_to_display","title":"-n NUM_FIELDS_TO_DISPLAY, --num-fields-to-display=NUM_FIELDS_TO_DISPLAY","text":"

    set number of fields to display

    "},{"location":"account-project-management/allocation-management/not_in_nav/sbank-detail-users/#-p-project-projectproject","title":"-p PROJECT, --project=PROJECT","text":"

    filter on name or id, DO NOT MIX, enter 'all' to get all, wild cards '*' is allowed but only on names

    "},{"location":"account-project-management/allocation-management/not_in_nav/sbank-detail-users/#-r-resource-resourceresource","title":"-r RESOURCE, --resource=RESOURCE","text":"

    filter on name or id, DO NOT MIX, enter 'all' to get all, wild cards '*' is allowed but only on names

    "},{"location":"account-project-management/allocation-management/not_in_nav/sbank-detail-users/#-u-user-useruser","title":"-u USER, --user=USER","text":"

    filter on name or id, DO NOT MIX, enter 'all' to get all, wild cards '*' is allowed but only on names

    "},{"location":"account-project-management/allocation-management/not_in_nav/sbank-detail-users/#-w-field_info-field-width","title":"-w \"FIELD_INFO\", --field-width","text":"

    \"FIELD_INFO\" FIELD_INFO is :, for available fields enter -w? or -w \"?\""},{"location":"account-project-management/allocation-management/not_in_nav/sbank-detail-users/#-e-end-endend","title":"-E END, --end=END","text":"

    [OPER1][...[OPER2]], where the operators OPER1 and OPER2 can be one of the following: - ge, gt, le, lt, eq or >=, >, <=, <, ==.

    Operator Defaults:

    • OPER1 is 'lt' for single date entry, OPER1 and OPER2 are 'ge' and 'lt', respectively, for range date entry.

    Date Parsing Precedence:

    • YEAR then MONTH then DAY, i.e., 121101 is parsed as YYMMDD, hence Nov. 1, 2012
    "},{"location":"account-project-management/allocation-management/not_in_nav/sbank-detail-users/#-h-human-readable","title":"-H, --human-readable","text":"

    abbreviate numbers and use unit suffixes: K (thousands), M (millions), G (billions), T (trillions) ...

    "},{"location":"account-project-management/allocation-management/not_in_nav/sbank-detail-users/#-i-get-inactive","title":"-I, --get-inactive","text":"

    get inactive allocations

    "},{"location":"account-project-management/allocation-management/not_in_nav/sbank-detail-users/#-s-start-startstart","title":"-S START, --start=START","text":"

    [OPER1][...[OPER2]], where the operators OPER1 and OPER2 can be one of the following: - ge, gt, le, lt, eq or >=, >, <=, <, ==.

    Operator Defaults:

    • OPER1 is 'lt' for single date entry, OPER1 and OPER2 are 'ge' and 'lt', respectively, for range date entry.

    Date Parsing Precedence:

    • YEAR then MONTH then DAY, i.e., 121101 is parsed as YYMMDD, hence Nov. 1, 2012
    "},{"location":"account-project-management/allocation-management/not_in_nav/sbank-detail-users/#-debugdebug_level","title":"--debug=DEBUG_LEVEL","text":"

    SILENT, MUCH_LESS, LESS, MORE, VERBOSE, DEBUG, DEBUG1, DEBUG2

    "},{"location":"account-project-management/allocation-management/not_in_nav/sbank-detail-users/#-all-charges","title":"--all-charges","text":"

    only show list info that have charges regardless of project/user relationship

    "},{"location":"account-project-management/allocation-management/not_in_nav/sbank-detail-users/#-no-commas","title":"--no-commas","text":"

    remove commas from comma separated thousands

    "},{"location":"account-project-management/allocation-management/not_in_nav/sbank-detail-users/#-no-header","title":"--no-header","text":"

    do not display the header

    "},{"location":"account-project-management/allocation-management/not_in_nav/sbank-detail-users/#-no-rows","title":"--no-rows","text":"

    do not display the row data

    "},{"location":"account-project-management/allocation-management/not_in_nav/sbank-detail-users/#-no-sys-msg","title":"--no-sys-msg","text":"

    do not display system message

    "},{"location":"account-project-management/allocation-management/not_in_nav/sbank-detail-users/#-no-totals","title":"--no-totals","text":"

    do not display the totals

    "},{"location":"account-project-management/allocation-management/not_in_nav/sbank-detail/","title":"Manpage for sbank-detail","text":""},{"location":"account-project-management/allocation-management/not_in_nav/sbank-detail/#sbank-detail-options","title":"sbank-detail [options]

    Detail Meta Command

    ","text":""},{"location":"account-project-management/allocation-management/not_in_nav/sbank-detail/#commands","title":"COMMANDS","text":"
    • allocations [-a|-e|-f|-j|-n|-p|-r|-t|-u|-w|-E|-H|-I|-O|-S|-T|...] (DEFAULT)
    • categories [-f|-n|-w|...]
    • messages [-f|-n|-w|...]
    • names [-f|-n|-w|...]
    • jobs [-a|-e|-f|-j|-n|-p|-r|-t|-u|-w|-E|-H|-S|-T|...]
    • projects [-a|-f|-n|-p|-r|-u|-w|-E|-H|-I|-S|...]
    • transactions [-a|-e|-f|-j|-n|-p|-r|-t|-u|-w|-E|-H|-S|-T|...]
    • users [-a|-f|-n|-p|-r|-u|-w|-E|-H|-S|...]
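    A minimal sketch of how the meta command is typically invoked, assuming the abbreviated subcommand form shown in the examples elsewhere in this guide (e.g., \"sbank l j\" for \"sbank list jobs\"); the project name is a placeholder:

    > sbank detail allocations -p ProjectX -r all
    > sbank d j -p ProjectX -r theta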
    "},{"location":"account-project-management/allocation-management/not_in_nav/sbank-detail/#options","title":"OPTIONS","text":""},{"location":"account-project-management/allocation-management/not_in_nav/sbank-detail/#-a-allocation","title":"-a --allocation","text":"

    enter allocation id

    "},{"location":"account-project-management/allocation-management/not_in_nav/sbank-detail/#-c-comment","title":"-c --comment","text":"

    enter comment for new or edit commands, display comment for list commands

    "},{"location":"account-project-management/allocation-management/not_in_nav/sbank-detail/#-e-event-id","title":"-e --event-id","text":"

    enter event db id; event db id is an internal id created by the charging system

    "},{"location":"account-project-management/allocation-management/not_in_nav/sbank-detail/#-f-field","title":"-f --field","text":"

    enter [:], width is optional; enter -f? or -f \"?\" for available fields, + to add fields"},{"location":"account-project-management/allocation-management/not_in_nav/sbank-detail/#-h-help","title":"-h --help","text":"

    command line help

    "},{"location":"account-project-management/allocation-management/not_in_nav/sbank-detail/#-j-jobid","title":"-j --jobid","text":"

    enter jobid; jobid is created by the scheduler and is not unique

    "},{"location":"account-project-management/allocation-management/not_in_nav/sbank-detail/#-n-num-field","title":"-n --num-field","text":"

    enter number of fields to display

    "},{"location":"account-project-management/allocation-management/not_in_nav/sbank-detail/#-p-project","title":"-p --project","text":"

    enter name or id, DO NOT MIX, enter 'all' to get all, wild cards '*' is allowed, but only on names

    "},{"location":"account-project-management/allocation-management/not_in_nav/sbank-detail/#-r-resource","title":"-r --resource","text":"

    enter name or id, DO NOT MIX, enter 'all' to get all, wild cards '*' is allowed but only on names

    "},{"location":"account-project-management/allocation-management/not_in_nav/sbank-detail/#-s-suballocation","title":"-s --suballocation","text":"

    enter suballocation id

    "},{"location":"account-project-management/allocation-management/not_in_nav/sbank-detail/#-t-transaction","title":"-t --transaction","text":"

    enter transaction id

    "},{"location":"account-project-management/allocation-management/not_in_nav/sbank-detail/#-u-user","title":"-u --user","text":"

    enter name or id, DO NOT MIX, enter 'all' to get all, wild cards '*' is allowed, but only on names

    "},{"location":"account-project-management/allocation-management/not_in_nav/sbank-detail/#-w-field-width","title":"-w --field-width","text":"

    enter the field width as follows: :, enter -w? or -w \"?\" for available fields"},{"location":"account-project-management/allocation-management/not_in_nav/sbank-detail/#-e-end","title":"-E --end","text":"

    enter end datetime filter

    "},{"location":"account-project-management/allocation-management/not_in_nav/sbank-detail/#-h-human-readable","title":"-H --human-readable","text":"

    abbreviate numbers and use unit suffixes: K (thousands), M (millions), G (billions), T (trillions) ...

    "},{"location":"account-project-management/allocation-management/not_in_nav/sbank-detail/#-i-get-inactive","title":"-I --get-inactive","text":"

    include inactive allocations

    "},{"location":"account-project-management/allocation-management/not_in_nav/sbank-detail/#-o-get-only-inactive","title":"-O --get-only-inactive","text":"

    get only inactive allocations

    "},{"location":"account-project-management/allocation-management/not_in_nav/sbank-detail/#-s-start","title":"-S --start","text":"

    enter start datetime filter

    "},{"location":"account-project-management/allocation-management/not_in_nav/sbank-detail/#-t-type","title":"-T --Type","text":"

    enter type of transaction

    "},{"location":"account-project-management/allocation-management/not_in_nav/sbank-detail/#-all-charges","title":"--all-charges","text":"

    for list allocations | projects | users, only show info with charges

    "},{"location":"account-project-management/allocation-management/not_in_nav/sbank-detail/#-at","title":"--at","text":"

    enter transaction created datetime filter

    "},{"location":"account-project-management/allocation-management/not_in_nav/sbank-detail/#-award-category","title":"--award-category","text":"

    enter allocation award category

    "},{"location":"account-project-management/allocation-management/not_in_nav/sbank-detail/#-award-type-name","title":"--award-type-name","text":"

    enter allocation award-type name

    "},{"location":"account-project-management/allocation-management/not_in_nav/sbank-detail/#-created","title":"--created","text":"

    enter created datetime filter

    "},{"location":"account-project-management/allocation-management/not_in_nav/sbank-detail/#-debug","title":"--debug","text":"

    enter debug level

    "},{"location":"account-project-management/allocation-management/not_in_nav/sbank-detail/#-get-deleted","title":"--get-deleted","text":"

    get deleted objects

    "},{"location":"account-project-management/allocation-management/not_in_nav/sbank-detail/#-get-not-charged","title":"--get-not-charged","text":"

    get jobs that have not been charged

    "},{"location":"account-project-management/allocation-management/not_in_nav/sbank-detail/#-get-only-deleted","title":"--get-only-deleted","text":"

    get only deleted objects

    "},{"location":"account-project-management/allocation-management/not_in_nav/sbank-detail/#-history-date-range","title":"--history-date-range","text":"

    enter history datetime filter

    "},{"location":"account-project-management/allocation-management/not_in_nav/sbank-detail/#-last-updated","title":"--last-updated","text":"

    enter last updated datetime filter

    "},{"location":"account-project-management/allocation-management/not_in_nav/sbank-detail/#-no-commas","title":"--no-commas","text":"

    remove commas from comma-separated thousands

    "},{"location":"account-project-management/allocation-management/not_in_nav/sbank-detail/#-no-header","title":"--no-header","text":"

    do not display header

    "},{"location":"account-project-management/allocation-management/not_in_nav/sbank-detail/#-no-history","title":"--no-history","text":"

    do not display history information

    "},{"location":"account-project-management/allocation-management/not_in_nav/sbank-detail/#-no-rows","title":"--no-rows","text":"

    do not display rows

    "},{"location":"account-project-management/allocation-management/not_in_nav/sbank-detail/#-no-sys-msg","title":"--no-sys-msg","text":"

    do not display system message

    "},{"location":"account-project-management/allocation-management/not_in_nav/sbank-detail/#-no-totals","title":"--no-totals","text":"

    do not display totals

    "},{"location":"account-project-management/allocation-management/not_in_nav/sbank-detail/#-queued","title":"--queued","text":"

    enter queued datetime filter

    "},{"location":"account-project-management/allocation-management/not_in_nav/sbank-examples/","title":"sbank Example Commands","text":"

    Below is a set of commands to help you manage the projects you have running at the ALCF.

    "},{"location":"account-project-management/allocation-management/not_in_nav/sbank-examples/#view-your-projects-allocations","title":"View your project's allocations","text":""},{"location":"account-project-management/allocation-management/not_in_nav/sbank-examples/#command-sbank-list-allocations","title":"Command: sbank-list-allocations","text":"

    Use this command to list all of your active allocations for a specific project [Project-X]. This is useful when you need to provide this information in a report.

    > sbank-list-allocations -p ProjectX -r all\n Id         Start       End         Resource   Project          Jobs        Charged          Available Balance \n ---------  ----------  ----------  ---------  ---------------  ----------  ---------------  ----------------- \n 2106       2016-01-04  2017-01-01  cooley     ProjectX              1,139          6,032.8           43,967.2 \n 2146       2016-01-14  2017-01-10  theta      ProjectX                983      1,084,770.3       25,483,927.5\n 6438       2020-09-22  2022-01-01  thetagpu   ProjectX                  3              0.0            2,000.0 \n\n\nTotals:\n  Rows: 3\n  Cooley:\n    Available Balance: 43,967.2 node hours\n    Charged          : 6,032.8 node hours\n    Jobs             : 1,139 \n Theta:\n    Available Balance: 25,483,927.5 node hours \n    Charged          : 1,084,770.3 node hours \n    Jobs             : 983 \n Thetagpu:\n    Available Balance: 2,000.0 node hours\n    Charged          : 0.0 node hours\n    Jobs             : 3 \n

    "},{"location":"account-project-management/allocation-management/not_in_nav/sbank-examples/#list-your-projects-quota-on-grand-andor-eagle-file-system","title":"List your project's quota on Grand and/or Eagle File system","text":"
    > sbank-list-allocations -p ProjectX -r grand\n Allocation  Suballocation  Start       End         Resource  Project      Quota\n ----------  -------------  ----------  ----------  --------  -----------  -----\n 6687        6555           2020-12-16  2022-01-01  grand     ProjectX    1.0\n\nTotals:\n  Rows: 1\n  Grand:\n    Quota: 1.0 TB\n\n> sbank-list-allocations -p ProjectX -r eagle\n Allocation  Suballocation  Start       End         Resource  Project      Quota\n ----------  -------------  ----------  ----------  --------  -----------  -----\n 6688        6556           2020-12-16  2022-01-01  eagle     ProjectX    1.0\n\nTotals:\n  Rows: 1\n  Eagle:\n    Quota: 1.0 TB\n
    "},{"location":"account-project-management/allocation-management/not_in_nav/sbank-examples/#list-only-the-created-timestamp-field-for-all-allocations-that-were-created-before-01-01-2015-for-projectx-accross-all-resources","title":"List only the created timestamp field for all allocations that were created before 01-01-2015 for ProjectX accross all resources","text":"
    > sbank-list-allocations  --created \"<20150101\" -r all -p ProjectX \"-f created\"\n Created    \n ---------- \n 2016-01-04 \n 2016-01-14 \n 2016-01-15 \n\nTotals:\n  Rows: 3\nDate  filters (UTC): created < \"2015-01-01 00:00:00\",  \n
    "},{"location":"account-project-management/allocation-management/not_in_nav/sbank-examples/#list-all-active-allocations-for-all-resources-for-project-projectx-and-add-the-field-created-to-the-display-list","title":"List all active allocations for all resources for project ProjectX and add the field Created \ufeffto the display list","text":"
    shrubbery~ > sbank-list-allocations -r all  -p ProjectX -f \"+created\"\n Id         Start       End         Resource   Project          Jobs        Charged          Available Balance  Created    \n ---------  ----------  ----------  ---------  ---------------  ----------  ---------------  -----------------  ---------- \n 279        2011-08-30  2020-01-01  theta      ProjectX              6,361     12,332,699.9      -12,332,699.9  2013-02-22 \n 2106       2016-01-04  2017-01-01  cooley     ProjectX              1,150          6,080.9           43,919.1  2016-01-04  \n\nTotals:\n  Rows: 2\n  Theta:\n    Available Balance: -12,332,699.9 node hours\n    Charged          : 12,332,699.9 node hours\n    Jobs             : 6,361 \n  Cooley:\n    Available Balance: 43,919.1 node hours\n    Charged          : 6,080.9 node hours\n    Jobs             : 1,150 \n
    "},{"location":"account-project-management/allocation-management/not_in_nav/sbank-examples/#list-all-available-fields-for-the-sbank-list-allocations-command","title":"List all available fields for the sbank-list-allocations command","text":"
    > sbank-list-allocations  -f \"?\"\navailable fields:\n id\n start_timestamp\n end_timestamp\n resource\n project_name\n jobs_count\n charged_sum\n available_balance_sum\n created_timestamp\n award_category\n award_type_name\n admin_name\n cbank_ref\n comment\n
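    Any of these fields can be appended to the default display using the \"+\" form of -f described above; for example (the project name is a placeholder):

    > sbank-list-allocations -p ProjectX -r all -f \"+ award_category cbank_ref\"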
    "},{"location":"account-project-management/allocation-management/not_in_nav/sbank-examples/#view-your-projects-users","title":"View your project's users","text":""},{"location":"account-project-management/allocation-management/not_in_nav/sbank-examples/#command-sbank-list-users","title":"Command: sbank-list-users","text":"

    List all charges for userx on theta on project ProjectX

    > sbank-list-users -p ProjectX -r theta -u userx\n User             Jobs        Charged         \n ---------------  ----------  --------------- \n userx                 1,814          9,884.5\n\nTotals:\n  Rows: 1\n  Resources: theta\n  Charged: 9,884.5 node hours\n  Jobs   : 1,814 \n

### List charges for all users in ProjectX on Theta
This works for project leads (i.e. PIs, Co-PIs, Proxies), since they can see everything in their own projects.

    sbank-list-users -p ProjectX -r theta
     User             Jobs        Charged
     ---------------  ----------  ---------------
     user1                   120          4,243.7
     user2                     0              0.0
     user3                     0              0.0
     user4                   181          1,195.5
     user5                     0              0.0
     user6                 2,560         10,868.7
     user7                     0              0.0
     user8                     0              0.0
     user9                     0              0.0
     user10                    7              3.5
     user11                    0              0.0

    Totals:
      Rows: 11
      Resources: theta
      Charged: 16,311.4 node hours
      Jobs   : 2,868

    ## View your project's jobs\nList jobs for user \"userx\" that started in the range 2016-02-15 <= started < 2016-02-29, and add the transactions related to each job.\n\n### **Command:** sbank-list-jobs\n\n**Note:** The transaction_ids_list field can be shortened all the way to \"t\", as in -f \"+ t\".\n
    shrubbery~ > sbank-list-jobs -u userx -f \"+ t\" -S \"2016-02-15...2016-02-29\"
     Id       Jobid   Resource  Project   Allocation  User   Duration  Charged   Transaction Ids
     1013857  730417  theta     ProjectX  1740        userx  1:53:07   61,776.8  CHARGE-1011230
     1013860  730558  theta     ProjectX  1740        userx  1:53:07   61,776.8  CHARGE-1011233
     1014168  730668  theta     ProjectX  1740        userx  1:53:25   61,940.6  CHARGE-1011541

    Totals:
      Rows: 3
      Theta:
        Charged : 185,494.2 node hours
        Duration: 6:44:00
    Date filters (UTC): \"2016-02-15 00:00:00\" <= start < \"2016-02-29 00:00:00\",

    ### List the nodes used, runtime, and start timestamp for Theta job 50576\n**Note**: To display the date and time, we increased the number of characters of start_timestamp to 19.\n
    catapult~ > sbank l j -r theta -j 50576 -f \"jobid nodes_used runtime start_timestamp:19\"
     Jobid      Nodes Used  Runtime    Start
     ---------  ----------  ---------  -------------------
     50576      512         1:00:49    2013-01-16 21:49:30

    Totals:
      Rows: 1
    ## View your project's transactions\n### **Command:** sbank-list-transactions\n\nList the transactions that were at or after 2016-02-29 for ProjectX, and add the fields job_duration, nodes_used, and hosts.\n\n**Note**: \n- job_duration, nodes_used, and hosts are shortened, but they are still uniquely identified\n- host has a left-justified width of 20, specified as \"h:-20\"\n
    catapult~ > sbank-list-transactions -p ProjectX --at \"ge 2016-02-29\" -f \"+ job_d nodes_u h:-20\" -r theta
     Id       Resource  Project   Allocation  At          User   Transaction Type  Amount       Jobid   Job Duration  Nodes Used  Hosts
     1025426  theta     ProjectX  2147        2016-02-29  userx  CHARGE            48,005.1     740587  1:27:54       2048        MIR-00800-33BF1-2048
     1028046  theta     ProjectX  2147        2016-03-01  userx  CHARGE            147,647.1    742090  4:30:21       2048        MIR-40000-733F1-2048
     1028755  theta     ProjectX  2147        2016-03-02  userx  CHARGE            1,576,068.0  742126  6:00:44       16384       MIR-04000-77FF1-1638

    Totals:
      Rows: 3
      Theta:
        Charges Amount: 1,771,720.2 node hours
        Job Duration  : 11:58:98
    Date filters (UTC): at >= \"2016-02-29 00:00:00\",

    "},{"location":"account-project-management/allocation-management/not_in_nav/sbank-list-allocations/","title":"Manpage for sbank-list-allocations","text":""},{"location":"account-project-management/allocation-management/not_in_nav/sbank-list-allocations/#sbank-list-allocations-options","title":"sbank-list-allocations [options]","text":"

    Generate allocation list report.

    Notes:
      1. Use -I to include inactive allocations.
      2. Enter \"-r all\" to get information for all resources.
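    For example (the project name is a placeholder), both notes can be combined in a single invocation:

    > sbank-list-allocations -p ProjectX -r all -I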

    "},{"location":"account-project-management/allocation-management/not_in_nav/sbank-list-allocations/#options","title":"OPTIONS","text":""},{"location":"account-project-management/allocation-management/not_in_nav/sbank-list-allocations/#-version","title":"--version","text":"

    show program's version number and exit

    "},{"location":"account-project-management/allocation-management/not_in_nav/sbank-list-allocations/#-h-help","title":"-h, --help","text":"

    show this help message and exit

    "},{"location":"account-project-management/allocation-management/not_in_nav/sbank-list-allocations/#-a-allocation_id-allocation-idallocation_id","title":"-a ALLOCATION_ID, --allocation-id=ALLOCATION_ID","text":"

    filter on allocation id

    "},{"location":"account-project-management/allocation-management/not_in_nav/sbank-list-allocations/#-c-comment","title":"-c, --comment","text":"

    display comment

    "},{"location":"account-project-management/allocation-management/not_in_nav/sbank-list-allocations/#-e-event_id-event-idevent_id","title":"-e EVENT_ID, --event-id=EVENT_ID","text":"

    filter on event id

    "},{"location":"account-project-management/allocation-management/not_in_nav/sbank-list-allocations/#-f-field_info-field-to-displayfield_info","title":"-f FIELD_INFO, --field-to-display=FIELD_INFO","text":"

    FIELD_INFO is [:], for available fields enter -f? or -f \"?\", to add fields enter -f \"+ [:] ...\""},{"location":"account-project-management/allocation-management/not_in_nav/sbank-list-allocations/#-j-jobid-jobidjobid","title":"-j JOBID, --jobid=JOBID","text":"

    filter on jobid

    "},{"location":"account-project-management/allocation-management/not_in_nav/sbank-list-allocations/#-n-num_fields_to_display-num-fields-to-displaynum_fields_to_display","title":"-n NUM_FIELDS_TO_DISPLAY, --num-fields-to-display=NUM_FIELDS_TO_DISPLAY","text":"

    set number of fields to display

    "},{"location":"account-project-management/allocation-management/not_in_nav/sbank-list-allocations/#-p-project-projectproject","title":"-p PROJECT, --project=PROJECT","text":"

    filter on name or id, DO NOT MIX, enter 'all' to get all, wild cards '*' is allowed, but only on names

    "},{"location":"account-project-management/allocation-management/not_in_nav/sbank-list-allocations/#-r-resource-resourceresource","title":"-r RESOURCE, --resource=RESOURCE","text":"

    filter on name or id, DO NOT MIX, enter 'all' to get all, wild cards '*' is allowed, but only on names

    "},{"location":"account-project-management/allocation-management/not_in_nav/sbank-list-allocations/#-t-transaction_id-transaction-idtransaction_id","title":"-t TRANSACTION_ID, --transaction-id=TRANSACTION_ID","text":"

    filter on transaction id

    "},{"location":"account-project-management/allocation-management/not_in_nav/sbank-list-allocations/#-u-user-useruser","title":"-u USER, --user=USER","text":"

    filter on name or id, DO NOT MIX, enter 'all' to get all, wild cards '*' is allowed, but only on names

    "},{"location":"account-project-management/allocation-management/not_in_nav/sbank-list-allocations/#-w-field_info-field-width","title":"-w \"FIELD_INFO\", --field-width","text":"

    \"FIELD_INFO\" FIELD_INFO is :, for available fields enter -w? or -w \"?\""},{"location":"account-project-management/allocation-management/not_in_nav/sbank-list-allocations/#-e-end-endend","title":"-E END, --end=END","text":"

    [OPER1][...[OPER2]], where the operators OPER1 and OPER2 can be one of the following: - ge, gt, le, lt, eq or >=, >, <=, <, ==.

    Operator Defaults:

    • OPER1 is 'lt' for single date entry, OPER1 and OPER2 are 'ge' and 'lt', respectively, for range date entry.

    Date Parsing Precedence:

    • YEAR then MONTH then DAY, i.e., 121101 is parsed as YYMMDD, hence Nov. 1, 2012
    "},{"location":"account-project-management/allocation-management/not_in_nav/sbank-list-allocations/#-h-human-readable","title":"-H, --human-readable","text":"

    abbreviate numbers and use unit suffixes: K (thousands), M (millions), G (billions), T (trillions) ...

    "},{"location":"account-project-management/allocation-management/not_in_nav/sbank-list-allocations/#-i-get-inactive","title":"-I, --get-inactive","text":"

    get inactive allocations

    "},{"location":"account-project-management/allocation-management/not_in_nav/sbank-list-allocations/#-o-get-only-inactive","title":"-O, --get-only-inactive","text":"

    only inactive allocations

    "},{"location":"account-project-management/allocation-management/not_in_nav/sbank-list-allocations/#-s-start-startstart","title":"-S START, --start=START","text":"

    [OPER1][...[OPER2]], where the operators OPER1 and OPER2 can be one of the following: - ge, gt, le, lt, eq or >=, >, <=, <, ==.

    Operator Defaults:

    • OPER1 is 'lt' for single date entry, OPER1 and OPER2 are 'ge' and 'lt', respectively, for range date entry.

    Date Parsing Precedence:

    • YEAR then MONTH then DAY, i.e., 121101 is parsed as YYMMDD, hence Nov. 1, 2012
    "},{"location":"account-project-management/allocation-management/not_in_nav/sbank-list-allocations/#-t-transaction_type-transaction-typetransaction_type","title":"-T TRANSACTION_TYPE, --transaction-type=TRANSACTION_TYPE","text":"

    transaction types: CHARGE, REFUND, PULLBACK, DEPOSIT, VOID

    "},{"location":"account-project-management/allocation-management/not_in_nav/sbank-list-allocations/#-award-type-nameaward_type_name","title":"--award-type-name=AWARD_TYPE_NAME","text":"

    filter on award-type name

    "},{"location":"account-project-management/allocation-management/not_in_nav/sbank-list-allocations/#-award-categoryaward_category","title":"--award-category=AWARD_CATEGORY","text":"

    filter on award category

    "},{"location":"account-project-management/allocation-management/not_in_nav/sbank-list-allocations/#-cbank-refcbank_ref","title":"--cbank-ref=CBANK_REF","text":"

    filter on Clusterbank reference id

    "},{"location":"account-project-management/allocation-management/not_in_nav/sbank-list-allocations/#-createdcreated_timestamp","title":"--created=CREATED_TIMESTAMP","text":"

    [OPER1][...[OPER2]], where the operators OPER1 and OPER2 can be one of the following: - ge, gt, le, lt, eq or >=, >, <=, <, ==.

    Operator Defaults:

    • OPER1 is 'lt' for single date entry, OPER1 and OPER2 are 'ge' and 'lt', respectively, for range date entry.

    Date Parsing Precedence:

    • YEAR then MONTH then DAY, i.e., 121101 is parsed as YYMMDD, hence Nov. 1, 2012
    "},{"location":"account-project-management/allocation-management/not_in_nav/sbank-list-allocations/#-debugdebug_level","title":"--debug=DEBUG_LEVEL","text":"

    SILENT, MUCH_LESS, LESS, MORE, VERBOSE, DEBUG, DEBUG1, DEBUG2

    "},{"location":"account-project-management/allocation-management/not_in_nav/sbank-list-allocations/#-get-deleted","title":"--get-deleted","text":"

    get deleted objects

    "},{"location":"account-project-management/allocation-management/not_in_nav/sbank-list-allocations/#-get-only-deleted","title":"--get-only-deleted","text":"

    get only deleted objects

    "},{"location":"account-project-management/allocation-management/not_in_nav/sbank-list-allocations/#-all-charges","title":"--all-charges","text":"

    only show list info that have charges regardless of project/user relationship

    "},{"location":"account-project-management/allocation-management/not_in_nav/sbank-list-allocations/#-last-updatedlast_updated_timestamp","title":"--last-updated=LAST_UPDATED_TIMESTAMP","text":"

    [OPER1][...[OPER2]], where the operators OPER1 and OPER2 can be one of the following: - ge, gt, le, lt, eq or >=, >, <=, <, ==.

    Operator Defaults:

    • OPER1 is 'lt' for single date entry, OPER1 and OPER2 are 'ge' and 'lt', respectively, for range date entry.

    Date Parsing Precedence:

    • YEAR then MONTH then DAY, i.e., 121101 is parsed as YYMMDD, hence Nov. 1, 2012
    "},{"location":"account-project-management/allocation-management/not_in_nav/sbank-list-allocations/#-no-commas","title":"--no-commas","text":"

    remove commas from comma-separated thousands

    "},{"location":"account-project-management/allocation-management/not_in_nav/sbank-list-allocations/#-no-header","title":"--no-header","text":"

    do not display the header

    "},{"location":"account-project-management/allocation-management/not_in_nav/sbank-list-allocations/#-no-rows","title":"--no-rows","text":"

    do not display the row data

    "},{"location":"account-project-management/allocation-management/not_in_nav/sbank-list-allocations/#-no-sys-msg","title":"--no-sys-msg","text":"

    do not display system message

    "},{"location":"account-project-management/allocation-management/not_in_nav/sbank-list-allocations/#-no-totals","title":"--no-totals","text":"

    do not display the totals

    "},{"location":"account-project-management/allocation-management/not_in_nav/sbank-list-jobs/","title":"Manpage for sbank-list-jobs","text":""},{"location":"account-project-management/allocation-management/not_in_nav/sbank-list-jobs/#sbank-list-jobs-options","title":"sbank-list-jobs [options]","text":"

    Generate job list report. Note: To get information for all resources, enter \"-r all\".
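    For example (the user and project names are placeholders), to list jobs across all resources:

    > sbank-list-jobs -u userx -p ProjectX -r all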

    "},{"location":"account-project-management/allocation-management/not_in_nav/sbank-list-jobs/#options","title":"OPTIONS","text":""},{"location":"account-project-management/allocation-management/not_in_nav/sbank-list-jobs/#-version","title":"--version","text":"

    show program's version number and exit

    "},{"location":"account-project-management/allocation-management/not_in_nav/sbank-list-jobs/#-h-help","title":"-h, --help","text":"

    show this help message and exit

    "},{"location":"account-project-management/allocation-management/not_in_nav/sbank-list-jobs/#-a-allocation_id-allocation-idallocation_id","title":"-a ALLOCATION_ID, --allocation-id=ALLOCATION_ID","text":"

    filter on allocation id

    "},{"location":"account-project-management/allocation-management/not_in_nav/sbank-list-jobs/#-e-event_id-event-idevent_id","title":"-e EVENT_ID, --event-id=EVENT_ID","text":"

    filter on event id

    "},{"location":"account-project-management/allocation-management/not_in_nav/sbank-list-jobs/#-f-field_info-field-to-displayfield_info","title":"-f FIELD_INFO, --field-to-display=FIELD_INFO","text":"

    FIELD_INFO is [:], for available fields enter -f? or -f \"?\", to add fields enter -f \"+ [:] ...\""},{"location":"account-project-management/allocation-management/not_in_nav/sbank-list-jobs/#-j-jobid-jobidjobid","title":"-j JOBID, --jobid=JOBID","text":"

    filter on jobid

    "},{"location":"account-project-management/allocation-management/not_in_nav/sbank-list-jobs/#-n-num_fields_to_display-num-fields-to-displaynum_fields_to_display","title":"-n NUM_FIELDS_TO_DISPLAY, --num-fields-to-display=NUM_FIELDS_TO_DISPLAY","text":"

    set number of fields to display

    "},{"location":"account-project-management/allocation-management/not_in_nav/sbank-list-jobs/#-p-project-projectproject","title":"-p PROJECT, --project=PROJECT","text":"

    filter on name or id, DO NOT MIX, enter 'all' to get all, wild cards '*' is allowed but only on names

    "},{"location":"account-project-management/allocation-management/not_in_nav/sbank-list-jobs/#-r-resource-resourceresource","title":"-r RESOURCE, --resource=RESOURCE","text":"

    filter on name or id, DO NOT MIX, enter 'all' to get all, wild cards '*' is allowed but only on names

    "},{"location":"account-project-management/allocation-management/not_in_nav/sbank-list-jobs/#-t-transaction_id-transaction-idtransaction_id","title":"-t TRANSACTION_ID, --transaction-id=TRANSACTION_ID","text":"

    filter on transaction id

    "},{"location":"account-project-management/allocation-management/not_in_nav/sbank-list-jobs/#-u-user-useruser","title":"-u USER, --user=USER","text":"

    filter on name or id, DO NOT MIX, enter 'all' to get all, wild cards '*' is allowed but only on names

    "},{"location":"account-project-management/allocation-management/not_in_nav/sbank-list-jobs/#-w-field_info-field-width","title":"-w \"FIELD_INFO\", --field-width","text":"

    \"FIELD_INFO\" FIELD_INFO is :, for available fields enter -w? or -w \"?\""},{"location":"account-project-management/allocation-management/not_in_nav/sbank-list-jobs/#-e-end-endend","title":"-E END, --end=END","text":"

    [OPER1][...[OPER2]], where the operators OPER1 and OPER2 can be one of the following: - ge, gt, le, lt, eq or >=, >, <=, <, ==.

    Operator Defaults:

    • OPER1 is 'lt' for single date entry, OPER1 and OPER2 are 'ge' and 'lt', respectively, for range date entry.

    Date Parsing Precedence:

    • YEAR then MONTH then DAY, i.e., 121101 is parsed as YYMMDD, hence Nov. 1, 2012
    "},{"location":"account-project-management/allocation-management/not_in_nav/sbank-list-jobs/#-h-human-readable","title":"-H, --human-readable","text":"

    abbreviate numbers and use unit suffixes: K (thousands), M (millions), G (billions), T (trillions) ...

    "},{"location":"account-project-management/allocation-management/not_in_nav/sbank-list-jobs/#-s-start-startstart","title":"-S START, --start=START","text":"

    [OPER1][...[OPER2]], where the operators OPER1 and OPER2 can be one of the following: - ge, gt, le, lt, eq or >=, >, <=, <, ==.

    Operator Defaults:

    • OPER1 is 'lt' for single date entry, OPER1 and OPER2 are 'ge' and 'lt', respectively, for range date entry.

    Date Parsing Precedence:

    • YEAR then MONTH then DAY, i.e., 121101 is parsed as YYMMDD, hence Nov. 1, 2012
    "},{"location":"account-project-management/allocation-management/not_in_nav/sbank-list-jobs/#-t-transaction_type-transaction-typetransaction_type","title":"-T TRANSACTION_TYPE, --transaction-type=TRANSACTION_TYPE","text":"

    transaction types: CHARGE, REFUND, PULLBACK, DEPOSIT, VOID

    "},{"location":"account-project-management/allocation-management/not_in_nav/sbank-list-jobs/#-createdcreated_timestamp","title":"--created=CREATED_TIMESTAMP","text":"

    [OPER1][...[OPER2]], where the operators OPER1 and OPER2 can be one of the following: - ge, gt, le, lt, eq or >=, >, <=, <, ==.

    Operator Defaults:

    • OPER1 is 'lt' for single date entry, OPER1 and OPER2 are 'ge' and 'lt', respectively, for range date entry.

    Date Parsing Precedence:

    • YEAR then MONTH then DAY, i.e., 121101 is parsed as YYMMDD, hence Nov. 1, 2012
    "},{"location":"account-project-management/allocation-management/not_in_nav/sbank-list-jobs/#-debugdebug_level","title":"--debug=DEBUG_LEVEL","text":"

    SILENT, MUCH_LESS, LESS, MORE, VERBOSE, DEBUG, DEBUG1, DEBUG2

    "},{"location":"account-project-management/allocation-management/not_in_nav/sbank-list-jobs/#-eligibleeligible_timestamp","title":"--eligible=ELIGIBLE_TIMESTAMP","text":"

    [OPER1][...[OPER2]], where the operators OPER1 and OPER2 can be one of the following: - ge, gt, le, lt, eq or >=, >, <=, <, ==.

    Operator Defaults:

    • OPER1 is 'lt' for single date entry, OPER1 and OPER2 are 'ge' and 'lt', respectively, for range date entry.

    Date Parsing Precedence:

    • YEAR then MONTH then DAY, i.e., 121101 is parsed as YYMMDD, hence Nov. 1, 2012
    "},{"location":"account-project-management/allocation-management/not_in_nav/sbank-list-jobs/#-get-not-charged","title":"--get-not-charged","text":"

    get only jobs that have not been charged

    "},{"location":"account-project-management/allocation-management/not_in_nav/sbank-list-jobs/#-last-updatedlast_updated_timestamp","title":"--last-updated=LAST_UPDATED_TIMESTAMP","text":"

    [OPER1][...[OPER2]], where the operators OPER1 and OPER2 can be one of the following: - ge, gt, le, lt, eq or >=, >, <=, <, ==.

    Operator Defaults:

    • OPER1 is 'lt' for single date entry, OPER1 and OPER2 are 'ge' and 'lt', respectively, for range date entry.

    Date Parsing Precedence:

    • YEAR then MONTH then DAY, i.e., 121101 is parsed as YYMMDD, hence Nov. 1, 2012
    "},{"location":"account-project-management/allocation-management/not_in_nav/sbank-list-jobs/#-no-commas","title":"--no-commas","text":"

    remove commas from comma-separated thousands

    "},{"location":"account-project-management/allocation-management/not_in_nav/sbank-list-jobs/#-no-header","title":"--no-header","text":"

    do not display the header

    "},{"location":"account-project-management/allocation-management/not_in_nav/sbank-list-jobs/#-no-rows","title":"--no-rows","text":"

    do not display the row data

    "},{"location":"account-project-management/allocation-management/not_in_nav/sbank-list-jobs/#-no-sys-msg","title":"--no-sys-msg","text":"

    do not display system message

    "},{"location":"account-project-management/allocation-management/not_in_nav/sbank-list-jobs/#-no-totals","title":"--no-totals","text":"

    do not display the totals

    "},{"location":"account-project-management/allocation-management/not_in_nav/sbank-list-jobs/#-queuedqueued_timestamp","title":"--queued=QUEUED_TIMESTAMP","text":"

    [OPER1][...[OPER2]], where the operators OPER1 and OPER2 can be one of the following: - ge, gt, le, lt, eq or >=, >, <=, <, ==.

    Operator Defaults:

    • OPER1 is 'lt' for single date entry, OPER1 and OPER2 are 'ge' and 'lt', respectively, for range date entry.

    Date Parsing Precedence:

    • YEAR then MONTH then DAY, i.e., 121101 is parsed as YYMMDD, hence Nov. 1, 2012
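
    For example, a minimal sketch of a jobs query (the user name \"jsmith\" and project name \"MyProject\" are placeholders, not real accounts):

    > sbank-list-jobs -u jsmith -p MyProject -S \"2013-01-01...2014-01-01\"

    This would list jsmith's jobs charged to MyProject whose start dates fall in calendar year 2013.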
    "},{"location":"account-project-management/allocation-management/not_in_nav/sbank-list-projects/","title":"Manpage for sbank-list-projects","text":""},{"location":"account-project-management/allocation-management/not_in_nav/sbank-list-projects/#sbank-list-projects-options","title":"sbank-list-projects [options]","text":"

    Generate project list report.

    Notes:

    1. Use -I to include inactive allocations
    2. To get information for all resources, enter \"-r all\"
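
    For example, a minimal sketch (\"MyProject\" is a placeholder project name):

    > sbank-list-projects -p MyProject -r all -I

    This would report the project across all resources, including inactive allocations.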
    "},{"location":"account-project-management/allocation-management/not_in_nav/sbank-list-projects/#options","title":"OPTIONS","text":""},{"location":"account-project-management/allocation-management/not_in_nav/sbank-list-projects/#-version","title":"--version","text":"

    show program's version number and exit

    "},{"location":"account-project-management/allocation-management/not_in_nav/sbank-list-projects/#-h-help","title":"-h, --help","text":"

    show this help message and exit

    "},{"location":"account-project-management/allocation-management/not_in_nav/sbank-list-projects/#-a-allocation_id-allocation-idallocation_id","title":"-a ALLOCATION_ID, --allocation-id=ALLOCATION_ID","text":"

    filter on allocation id

    "},{"location":"account-project-management/allocation-management/not_in_nav/sbank-list-projects/#-f-field_info-field-to-displayfield_info","title":"-f FIELD_INFO, --field-to-display=FIELD_INFO","text":"

    FIELD_INFO is FIELD[:WIDTH], for available fields enter -f? or -f \"?\", to add fields enter -f \"+ FIELD[:WIDTH] ...\""},{"location":"account-project-management/allocation-management/not_in_nav/sbank-list-projects/#-n-num_fields_to_display-num-fields-to-displaynum_fields_to_display","title":"-n NUM_FIELDS_TO_DISPLAY, --num-fields-to-display=NUM_FIELDS_TO_DISPLAY","text":"

    set number of fields to display

    "},{"location":"account-project-management/allocation-management/not_in_nav/sbank-list-projects/#-p-project-projectproject","title":"-p PROJECT, --project=PROJECT","text":"

    filter on name or id, DO NOT MIX, enter 'all' to get all, wild cards '*' is allowed but only on names

    "},{"location":"account-project-management/allocation-management/not_in_nav/sbank-list-projects/#-r-resource-resourceresource","title":"-r RESOURCE, --resource=RESOURCE","text":"

    filter on name or id, DO NOT MIX, enter 'all' to get all, wild cards '*' is allowed, but only on names

    "},{"location":"account-project-management/allocation-management/not_in_nav/sbank-list-projects/#-u-user-useruser","title":"-u USER, --user=USER","text":"

    filter on name or id, DO NOT MIX, enter 'all' to get all, wild cards '*' is allowed, but only on names

    "},{"location":"account-project-management/allocation-management/not_in_nav/sbank-list-projects/#-w-field_info-field-width","title":"-w \"FIELD_INFO\", --field-width","text":"

    \"FIELD_INFO\" FIELD_INFO is :, for available fields enter -w? or -w \"?\"

    [OPER1][...[OPER2]], where the operators OPER1 and OPER2 can be one of the following: - ge, gt, le, lt, eq or >=, >, <=, <, ==.

    Operator Defaults:

    • OPER1 is 'lt' for single date entry, OPER1 and OPER2 are 'ge' and 'lt', respectively, for range date entry.

    Date Parsing Precedence:

    • YEAR then MONTH then DAY, i.e., 121101 is parsed as YYMMDD, hence Nov. 1, 2012
    "},{"location":"account-project-management/allocation-management/not_in_nav/sbank-list-projects/#-h-human-readable","title":"-H, --human-readable","text":"

    abbreviate numbers and use unit suffixes: K (thousands), M (millions), G (billions), T (trillions) ...

    "},{"location":"account-project-management/allocation-management/not_in_nav/sbank-list-projects/#-i-get-inactive","title":"-I, --get-inactive","text":"

    get inactive allocations

    "},{"location":"account-project-management/allocation-management/not_in_nav/sbank-list-projects/#-s-start-startstart","title":"-S START, --start=START","text":"

    [OPER1][...[OPER2]], where the operators OPER1 and OPER2 can be one of the following: - ge, gt, le, lt, eq or >=, >, <=, <, ==.

    Operator Defaults:

    • OPER1 is 'lt' for single date entry, OPER1 and OPER2 are 'ge' and 'lt', respectively, for range date entry.

    Date Parsing Precedence:

    • YEAR then MONTH then DAY, i.e., 121101 is parsed as YYMMDD, hence Nov. 1, 2012
    "},{"location":"account-project-management/allocation-management/not_in_nav/sbank-list-projects/#-debugdebug_level","title":"--debug=DEBUG_LEVEL","text":"

    SILENT, MUCH_LESS, LESS, MORE, VERBOSE, DEBUG, DEBUG1, DEBUG2

    "},{"location":"account-project-management/allocation-management/not_in_nav/sbank-list-projects/#-all-charges","title":"--all-charges","text":"

    only show list entries that have charges, regardless of project/user relationship

    "},{"location":"account-project-management/allocation-management/not_in_nav/sbank-list-projects/#-no-commas","title":"--no-commas","text":"

    remove commas from comma-separated thousands

    "},{"location":"account-project-management/allocation-management/not_in_nav/sbank-list-projects/#-no-header","title":"--no-header","text":"

    do not display the header

    "},{"location":"account-project-management/allocation-management/not_in_nav/sbank-list-projects/#-no-rows","title":"--no-rows","text":"

    do not display the row data

    "},{"location":"account-project-management/allocation-management/not_in_nav/sbank-list-projects/#-no-sys-msg","title":"--no-sys-msg","text":"

    do not display system message

    "},{"location":"account-project-management/allocation-management/not_in_nav/sbank-list-projects/#-no-totals","title":"--no-totals","text":"

    do not display the totals

    "},{"location":"account-project-management/allocation-management/not_in_nav/sbank-list-transactions/","title":"Manpage for sbank-list-transactions","text":""},{"location":"account-project-management/allocation-management/not_in_nav/sbank-list-transactions/#sbank-list-transactions-options","title":"sbank-list-transactions [options]","text":"

    Generate transaction list report.

    Note: To get information for all resources, enter \"-r all\".
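
    For example, a minimal sketch (\"MyProject\" is a placeholder project name):

    > sbank-list-transactions -p MyProject -r all -T CHARGE -S \">=2014-01-01\"

    This would list only CHARGE transactions for that project, on all resources, for jobs whose start dates are on or after January 1, 2014.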

    "},{"location":"account-project-management/allocation-management/not_in_nav/sbank-list-transactions/#options","title":"OPTIONS","text":""},{"location":"account-project-management/allocation-management/not_in_nav/sbank-list-transactions/#-version","title":"--version","text":"

    show program's version number and exit

    "},{"location":"account-project-management/allocation-management/not_in_nav/sbank-list-transactions/#-h-help","title":"-h, --help","text":"

    show this help message and exit

    "},{"location":"account-project-management/allocation-management/not_in_nav/sbank-list-transactions/#-a-allocation_id-allocation-idallocation_id","title":"-a ALLOCATION_ID, --allocation-id=ALLOCATION_ID","text":"

    filter on allocation id

    "},{"location":"account-project-management/allocation-management/not_in_nav/sbank-list-transactions/#-c-comment","title":"-c, --comment","text":"

    display comment

    "},{"location":"account-project-management/allocation-management/not_in_nav/sbank-list-transactions/#-e-event_id-event-idevent_id","title":"-e EVENT_ID, --event-id=EVENT_ID","text":"

    filter on event id

    "},{"location":"account-project-management/allocation-management/not_in_nav/sbank-list-transactions/#-f-field_info-field-to-displayfield_info","title":"-f FIELD_INFO, --field-to-display=FIELD_INFO","text":"

    FIELD_INFO is FIELD[:WIDTH], for available fields enter -f? or -f \"?\", to add fields enter -f \"+ FIELD[:WIDTH] ...\""},{"location":"account-project-management/allocation-management/not_in_nav/sbank-list-transactions/#-j-jobid-jobidjobid","title":"-j JOBID, --jobid=JOBID","text":"

    filter on jobid

    "},{"location":"account-project-management/allocation-management/not_in_nav/sbank-list-transactions/#-n-num_fields_to_display-num-fields-to-displaynum_fields_to_display","title":"-n NUM_FIELDS_TO_DISPLAY, --num-fields-to-display=NUM_FIELDS_TO_DISPLAY","text":"

    set number of fields to display

    "},{"location":"account-project-management/allocation-management/not_in_nav/sbank-list-transactions/#-p-project-projectproject","title":"-p PROJECT, --project=PROJECT","text":"

    filter on name or id, DO NOT MIX, enter 'all' to get all, wild cards '*' is allowed but only on names

    "},{"location":"account-project-management/allocation-management/not_in_nav/sbank-list-transactions/#-r-resource-resourceresource","title":"-r RESOURCE, --resource=RESOURCE","text":"

    filter on name or id, DO NOT MIX, enter 'all' to get all, wild cards '*' is allowed but only on names

    "},{"location":"account-project-management/allocation-management/not_in_nav/sbank-list-transactions/#-t-transaction_id-transaction-idtransaction_id","title":"-t TRANSACTION_ID, --transaction-id=TRANSACTION_ID","text":"

    filter on transaction id

    "},{"location":"account-project-management/allocation-management/not_in_nav/sbank-list-transactions/#-u-user-useruser","title":"-u USER, --user=USER","text":"

    filter on name or id, DO NOT MIX, enter 'all' to get all, wild cards '*' is allowed but only on names

    "},{"location":"account-project-management/allocation-management/not_in_nav/sbank-list-transactions/#-w-field_info-field-width","title":"-w \"FIELD_INFO\", --field-width","text":"

    \"FIELD_INFO\" FIELD_INFO is :, for available fields enter -w? or -w \"?\""},{"location":"account-project-management/allocation-management/not_in_nav/sbank-list-transactions/#-e-job_end-endjob_end","title":"-E JOB_END, --end=JOB_END","text":"

    [OPER1][...[OPER2]], where the operators OPER1 and OPER2 can be one of the following: - ge, gt, le, lt, eq or >=, >, <=, <, ==.

    Operator Defaults:

    • OPER1 is 'lt' for single date entry, OPER1 and OPER2 are 'ge' and 'lt', respectively, for range date entry.

    Date Parsing Precedence:

    • YEAR then MONTH then DAY, i.e., 121101 is parsed as YYMMDD, hence Nov. 1, 2012
    "},{"location":"account-project-management/allocation-management/not_in_nav/sbank-list-transactions/#-h-human-readable","title":"-H, --human-readable","text":"

    abbreviate numbers and use unit suffixes: K (thousands), M (millions), G (billions), T (trillions) ...

    "},{"location":"account-project-management/allocation-management/not_in_nav/sbank-list-transactions/#-s-job_start-startjob_start","title":"-S JOB_START, --start=JOB_START","text":"

    [OPER1][...[OPER2]], where the operators OPER1 and OPER2 can be one of the following: - ge, gt, le, lt, eq or >=, >, <=, <, ==.

    Operator Defaults:

    • OPER1 is 'lt' for single date entry, OPER1 and OPER2 are 'ge' and 'lt', respectively, for range date entry.

    Date Parsing Precedence:

    • YEAR then MONTH then DAY, i.e., 121101 is parsed as YYMMDD, hence Nov. 1, 2012
    "},{"location":"account-project-management/allocation-management/not_in_nav/sbank-list-transactions/#-t-transaction_type-transaction-typetransaction_type","title":"-T TRANSACTION_TYPE, --transaction-type=TRANSACTION_TYPE","text":"

    transaction types: CHARGE, REFUND, PULLBACK, DEPOSIT, VOID

    "},{"location":"account-project-management/allocation-management/not_in_nav/sbank-list-transactions/#-attransaction_at_timestamp","title":"--at=TRANSACTION_AT_TIMESTAMP","text":"

    [OPER1][...[OPER2]], where the operators OPER1 and OPER2 can be one of the following: - ge, gt, le, lt, eq or >=, >, <=, <, ==.

    Operator Defaults:

    • OPER1 is 'lt' for single date entry, OPER1 and OPER2 are 'ge' and 'lt', respectively, for range date entry.

    Date Parsing Precedence:

    • YEAR then MONTH then DAY, i.e., 121101 is parsed as YYMMDD, hence Nov. 1, 2012
    "},{"location":"account-project-management/allocation-management/not_in_nav/sbank-list-transactions/#-cbank-refcbank_ref","title":"--cbank-ref=CBANK_REF","text":"

    filter on Clusterbank reference id

    "},{"location":"account-project-management/allocation-management/not_in_nav/sbank-list-transactions/#-createdjob_created_timestamp","title":"--created=JOB_CREATED_TIMESTAMP","text":"

    [OPER1][...[OPER2]], where the operators OPER1 and OPER2 can be one of the following: - ge, gt, le, lt, eq or >=, >, <=, <, ==.

    Operator Defaults:

    • OPER1 is 'lt' for single date entry, OPER1 and OPER2 are 'ge' and 'lt', respectively, for range date entry.

    Date Parsing Precedence:

    • YEAR then MONTH then DAY, i.e., 121101 is parsed as YYMMDD, hence Nov. 1, 2012
    "},{"location":"account-project-management/allocation-management/not_in_nav/sbank-list-transactions/#-debugdebug_level","title":"--debug=DEBUG_LEVEL","text":"

    SILENT, MUCH_LESS, LESS, MORE, VERBOSE, DEBUG, DEBUG1, DEBUG2

    "},{"location":"account-project-management/allocation-management/not_in_nav/sbank-list-transactions/#-no-commas","title":"--no-commas","text":"

    remove commas from comma-separated thousands

    "},{"location":"account-project-management/allocation-management/not_in_nav/sbank-list-transactions/#-no-header","title":"--no-header","text":"

    do not display the header

    "},{"location":"account-project-management/allocation-management/not_in_nav/sbank-list-transactions/#-no-rows","title":"--no-rows","text":"

    do not display the row data

    "},{"location":"account-project-management/allocation-management/not_in_nav/sbank-list-transactions/#-no-sys-msg","title":"--no-sys-msg","text":"

    do not display system message

    "},{"location":"account-project-management/allocation-management/not_in_nav/sbank-list-transactions/#-no-totals","title":"--no-totals","text":"

    do not display the totals

    "},{"location":"account-project-management/allocation-management/not_in_nav/sbank-list-transactions/#-queuedjob_queued_timestamp","title":"--queued=JOB_QUEUED_TIMESTAMP","text":"

    [OPER1][...[OPER2]], where the operators OPER1 and OPER2 can be one of the following: - ge, gt, le, lt, eq or >=, >, <=, <, ==.

    Operator Defaults:

    • OPER1 is 'lt' for single date entry, OPER1 and OPER2 are 'ge' and 'lt', respectively, for range date entry.

    Date Parsing Precedence:

    • YEAR then MONTH then DAY, i.e., 121101 is parsed as YYMMDD, hence Nov. 1, 2012
    "},{"location":"account-project-management/allocation-management/not_in_nav/sbank-list-users/","title":"Manpage for sbank-list-users","text":""},{"location":"account-project-management/allocation-management/not_in_nav/sbank-list-users/#sbank-list-users-options","title":"sbank-list-users [options]","text":"

    Generate user list report.

    Notes:

    1. Use -I to include inactive allocations
    2. To get information for all resources, use \"-r all\"
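
    For example, a minimal sketch (\"MyProject\" is a placeholder project name):

    > sbank-list-users -p MyProject -r all -I

    This would list user information for that project on all resources, including inactive allocations.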
    "},{"location":"account-project-management/allocation-management/not_in_nav/sbank-list-users/#options","title":"OPTIONS","text":""},{"location":"account-project-management/allocation-management/not_in_nav/sbank-list-users/#-version","title":"--version","text":"

    show program's version number and exit

    "},{"location":"account-project-management/allocation-management/not_in_nav/sbank-list-users/#-h-help","title":"-h, --help","text":"

    show this help message and exit

    "},{"location":"account-project-management/allocation-management/not_in_nav/sbank-list-users/#-a-allocation_id-allocation-idallocation_id","title":"-a ALLOCATION_ID, --allocation-id=ALLOCATION_ID","text":"

    filter on allocation id

    "},{"location":"account-project-management/allocation-management/not_in_nav/sbank-list-users/#-f-field_info-field-to-displayfield_info","title":"-f FIELD_INFO, --field-to-display=FIELD_INFO","text":"

    FIELD_INFO is FIELD[:WIDTH], for available fields enter -f? or -f \"?\", to add fields enter -f \"+ FIELD[:WIDTH] ...\""},{"location":"account-project-management/allocation-management/not_in_nav/sbank-list-users/#-n-num_fields_to_display-num-fields-to-displaynum_fields_to_display","title":"-n NUM_FIELDS_TO_DISPLAY, --num-fields-to-display=NUM_FIELDS_TO_DISPLAY","text":"

    set number of fields to display

    "},{"location":"account-project-management/allocation-management/not_in_nav/sbank-list-users/#-p-project-projectproject","title":"-p PROJECT, --project=PROJECT","text":"

    filter on name or id, DO NOT MIX, enter 'all' to get all, wild cards '*' is allowed, but only on names

    "},{"location":"account-project-management/allocation-management/not_in_nav/sbank-list-users/#-r-resource-resourceresource","title":"-r RESOURCE, --resource=RESOURCE","text":"

    filter on name or id, DO NOT MIX, enter 'all' to get all, wild cards '*' is allowed, but only on names

    "},{"location":"account-project-management/allocation-management/not_in_nav/sbank-list-users/#-u-user-useruser","title":"-u USER, --user=USER","text":"

    filter on name or id, DO NOT MIX, enter 'all' to get all, wild cards '*' is allowed, but only on names

    "},{"location":"account-project-management/allocation-management/not_in_nav/sbank-list-users/#-w-field_info-field-width","title":"-w \"FIELD_INFO\", --field-width","text":"

    \"FIELD_INFO\" FIELD_INFO is :, for available fields enter -w? or -w \"?\""},{"location":"account-project-management/allocation-management/not_in_nav/sbank-list-users/#-e-end-endend","title":"-E END, --end=END","text":"

    [OPER1][...[OPER2]], where the operators OPER1 and OPER2 can be one of the following: - ge, gt, le, lt, eq or >=, >, <=, <, ==.

    Operator Defaults:

    • OPER1 is 'lt' for single date entry, OPER1 and OPER2 are 'ge' and 'lt', respectively, for range date entry.

    Date Parsing Precedence:

    • YEAR then MONTH then DAY, i.e., 121101 is parsed as YYMMDD, hence Nov. 1, 2012
    "},{"location":"account-project-management/allocation-management/not_in_nav/sbank-list-users/#-h-human-readable","title":"-H, --human-readable","text":"

    abbreviate numbers and use unit suffixes: K (thousands), M (millions), G (billions), T (trillions) ...

    "},{"location":"account-project-management/allocation-management/not_in_nav/sbank-list-users/#-i-get-inactive","title":"-I, --get-inactive","text":"

    also get inactive allocations

    "},{"location":"account-project-management/allocation-management/not_in_nav/sbank-list-users/#-s-start-startstart","title":"-S START, --start=START","text":"

    [OPER1][...[OPER2]], where the operators OPER1 and OPER2 can be one of the following: - ge, gt, le, lt, eq or >=, >, <=, <, ==.

    Operator Defaults:

    • OPER1 is 'lt' for single date entry, OPER1 and OPER2 are 'ge' and 'lt', respectively, for range date entry.

    Date Parsing Precedence:

    • YEAR then MONTH then DAY, i.e., 121101 is parsed as YYMMDD, hence Nov. 1, 2012
    "},{"location":"account-project-management/allocation-management/not_in_nav/sbank-list-users/#-debugdebug_level","title":"--debug=DEBUG_LEVEL","text":"

    SILENT, MUCH_LESS, LESS, MORE, VERBOSE, DEBUG, DEBUG1, DEBUG2

    "},{"location":"account-project-management/allocation-management/not_in_nav/sbank-list-users/#-all-charges","title":"--all-charges","text":"

    only show list entries that have charges, regardless of project/user relationship

    "},{"location":"account-project-management/allocation-management/not_in_nav/sbank-list-users/#-no-commas","title":"--no-commas","text":"

    remove commas from comma-separated thousands

    "},{"location":"account-project-management/allocation-management/not_in_nav/sbank-list-users/#-no-header","title":"--no-header","text":"

    do not display the header

    "},{"location":"account-project-management/allocation-management/not_in_nav/sbank-list-users/#-no-rows","title":"--no-rows","text":"

    do not display the row data

    "},{"location":"account-project-management/allocation-management/not_in_nav/sbank-list-users/#-no-sys-msg","title":"--no-sys-msg","text":"

    do not display system message

    "},{"location":"account-project-management/allocation-management/not_in_nav/sbank-list-users/#-no-totals","title":"--no-totals","text":"

    do not display the totals

    "},{"location":"account-project-management/allocation-management/not_in_nav/sbank-list/","title":"Manpage for sbank-list","text":""},{"location":"account-project-management/allocation-management/not_in_nav/sbank-list/#sbank-list-options","title":"sbank-list [options]

    List Meta Command

    ","text":""},{"location":"account-project-management/allocation-management/not_in_nav/sbank-list/#commands","title":"COMMANDS","text":"
    • allocations [-a|-c|-e|-f|-j|-n|-p|-r|-t|-u|-w|-E|-H|-I|-O|-S|-T|...] (DEFAULT)
    • categories [-f|-n|-w|...]
    • messages [-f|-n|-w|...]
    • names [-f|-n|-w|...]
    • jobs [-a|-e|-f|-j|-n|-p|-r|-t|-u|-w|-E|-H|-S|-T|...]
    • projects [-a|-f|-n|-p|-r|-u|-w|-E|-H|-I|-S|...]
    • transactions [-a|-c|-e|-f|-j|-n|-p|-r|-t|-u|-w|-E|-H|-S|-T|...]
    • users [-a|-f|-n|-p|-r|-u|-w|-E|-H|-S|...]
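
    For example, a minimal sketch of the list meta command (\"MyProject\" is a placeholder project name):

    > sbank list jobs -p MyProject -u all

    Per the invocation forms described in the sbank manpage, this is the same as calling sbank-list-jobs directly with the same options.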
    "},{"location":"account-project-management/allocation-management/not_in_nav/sbank-list/#options","title":"OPTIONS","text":""},{"location":"account-project-management/allocation-management/not_in_nav/sbank-list/#-a-allocation","title":"-a --allocation","text":"

    enter allocation id

    "},{"location":"account-project-management/allocation-management/not_in_nav/sbank-list/#-c-comment","title":"-c --comment","text":"

    enter comment for new or edit commands, display comment for list commands

    "},{"location":"account-project-management/allocation-management/not_in_nav/sbank-list/#-e-event-id","title":"-e --event-id","text":"

    enter event db id; event db id is an internal id created by the charging system

    "},{"location":"account-project-management/allocation-management/not_in_nav/sbank-list/#-f-field","title":"-f --field","text":"

    enter FIELD[:WIDTH], width is optional; enter -f? or -f \"?\" for available fields, + to add fields"},{"location":"account-project-management/allocation-management/not_in_nav/sbank-list/#-h-help","title":"-h --help","text":"

    command line help

    "},{"location":"account-project-management/allocation-management/not_in_nav/sbank-list/#-j-jobid","title":"-j --jobid","text":"

    enter jobid; jobid is created by the scheduler and is not unique

    "},{"location":"account-project-management/allocation-management/not_in_nav/sbank-list/#-n-num-field","title":"-n --num-field","text":"

    enter number of fields to display

    "},{"location":"account-project-management/allocation-management/not_in_nav/sbank-list/#-p-project","title":"-p --project","text":"

    enter name or id, DO NOT MIX, enter 'all' to get all, wild cards '*' is allowed, but only on names

    "},{"location":"account-project-management/allocation-management/not_in_nav/sbank-list/#-r-resource","title":"-r --resource","text":"

    enter name or id, DO NOT MIX, enter 'all' to get all, wild cards '*' is allowed, but only on names

    "},{"location":"account-project-management/allocation-management/not_in_nav/sbank-list/#-s-suballocation","title":"-s --suballocation","text":"

    enter suballocation id

    "},{"location":"account-project-management/allocation-management/not_in_nav/sbank-list/#-t-transaction","title":"-t --transaction","text":"

    enter transaction id

    "},{"location":"account-project-management/allocation-management/not_in_nav/sbank-list/#-u-user","title":"-u --user","text":"

    enter name or id, DO NOT MIX, enter 'all' to get all, wild cards '*' is allowed, but only on names

    "},{"location":"account-project-management/allocation-management/not_in_nav/sbank-list/#-w-field-width","title":"-w --field-width","text":"

    enter the field width as follows: FIELD:WIDTH, enter -w? or -w \"?\" for available fields"},{"location":"account-project-management/allocation-management/not_in_nav/sbank-list/#-e-end","title":"-E --end","text":"

    enter end datetime filter

    "},{"location":"account-project-management/allocation-management/not_in_nav/sbank-list/#-h-human-readable","title":"-H --human-readable","text":"

    abbreviate numbers and use unit suffixes: K (thousands), M (millions), G (billions), T (trillions) ...

    "},{"location":"account-project-management/allocation-management/not_in_nav/sbank-list/#-i-get-inactive","title":"-I --get-inactive","text":"

    include inactive allocations

    "},{"location":"account-project-management/allocation-management/not_in_nav/sbank-list/#-o-get-only-inactive","title":"-O --get-only-inactive","text":"

    get only inactive allocations

    "},{"location":"account-project-management/allocation-management/not_in_nav/sbank-list/#-s-start","title":"-S --start","text":"

    enter start datetime filter

    "},{"location":"account-project-management/allocation-management/not_in_nav/sbank-list/#-t-type","title":"-T --Type","text":"

    enter type of transaction

    "},{"location":"account-project-management/allocation-management/not_in_nav/sbank-list/#-all-charges","title":"--all-charges","text":"

    for list allocations | projects | users, only show info with charges

    "},{"location":"account-project-management/allocation-management/not_in_nav/sbank-list/#-at","title":"--at","text":"

    enter transaction-created datetime filter

    "},{"location":"account-project-management/allocation-management/not_in_nav/sbank-list/#-award-category","title":"--award-category","text":"

    enter allocation award category

    "},{"location":"account-project-management/allocation-management/not_in_nav/sbank-list/#-award-type-name","title":"--award-type-name","text":"

    enter allocation award-type name

    "},{"location":"account-project-management/allocation-management/not_in_nav/sbank-list/#-created","title":"--created","text":"

    enter created datetime filter

    "},{"location":"account-project-management/allocation-management/not_in_nav/sbank-list/#-debug","title":"--debug","text":"

    enter debug level

    "},{"location":"account-project-management/allocation-management/not_in_nav/sbank-list/#-get-deleted","title":"--get-deleted","text":"

    get deleted objects

    "},{"location":"account-project-management/allocation-management/not_in_nav/sbank-list/#-get-not-charged","title":"--get-not-charged","text":"

    get jobs that have not been charged

    "},{"location":"account-project-management/allocation-management/not_in_nav/sbank-list/#-get-only-deleted","title":"--get-only-deleted","text":"

    get only deleted objects

    "},{"location":"account-project-management/allocation-management/not_in_nav/sbank-list/#-history-date-range","title":"--history-date-range","text":"

    enter history datetime filter

    "},{"location":"account-project-management/allocation-management/not_in_nav/sbank-list/#-last-updated","title":"--last-updated","text":"

    enter last updated datetime filter

    "},{"location":"account-project-management/allocation-management/not_in_nav/sbank-list/#-no-commas","title":"--no-commas","text":"

    remove commas from comma-separated thousands

    "},{"location":"account-project-management/allocation-management/not_in_nav/sbank-list/#-no-header","title":"--no-header","text":"

    do not display header

    "},{"location":"account-project-management/allocation-management/not_in_nav/sbank-list/#-no-history","title":"--no-history","text":"

    do not display history information

    "},{"location":"account-project-management/allocation-management/not_in_nav/sbank-list/#-no-rows","title":"--no-rows","text":"

    do not display rows

    "},{"location":"account-project-management/allocation-management/not_in_nav/sbank-list/#-no-sys-msg","title":"--no-sys-msg","text":"

    do not display system message

    "},{"location":"account-project-management/allocation-management/not_in_nav/sbank-list/#-no-totals","title":"--no-totals","text":"

    do not display totals

    "},{"location":"account-project-management/allocation-management/not_in_nav/sbank-list/#-queued","title":"--queued","text":"

    enter queued datetime filter

    "},{"location":"account-project-management/allocation-management/not_in_nav/sbank-manpage/","title":"Manpage for sbank Commands","text":""},{"location":"account-project-management/allocation-management/not_in_nav/sbank-manpage/#sbank-options","title":"sbank [options]","text":""},{"location":"account-project-management/allocation-management/not_in_nav/sbank-manpage/#description","title":"DESCRIPTION","text":"

    HPC Accounting System Command Line Interface

    "},{"location":"account-project-management/allocation-management/not_in_nav/sbank-manpage/#detail-meta-command","title":"detail meta command","text":"

    \"detail\" meta command displays information in a long format with history updates, where appropriate.

    "},{"location":"account-project-management/allocation-management/not_in_nav/sbank-manpage/#list-meta-command","title":"list meta command","text":"

    \"list\" meta command displays information in a table format, but no history updates are displayed.

    IMPORTANT NOTES
    1. All dates entered shall be interpreted as UTC
    2. Non-admin users will only be able to see their content (jobs, charges, etc.)
    3. Project admin users will be able to see all of the content for their projects
    4. Staff admin users will be able to see all the content
    5. --help and -h are the help options.

    "},{"location":"account-project-management/allocation-management/not_in_nav/sbank-manpage/#meta-commands","title":"META COMMANDS","text":""},{"location":"account-project-management/allocation-management/not_in_nav/sbank-manpage/#-detail-options","title":"- detail [options]","text":""},{"location":"account-project-management/allocation-management/not_in_nav/sbank-manpage/#-list-options-default","title":"- list [options] (DEFAULT)

    DETAIL COMMANDS
    • allocations [-a|-e|-f|-j|-n|-p|-r|-t|-u|-w|-E|-H|-I|-O|-S|-T|...] [ ... ] (DEFAULT)
    • jobs [-a|-e|-f|-j|-n|-p|-r|-t|-u|-w|-E|-H|-S|-T|...] [ ... ]
    • projects [-a|-f|-n|-p|-r|-u|-w|-E|-H|-I|-S|...] [ ... ]
    • transactions [-a|-e|-f|-j|-n|-p|-r|-t|-u|-w|-E|-H|-S|-T|...] [ ... ]
    • users [-a|-f|-n|-p|-r|-u|-w|-E|-H|-S|...] [ ... ]

    LIST COMMANDS
    • allocations [-a|-c|-e|-f|-j|-n|-p|-r|-t|-u|-w|-E|-H|-I|-O|-S|-T|...] (DEFAULT)
    • jobs [-a|-e|-f|-j|-n|-p|-r|-t|-u|-w|-E|-H|-S|-T|...]
    • projects [-a|-f|-n|-p|-r|-u|-w|-E|-H|-I|-S|...]
    • transactions [-a|-c|-e|-f|-j|-n|-p|-r|-t|-u|-w|-E|-H|-S|-T|...]
    • users [-a|-f|-n|-p|-r|-u|-w|-E|-H|-S|...]
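
    As an illustrative sketch of a detail command (\"MyProject\" is a placeholder project name):

    > sbank detail allocations -p MyProject -I

    This would show the project's allocations, including inactive ones, in the long format with history updates.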

    ","text":""},{"location":"account-project-management/allocation-management/not_in_nav/sbank-manpage/#options","title":"OPTIONS","text":""},{"location":"account-project-management/allocation-management/not_in_nav/sbank-manpage/#-a-allocation","title":"-a --allocation

    enter allocation id

    ","text":""},{"location":"account-project-management/allocation-management/not_in_nav/sbank-manpage/#-c-comment","title":"-c --comment

    enter comment for new or edit commands, display comment for list commands

    ","text":""},{"location":"account-project-management/allocation-management/not_in_nav/sbank-manpage/#-e-event-id","title":"-e --event-id

    enter event db id; event db id is an internal id created by the charging system

    ","text":""},{"location":"account-project-management/allocation-management/not_in_nav/sbank-manpage/#-f-field","title":"-f --field

    enter FIELD[:WIDTH], width is optional; enter -f? or -f \"?\" for available fields, + to add fields","text":""},{"location":"account-project-management/allocation-management/not_in_nav/sbank-manpage/#-h-help","title":"-h --help

    command line help

    ","text":""},{"location":"account-project-management/allocation-management/not_in_nav/sbank-manpage/#-j-jobid","title":"-j --jobid

    enter jobid; jobid is created by the scheduler and is not unique

    ","text":""},{"location":"account-project-management/allocation-management/not_in_nav/sbank-manpage/#-n-num-field","title":"-n --num-field

    enter number of fields to display

    ","text":""},{"location":"account-project-management/allocation-management/not_in_nav/sbank-manpage/#-p-project","title":"-p --project

    enter name or id, DO NOT MIX, enter 'all' to get all, wild cards '*' is allowed, but only on names

    ","text":""},{"location":"account-project-management/allocation-management/not_in_nav/sbank-manpage/#-r-resource","title":"-r --resource

    enter name or id, DO NOT MIX, enter 'all' to get all, wild cards '*' is allowed, but only on names

    ","text":""},{"location":"account-project-management/allocation-management/not_in_nav/sbank-manpage/#-s-suballocation","title":"-s --suballocation

    enter suballocation id

    ","text":""},{"location":"account-project-management/allocation-management/not_in_nav/sbank-manpage/#-t-transaction","title":"-t --transaction

    enter transaction id

    ","text":""},{"location":"account-project-management/allocation-management/not_in_nav/sbank-manpage/#-u-user","title":"-u --user

    enter name or id, DO NOT MIX, enter 'all' to get all, wild cards '*' is allowed, but only on names

    ","text":""},{"location":"account-project-management/allocation-management/not_in_nav/sbank-manpage/#-w-field-width","title":"-w --field-width

    enter the field width as follows: FIELD:WIDTH, enter -w? or -w \"?\" for available fields","text":""},{"location":"account-project-management/allocation-management/not_in_nav/sbank-manpage/#-e-end","title":"-E --end

    enter end datetime filter

    ","text":""},{"location":"account-project-management/allocation-management/not_in_nav/sbank-manpage/#-h-human-readable","title":"-H --human-readable

    abbreviate numbers and use unit suffixes: K (thousands), M (millions), G (billions), T (trillions) ...

    ","text":""},{"location":"account-project-management/allocation-management/not_in_nav/sbank-manpage/#-i-get-inactive","title":"-I --get-inactive

    include inactive allocations

    ","text":""},{"location":"account-project-management/allocation-management/not_in_nav/sbank-manpage/#-o-get-only-inactive","title":"-O --get-only-inactive

    get only inactive allocations

    ","text":""},{"location":"account-project-management/allocation-management/not_in_nav/sbank-manpage/#-s-start","title":"-S --start

    enter start datetime filter

    ","text":""},{"location":"account-project-management/allocation-management/not_in_nav/sbank-manpage/#-t-type","title":"-T --Type

    enter type of transaction

    ","text":""},{"location":"account-project-management/allocation-management/not_in_nav/sbank-manpage/#-all-charges","title":"--all-charges

    for list allocations | projects | users, only show info with charges

    ","text":""},{"location":"account-project-management/allocation-management/not_in_nav/sbank-manpage/#-at","title":"--at

    enter transaction-created datetime filter

    ","text":""},{"location":"account-project-management/allocation-management/not_in_nav/sbank-manpage/#-award-category","title":"--award-category

    enter allocation award category

    ","text":""},{"location":"account-project-management/allocation-management/not_in_nav/sbank-manpage/#-award-type-name","title":"--award-type-name

    enter allocation award-type name

    ","text":""},{"location":"account-project-management/allocation-management/not_in_nav/sbank-manpage/#-created","title":"--created

    enter created datetime filter

    ","text":""},{"location":"account-project-management/allocation-management/not_in_nav/sbank-manpage/#-debug","title":"--debug

    enter debug level

    ","text":""},{"location":"account-project-management/allocation-management/not_in_nav/sbank-manpage/#-get-deleted","title":"--get-deleted

    get deleted objects

    ","text":""},{"location":"account-project-management/allocation-management/not_in_nav/sbank-manpage/#-get-not-charged","title":"--get-not-charged

    get jobs that have not been charged

    ","text":""},{"location":"account-project-management/allocation-management/not_in_nav/sbank-manpage/#-get-only-deleted","title":"--get-only-deleted

    get only deleted objects

    ","text":""},{"location":"account-project-management/allocation-management/not_in_nav/sbank-manpage/#-history-date-range","title":"--history-date-range

    enter history datetime filter

    ","text":""},{"location":"account-project-management/allocation-management/not_in_nav/sbank-manpage/#-home-dir","title":"--home-dir

    enter the directory to store the pbs meta file

    ","text":""},{"location":"account-project-management/allocation-management/not_in_nav/sbank-manpage/#-ignore-pbs-files","title":"--ignore-pbs-files

    all new pbs files will be ignored and marked as processed

    ","text":""},{"location":"account-project-management/allocation-management/not_in_nav/sbank-manpage/#-last-updated","title":"--last-updated

    enter last updated datetime filter

    ","text":""},{"location":"account-project-management/allocation-management/not_in_nav/sbank-manpage/#-no-commas","title":"--no-commas

    remove commas from comma-separated thousands

    ","text":""},{"location":"account-project-management/allocation-management/not_in_nav/sbank-manpage/#-no-header","title":"--no-header

    do not display header

    ","text":""},{"location":"account-project-management/allocation-management/not_in_nav/sbank-manpage/#-no-history","title":"--no-history

    do not display history information

    ","text":""},{"location":"account-project-management/allocation-management/not_in_nav/sbank-manpage/#-no-rows","title":"--no-rows

    do not display rows

    ","text":""},{"location":"account-project-management/allocation-management/not_in_nav/sbank-manpage/#-no-sys-msg","title":"--no-sys-msg

    do not display system message

    ","text":""},{"location":"account-project-management/allocation-management/not_in_nav/sbank-manpage/#-no-totals","title":"--no-totals

    do not display totals

    ","text":""},{"location":"account-project-management/allocation-management/not_in_nav/sbank-manpage/#-queued","title":"--queued

    enter queued datetime filter

    ","text":""},{"location":"account-project-management/allocation-management/not_in_nav/sbank-manpage/#more-option-explanations","title":"MORE OPTION EXPLANATIONS","text":"

    For -a, -e, -f, -w, -j, -p, -r, -t, -u, -T, --award-categories, --award_type_names, --cbank_refs options:

    These options can be entered multiple times for different values or entered once for multiple values.

    Examples:

    1. sbank-list-allocation -u \"pershey rojas allcock\" or > sbank-list-allocation -u pershey -u rojas -u allcock

    2. sbank-list-allocation -f \"id p avail\" or > sbank-list-allocation -f id -f p -f avail

    For -u, -p and -r the use of wild card \"*\" is allowed, but only on names, not ids:

    Examples:

    1. The following command will find allocations for users whose names start with \"pers\" and also users rojas and allcock. > sbank-list-allocation -u \"pers* rojas allcock\"
    2. The following command will find allocations for projects that contain \"ratio\" in the name. > sbank-list-allocation -p *ratio*
    3. The following command will find allocations for projects that end with \"tion\" in the name. > sbank-list-allocation -p *tion
    4. The following command will find allocations for projects that start with \"ab\" and end with \"ng\" in the name. > sbank-list-allocation -p ab*ng

    For -f option: This option is the display field option.

    To get the available fields enter -f? or -f \"?\". Default fields columns will be displayed if no field option is specified.

    To replace the current fields to display, enter:

    > sbank-list-allocations ... -f \"FIELD[:WIDTH]...FIELD[:WIDTH]\" or > sbank-list-allocations ... -f FIELD[:WIDTH] ... -f FIELD[:WIDTH] \n

    If you wish to add fields to the default fields, enter one + symbol anywhere in the quoted string:

    > sbank-list-allocations ... -f \"+ FIELD[:WIDTH]...FIELD[:WIDTH]\", only one + symbol is needed.\n

    The fields will be displayed in table format and in the order entered on the command line. You can specify the field width, where WIDTH can be a positive or negative value. For left alignment use -, for right alignment use + or nothing.

    For -w option:

    FIELD:WIDTH, if the field is displayed it will change the width for the specified field.

    NOTE: This will not add the field as in -f option, only change the width. To get available fields you can also use -w? or -w \"?\" as in -f option.

    For -S, -E, --created, --queued, --last-updated, --history-date-range options:

    These are the date filter options. All dates are treated as UTC.

    You can use any reasonable date string that resembles a date. Ambiguous dates will be parsed with the following parsing precedence: YEAR then MONTH then DAY.

    For example, 10-11-12 or 101112 will be parsed as the following date: Oct. 11, 2012, not Nov. 12, 2010 or Nov. 10, 2012.

    Or you can specify a single date as follows:

    \"[OPER]UTC_DATE\" You can specify a date range as follows: \n\"[OPER1]UTC_DATE1...[OPER2]UTC_DATE2\" Where OPER can be one of the following operators: \"==\", \">=\", \"<=\", \">\", \"<\" or \"eq\", \"ge\", \"le\", \"gt\", \"lt\" \n

    Note: The following are the defaults for OPER, OPER1, and OPER2 for these options:

    Options          OPER  OPER1  OPER2
    ---------------  ----  -----  -----
    -E               <     >=     <
    -S               >=    >=     <
    --at             >=    >=     <
    --created        >=    >=     <
    --eligible       >=    >=     <
    --last-updated   >=    >=     <
    --queued         >=    >=     <

    You can also use the following key letters \"n\", \"t\", \"d\", \"w\", \"y\" as follows:

    KEY SYNTAX       DEFINITIONS
    ----------       -----------
    n[ow]            now, where \"now\" is current-date current-time UTC
    t[oday]          today, where \"today\" is current-date 00:00:00 UTC
    [+/-]<number>d   specified \"number\" of +/- days from \"today\" in UTC
    [+/-]<number>w   specified \"number\" of +/- weeks from \"today\" in UTC
    [+/-]<number>y   specified \"number\" of +/- years from \"today\" in UTC

    For -T option:

    Transaction type option. The following are the valid transaction types and their explanations:
    • CHARGE: filter on job charges
    • PULLBACK: filter on allocation pullbacks
    • DEPOSIT: filter on allocation deposits
    • REFUND: filter on job refunds
    • VOID: filter on void transactions

    "},{"location":"account-project-management/allocation-management/not_in_nav/sbank-manpage/#invocation","title":"INVOCATION","text":"

    sbank                      sbank                       sbank
    sbank-detail               sbank detail                sbank d
    sbank-detail-allocations   sbank detail allocations    sbank d a
    sbank-detail-jobs          sbank detail jobs           sbank d j
    sbank-detail-projects      sbank detail project        sbank d p
    sbank-detail-transactions  sbank detail transactions   sbank d t
    sbank-detail-users         sbank detail users          sbank d u
    sbank-list                 sbank list                  sbank l
    sbank-list-allocations     sbank list allocations      sbank l a
    sbank-list-jobs            sbank list jobs             sbank l j
    sbank-list-projects        sbank list projects         sbank l p
    sbank-list-transactions    sbank list transactions     sbank l t
    sbank-list-users           sbank list users            sbank l u

    "},{"location":"account-project-management/allocation-management/not_in_nav/sbank-manpage/#environment-variables","title":"ENVIRONMENT VARIABLES","text":"

    Command line default options: Define the following environment variables as you would in the command line. Once the environment variable is defined, it will be used as the default options and arguments for the specific command. Command line options will take precedence.
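
    For example, a sketch of setting such a default in a bash-style shell (the option values are only illustrative):

    > export sbank_LIST_ALLOCATIONS_ARGS=\"-r all -H\"

    With this set, sbank-list-allocations would default to reporting all resources with human-readable numbers unless overridden on the command line.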

    sbank_DETAIL_ALLOCATIONS_ARGS

    Default arguments and options for sbank-detail-allocations.

    sbank_DETAIL_CATEGORIES_ARGS

    Default arguments and options for sbank-detail-categories.

    sbank_DETAIL_NAMES_ARGS

    Default arguments and options for sbank-detail-names.

    sbank_DETAIL_MESSAGES_ARGS

    Default arguments and options for sbank-detail-messages.

    sbank_DETAIL_JOBS_ARGS

    Default arguments and options for sbank-detail-jobs.

    sbank_DETAIL_PROJECTS_ARGS

    Default arguments and options for sbank-detail-projects.

    sbank_DETAIL_TRANSACTIONS_ARGS

    Default arguments and options for sbank-detail-transactions.

    sbank_DETAIL_USERS_ARGS

    Default arguments and options for sbank-detail-users.

    sbank_LIST_ALLOCATIONS_ARGS

    Default arguments and options for sbank-list-allocations.

    sbank_LIST_JOBS_ARGS

    Default arguments and options for sbank-list-jobs.

    sbank_LIST_PROJECTS_ARGS

    Default arguments and options for sbank-list-projects.

    sbank_LIST_TRANSACTIONS_ARGS

    Default arguments and options for sbank-list-transactions.

    sbank_LIST_USERS_ARGS

    Default arguments and options for sbank-list-users.

    "},{"location":"account-project-management/allocation-management/not_in_nav/sbank-manpage/#examples","title":"EXAMPLES

    Example 1: -f, --field

    > sbank-list-transactions ... -f field1:-20 -f field2:20 -f field3 or > sbank-list-transactions ... -f \"field1:-20 field2:20 field3\" \n
    Explanation: Fields will be displayed in order of appearance, where field1:-20 means 20 characters long, left align; where field2:20 means 20 characters long, right align; where field3 uses default sizes. Number fields default to right aligned. Text fields default to left aligned.

    Example 2: -S, -E, --created, --queued, --last-updated, --history-start, --history-end

    Single date-string examples:

    • sbank-list-allocations -S \">=Oct 11, 2014\" start dates that are >= \"2014-10-11 00:00:00\"

    • sbank-list-allocations -S \"<=2014-11-10\" start dates that are <= \"2014-11-10 00:00:00\"

    • sbank-list-allocations -E \"<20141110\" end dates that are < \"2014-11-10 00:00:00\"

    • sbank-list-allocations -E \"22:30:10\" end dates that are < \" 22:30:10\"

    • sbank-list-allocations -S \">today\" start dates that are > \" 00:00:00\"

    • sbank-list-allocations -E t end dates that are < \" 00:00:00\"

    • sbank-list-allocations -S gtnow start dates that are > \" \"

    • sbank-list-allocations -E len end dates that are <= \" \"

    • sbank-list-allocations -S \"1d\" start dates that are >= \"today +1 day\"

    • sbank-list-allocations -E \"-2w\" end dates that are < \"today -2 weeks\"

    • sbank-list-allocations -S \">=1y\" start dates that are >= \"today +1 year\"

2012\"">
    • sbank-list-allocations -S \">2012\" start dates that are > \"2012-<month>-<day> 00:00:00\"

      Range date-string examples:

      • sbank-list-allocations -S \"2013-01-01...2014-01-01\" \"2013-01-01\" <= DATES < \"2014-01-01\"

      • sbank-list-allocations -S \"-1y...t\" \"today -1 year\" <= DATES < \"today\"

      • sbank-list-allocations -E \"2013...t\"\" \"2013--\" <= DATES < \"today\"

      • sbank-list-allocations -E \">2013...<=t\"\" \"2013--\" < DATES <= \"today\"

        Example 3: Command invocation examples

        • sbank-list-projects list projects full command invocation

        • sbank list projects list projects meta command invocation

        • sbank s p list projects partial meta command invocation

        • sbank p list projects where \"list\" is the default

        • sbank list allocations is the default

        • sbank a list allocations \"list\" is the default

        • sbank s a list allocations partial meta command invocation

        Example 4: -h, --help

        • sbank -h will give you help summary on all of sbank

        • sbank list --help will give you help on all the \"list\" commands

        • sbank list allocations -h will give you help on the \"list allocations\" command

        • sbank-list-allocations -h will give you help on the \"list allocations\" command

        • sbank l a --help will give you help on the \"list allocations\" command

        ","text":""},{"location":"account-project-management/project-management/project-reports/","title":"Quarterly and Year-End Reporting","text":"

        The Argonne Leadership Computing Facility (ALCF) is required to report the progress and scientific accomplishments of all peer-reviewed projects.

        PIs of INCITE, ALCC, and ADSP projects are required to complete quarterly reports and a final end-of-year (EOY) or end-of-project (EOP) report.

        "},{"location":"account-project-management/project-management/project-reports/#due-dates","title":"Due dates","text":""},{"location":"account-project-management/project-management/project-reports/#due-dates-for-the-2024-incite-quarterly-eoy-and-the-eop-reports","title":"Due dates for the 2024 INCITE quarterly, EOY, and the EOP reports:","text":"
        • April 1, 2024 (CY2024 - Q1)
        • July 1, 2024 (CY2024 - Q2)
        • October 1, 2024 (CY2024 - Q3)
        • January 1, 2025 (CY2025 - EOY) or February 15, 2025 (entire allocation period - EOP)
        "},{"location":"account-project-management/project-management/project-reports/#due-dates-for-the-2023-2024-alcc-quarterly-and-the-eop-reports","title":"Due dates for the 2023-2024 ALCC quarterly and the EOP reports:","text":"
        • October 1, 2023 (CY2023 - Q3)
        • January 1, 2024 (CY2024 - Q4)
        • April 1, 2024 (CY2024 - Q1)
        • August 15, 2024 (CY2024 - EOP)
        "},{"location":"account-project-management/project-management/project-reports/#penalties","title":"Penalties","text":"

        If a quarterly report is more than 30 days late:
        • The ability to submit jobs for the PI and users of the late project will be disabled.

        If a quarterly report is more than 90 days late:
        • The PI and users of the late project will have their accounts disabled.

        These penalties will be removed within three business days after the late quarterly or EOY report is submitted.

        "},{"location":"account-project-management/project-management/project-reports/#alcc-specific-penalties","title":"ALCC Specific Penalties:","text":"

        A similar penalty will also be applied to new ALCC projects with the same PI or co-PIs that have failed to submit the EOP report for a previous ALCC project. If the EOP report is more than 15 days late:

        • The new ALCC project will be blocked. For a currently active ALCC project, the ability to submit jobs will be disabled for the project and all sub-projects. For a project that has not been created yet, the process for new project creation will be halted.
        "},{"location":"account-project-management/project-management/project-reports/#appeals","title":"Appeals","text":"

        A PI or user may appeal a project or account suspension to the ALCF Director by sending a request to support@alcf.anl.gov.

        "},{"location":"account-project-management/project-management/project-reports/#report-templates","title":"Report Templates","text":"

        Templates for the quarterly and the EOY reports can be found at the links at the bottom of this page.

        Please modify the filename to replace PINAME with the last name of the PI of the INCITE/ALCC project, ALLOCATION with INCITE/ALCC, and YEAR with the corresponding calendar year. For quarterly reports, please replace the X in the filename with the quarter number.

        For example, for a project with PI 'Joe Smith' that is submitting the quarterly report for the first quarter in 2023-2024 cycle for ALCC, the filename will be Smith_ALCC_Q1.docx.

        For an EOY report, replace YEARS with the years associated with your allocation. For example, an ALCC 2023-2024 project with PI 'Joe Smith' would have a filename of Smith_ALCC_2023-2024_EOY.docx.

        "},{"location":"account-project-management/project-management/project-reports/#templates-for-incite-and-alcc","title":"Templates for INCITE and ALCC:","text":"
        • Quarterly Report Template
        • End of Project Report Template
        • End of Year Report Template
        "},{"location":"account-project-management/project-management/starting-alcf-award/","title":"Starting Your ALCF Award","text":"

        The following guide is for PIs and Proxies to get insight into managing projects and teams for ALCF awards. Please submit questions or trouble tickets to support@alcf.anl.gov.

        "},{"location":"account-project-management/project-management/starting-alcf-award/#get-started-with-alcfs-systems","title":"Get Started with ALCF\u2019s Systems","text":"

        To get started using our resources, please visit: Connect & Login

        We also encourage you to take full advantage of ALCF's training programs and user services. Some useful introductory materials and videos are listed below:

        • Running on Polaris
        • ThetaGPU Overview
        • Lustre File Striping Basics
        • Community Data Sharing with ACDC (using Eagle)
        "},{"location":"account-project-management/project-management/starting-alcf-award/#project-terminology","title":"Project Terminology","text":"

        Before your project begins, you will receive an email with the following project information:

        • Project Short Name: The assigned, shortened name for your project. This will be the name that you\u2019ll use to access your project on the systems.
        • Project Proxies: Project members designated by PIs that are authorized to add or renew project members on your behalf.
        • Allocation System(s) and Allocation Amount: The approved system(s) and amount of your award in node hours.
        • Approved Quota: The approved amount of disk space for your project directory.
        • File System: The file system where your project directory will reside. For information on the Grand and Eagle file systems, see Storage and Networking.
        • Assigned Catalyst: INCITE projects are assigned ALCF staff members who are available to assist the team throughout the duration of the INCITE allocation.
        • Allocation Start Date: The start date of your award.
        • Allocation End Date: The end date of your award.
        "},{"location":"account-project-management/project-management/starting-alcf-award/#account-setup","title":"Account Setup","text":"

        If you do not have an ALCF account: You will need to request one at https://accounts.alcf.anl.gov/accountRequest. When prompted for project name, please select the project short name you were given in your award email from support@alcf.anl.gov.

        If you have an active ALCF account: Submit a request to join the newly awarded project at https://accounts.alcf.anl.gov/#!/joinProject.

        "},{"location":"account-project-management/project-management/starting-alcf-award/#information-for-foreign-national-access","title":"Information for Foreign National Access","text":"

        The U.S. Department of Energy has guidelines and requirements for foreign nationals who access its facilities and sites. This guidance is issued in DOE Order 142.3, which is part of Argonne's contract; therefore, all foreign nationals (non-U.S. Citizens) must obtain authorization prior to using ALCF resources.

        If you are a foreign national and do not have current authorization credentials, you are required to submit an ANL-593 (Foreign National Access Request) form. It is critical that identity documentation requests sent by ALCF staff are completed as early as possible to facilitate timely processing for your account approval.

        "},{"location":"account-project-management/project-management/starting-alcf-award/#user-agreement-for-incite-alcc-and-adsp","title":"User Agreement for INCITE, ALCC, and ADSP","text":"

        Note: This does not apply to Director's Discretionary awards.

        "},{"location":"account-project-management/project-management/starting-alcf-award/#insitution-master-agreement-for-incite-alcc-and-adsp","title":"Insitution Master Agreement for INCITE, ALCC, and ADSP","text":"

        If you are not an employee of Argonne National Laboratory, a user agreement must be signed by your home institution to perform research at Argonne\u2019s user facilities. This policy applies to every member of the project team who will be conducting research on ALCF resources.

        A list of home institutions that have master agreements in place is located on this webpage: https://www.aps.anl.gov/Users-Information/Legal-Financial/Argonne-User-Facility-Agreements

        "},{"location":"account-project-management/project-management/starting-alcf-award/#alcf-user-agreement-for-incite-alcc-and-adsp","title":"ALCF User Agreement for INCITE, ALCC, and ADSP","text":"

        Note: This does not apply to Director's Discretionary awards.

        Every project team member who requests an ALCF account must sign and return an acknowledgment form, stating that they agree to the terms in the user agreement.

        The form is located at: https://www.alcf.anl.gov/files/Acknowledgement_Form.pdf. Please print, sign, scan and email it to accounts@alcf.anl.gov.

        "},{"location":"account-project-management/project-management/starting-alcf-award/#managing-project-team-membership","title":"Managing Project Team Membership","text":"

        As a PI, you can add members to your project. You can assign proxies who are project members authorized to add or renew project members on your behalf.

        A project PI or proxy has the authority to:

        • Approve and renew accounts
        • Add and delete users to/from the project
        • Approve Foreign Assignment/Visit Request form renewals for project members who are foreign nationals

        During your project setup, the ALCF Support Team will request the following information to establish your project members:

        • The names, email addresses, and/or ALCF usernames (if already existing) of up to two proxies and all project members.
        "},{"location":"account-project-management/project-management/starting-alcf-award/#about-project-and-unix-group-membership","title":"About Project and UNIX Group Membership","text":"

        All project members have the ability to run jobs against your allocation. There is no limit to the number of project members you may authorize. Project members are automatically added to the project UNIX group, giving them the ability to write to the project directory and to access project data. When a project member is added to or removed from a project, this is automatically reflected in the project UNIX group membership.
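        As a quick sanity check from a login node, standard UNIX commands can confirm this membership; the path below is a placeholder for your actual project directory, not a specific ALCF path:

        groups                                     # list the UNIX groups your account belongs to\nls -ld <path-to-your-project-directory>    # the group owner should match your project's UNIX group\n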

        "},{"location":"account-project-management/project-management/starting-alcf-award/#adding-project-members","title":"Adding Project Members","text":"

        The PI or a proxy must approve each team member to access ALCF resources and run jobs on their project. PI/proxies can respond to emails from ALCF for account access approval with a \"yes\" or \"no\".

        PI/proxies with active ALCF accounts can also approve new account requests, project membership requests, and account reactivation requests, and add existing active ALCF users to the project by logging into the ALCF Account and Project Management application.

        Note: If PI/proxies need to request an ALCF account, see the section below for instructions on \"how to apply\" for an account.

        "},{"location":"account-project-management/project-management/starting-alcf-award/#accounts-and-access-for-your-project-members","title":"Accounts and Access for your Project Members","text":"

        All project members will need an ALCF user account to access project data and to run jobs on ALCF systems.

        Members that do not have an ALCF account should request one at: https://accounts.alcf.anl.gov/accountRequest. When prompted for project name, they should select your project short name.

        If your project members have ALCF accounts that are no longer active, please ask them to submit a reactivation request here: https://accounts.alcf.anl.gov/accountReactivate. When prompted for project name, they should select your project short name.

        If your project members have active ALCF accounts but have not been added to your project, they should submit a request to join your project by going to this page: https://accounts.alcf.anl.gov/#!/joinProject.

        "},{"location":"account-project-management/project-management/starting-alcf-award/#moving-your-data","title":"Moving Your Data","text":"

        We encourage you to use Globus to move your project data to your ALCF project directory before your allocation begins. For details, see Using Globus on Theta.

        "},{"location":"account-project-management/project-management/starting-alcf-award/#project-status-reports-for-incite-alcc-and-adsp","title":"Project Status Reports for INCITE, ALCC, and ADSP","text":"

        Note: PIs that are awarded a Director's Discretionary allocation will not receive weekly project status reports.

        Shortly after your allocation begins, we will begin sending you a weekly project status report via support@alcf.anl.gov to keep you informed of your award progress.

        Look for an email from us with the subject line: ALCF [ALLOCATION PROGRAM] Project Status Report for [PROJECT SHORT NAME]

        "},{"location":"account-project-management/project-management/starting-alcf-award/#reporting-requirements-for-incite-alcc-and-adsp","title":"Reporting Requirements for INCITE, ALCC, and ADSP","text":"

        Note: PIs that are awarded Director's Discretionary allocations are not required to submit project reports.

        If you received an INCITE, ALCC, or ADSP allocation award, quarterly reporting is required to keep DOE informed of progress related to your allocation.

        The ALCF will send you a report template at the end of each quarter. Please complete the report promptly and submit it via email to support@alcf.anl.gov. For more information see the Quarterly Report webpage.

        "},{"location":"account-project-management/project-management/starting-alcf-award/#policies","title":"Policies","text":""},{"location":"account-project-management/project-management/starting-alcf-award/#pullback-policy","title":"Pullback Policy","text":"

        Please be aware that we will periodically monitor, and could potentially adjust, your project allocation if a large portion of it goes unused. For details, see the Pullback Policy.

        "},{"location":"account-project-management/project-management/starting-alcf-award/#allocation-overburn-policy","title":"Allocation Overburn Policy","text":"

        Please see this page for overburn/overuse eligibility for INCITE projects that have exhausted their allocation in the first 11 months of their allocation year: Allocation Overburn

        "},{"location":"account-project-management/project-management/starting-alcf-award/#acknowledgment-in-publications","title":"Acknowledgment In Publications","text":"

        Please follow the guidelines provided on the ALCF Acknowledgement Policy page to properly acknowledge the use of ALCF resources in all of your publications, both online and print.

        "},{"location":"account-project-management/project-management/starting-alcf-award/#facility-policies","title":"Facility Policies","text":"

        Facility policies have been established to provide consistent and reliable services. Please read about our [ALCF Facility Policies](../policies/facility-policies.md).

        "},{"location":"account-project-management/project-management/starting-alcf-award/#useful-allocation-and-quota-commands","title":"Useful Allocation and Quota Commands","text":"

        We have an allocation management tool called sbank, and below are a few helpful sbank commands.

        • myprojectquotas: log into Theta and type this command to view the project directory quotas for all your projects
        • myquota: log into Theta and type this command to view your home directory quota

        You can use the following command to check your project balance on Theta, where <project_name> is your project short name and <resource> is the system name: sbank-list-allocations -p <project_name> -r <resource>
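        As a hedged illustration (MyProject and theta below are placeholder values, not necessarily your project short name or resource name), a typical session on a Theta login node might look like this:

        myquota\nmyprojectquotas\nsbank-list-allocations -p MyProject -r theta\n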

        For more command examples and details, see sbank.

        "},{"location":"account-project-management/project-management/starting-alcf-award/#how-can-we-help","title":"How Can We Help?","text":"

        We can also help resolve any issues or needs that may be delaying the start of your scientific campaign:

        • Are you in need of high-throughput software?
        • Are you having difficulty compiling your application?
        • Does your code have limited restart capabilities?

        If your project allocation usage is being held back for reasons due to one of our systems, please contact us for assistance by emailing support@alcf.anl.gov.

        "},{"location":"account-project-management/project-management/team-management/","title":"Managing Your Team Members","text":"

        New project members will need a user account to access project data and to run jobs on ALCF systems.

        Please instruct any members who do not have an ALCF account to request one as soon as possible by visiting: https://accounts.alcf.anl.gov/#!/accountRequest. When prompted for project name, they should select the \"short name\" for your project.

        The PI or Proxy must approve each member of the team to gain access and to run project jobs on the ALCF's resources. If you have an active ALCF account, you can manage your project team by logging into the ALCF account and project management website and navigating to https://accounts.alcf.anl.gov/#!/manageProjects

        "},{"location":"account-project-management/project-management/team-management/#accessing-your-projects","title":"Accessing your project(s)","text":"
        1. Log in at https://accounts.alcf.anl.gov/#!/manageProjects using your credentials: ALCF username and Physical/Mobile token passcode for a password.
        2. Click on Project Management, located in the right sidebar.
        3. You will see a list of projects for which you are the Principal Investigator (PI).
        4. Click on the desired project to view information and management options for the selected project.
        "},{"location":"account-project-management/project-management/team-management/#modifying-project-information","title":"Modifying project information","text":"

        Some project information cannot be modified, but as the PI, you can modify the following: project title, institutions, and associated funding.

        Your project can be associated with multiple institutions, but you must specify a primary institution.

        "},{"location":"account-project-management/project-management/team-management/#managing-project-members-with-an-existing-alcf-account","title":"Managing project members with an Existing ALCF Account","text":"
        1. You can manage the membership for your project by clicking on the desired project from the Project Management screen.
        2. Add and/or remove proxies and team members by clicking on the red \"Remove\" button to the right of each member or clicking on \"Add new user.\"
        3. You can view account information for each user as it relates to the project:
          • Account Status
          • Project Role
          • Proxy Permissions
          • Membership Status

        4. Proxies are individuals authorized to add or renew user accounts for the project PI. You have the ability to upgrade a user from a member to a Proxy by clicking on the \"Proxy\" radio button that corresponds with the desired member.

        "},{"location":"ai-testbed/getting-started/","title":"ALCF AI Testbed","text":"

        The ALCF AI Testbed houses some of the most advanced AI accelerators for scientific research.

        The goal of the testbed is to enable explorations into next-generation machine learning applications and workloads, enabling the ALCF and its user community to help define the role of AI accelerators in scientific computing and how to best integrate such technologies with supercomputing resources.

        The AI accelerators complement the ALCF's current and next-generation supercomputers to provide a state-of-the-art computing environment that supports pioneering research at the intersection of AI, big data, and high performance computing (HPC).

        The platforms are equipped with architectural features that support AI and data-centric workloads, making them well suited for research tasks involving the growing deluge of scientific data produced by powerful tools, such as supercomputers, light sources, telescopes, particle accelerators, and sensors. In addition, the testbed will allow researchers to explore novel workflows that combine AI methods with simulation and experimental science to accelerate the pace of discovery.

        "},{"location":"ai-testbed/getting-started/#how-to-get-access","title":"How to Get Access","text":"

        Researchers interested in using the AI Testbed\u2019s Cerebras CS-2, SambaNova DataScale SN30, Graphcore Bow Pod64 and GroqRack platforms can now submit project proposals via the ALCF\u2019s Director\u2019s Discretionary program. Access to additional testbed resources, including Habana accelerators, will be announced at a later date.

        Submit your proposal requests at: Allocation Request Page

        "},{"location":"ai-testbed/getting-started/#getting-started","title":"Getting Started","text":"
        1. Request a Director's Discretionary project on SambaNova/Cerebras/Graphcore/Groq.

        2. Apply for an ALCF account after the project request is approved. Choose the SambaNova/Cerebras/Graphcore/Groq project that your PI has created at ALCF. If you have an active ALCF account, request to join the project after your project is approved.

        3. Transfer data to ALCF using Globus after your account has been created.

          a. The endpoint for your data in ALCF is alcf#ai_testbed_projects with the path to your project being /<project name>.

          b. The endpoint for your home directory on the AI Testbeds in ALCF is alcf#ai_testbed_home.

        4. Add/invite team members to your ALCF project on SambaNova/Cerebras/Graphcore/Groq.

        "},{"location":"ai-testbed/getting-started/#how-to-contribute-to-documentation","title":"How to Contribute to Documentation","text":"

        The documentation is based on MkDocs and source files are on GitHub. You can contribute to the documentation by creating a pull request.

        Learn more on how to contribute to documentation.

        "},{"location":"ai-testbed/cerebras/customizing-environment/","title":"Customizing Environments","text":""},{"location":"ai-testbed/cerebras/customizing-environment/#using-virtual-python-environments","title":"Using virtual Python environments","text":""},{"location":"ai-testbed/cerebras/customizing-environment/#to-make-a-pytorch-virtual-environment-for-cerebras","title":"To make a PyTorch virtual environment for Cerebras","text":"
        #Make your home directory navigable\nchmod a+xr ~/\nmkdir ~/R_2.0.3\nchmod a+x ~/R_2.0.3/\ncd ~/R_2.0.3\n# Note: \"deactivate\" does not actually work in scripts.\ndeactivate\nrm -r venv_cerebras_pt\n/software/cerebras/python3.8/bin/python3.8 -m venv venv_cerebras_pt\nsource venv_cerebras_pt/bin/activate\npip install --upgrade pip\npip install cerebras_pytorch==2.0.2\n
        "},{"location":"ai-testbed/cerebras/customizing-environment/#activation-and-deactivation","title":"Activation and deactivation","text":"

        To activate a virtual environments

        source ~/R_2.0.3/venv_cerebras_pt/bin/activate\n

        To deactivate a virtual environment,

        deactivate\n
        "},{"location":"ai-testbed/cerebras/example-programs/","title":"Example Programs","text":""},{"location":"ai-testbed/cerebras/example-programs/#use-a-local-copy-of-the-model-zoo","title":"Use a local copy of the model zoo","text":"

        Make a working directory and a local copy of the Cerebras modelzoo and anl_shared repository, if not previously done, as follows.

        mkdir ~/R_2.0.3\ncd ~/R_2.0.3\ngit clone https://github.com/Cerebras/modelzoo.git\ncd modelzoo\ngit tag\ngit checkout Release_2.0.3\n
        "},{"location":"ai-testbed/cerebras/example-programs/#unet","title":"UNet","text":"

        An implementation of U-Net: Convolutional Networks for Biomedical Image Segmentation, Ronneberger et al., 2015. To run UNet with the Severstal: Steel Defect Detection Kaggle dataset, using a pre-downloaded copy of the dataset: first, source a Cerebras PyTorch virtual environment and make sure that the requirements are installed.

        source ~/R_2.0.3/venv_cerebras_pt/bin/activate\npip install -r ~/R_2.0.3/modelzoo/requirements.txt\n

        Then

        cd ~/R_2.0.3/modelzoo/modelzoo/vision/pytorch/unet\ncp /software/cerebras/dataset/severstal-steel-defect-detection/params_severstal_binary_rawds.yaml configs/params_severstal_binary_rawds.yaml\nexport MODEL_DIR=model_dir_unet\nif [ -d \"$MODEL_DIR\" ]; then rm -Rf $MODEL_DIR; fi\npython run.py CSX --job_labels name=unet_pt --params configs/params_severstal_binary_rawds.yaml --model_dir $MODEL_DIR --mode train --mount_dirs /home/ /software --python_paths /home/$(whoami)/R_2.0.3/modelzoo/ --compile_dir $(whoami) |& tee mytest.log \n
        "},{"location":"ai-testbed/cerebras/example-programs/#bert-pytorch","title":"BERT - PyTorch","text":"

        The modelzoo/modelzoo/transformers/pytorch/bert directory is a PyTorch implementation of BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. This BERT-large msl128 example uses a single sample dataset for both training and evaluation. See the README.md in the source directory for details on how to build a dataset from text input. First, source a Cerebras PyTorch virtual environment and make sure that the requirements are installed:

        source ~/R_2.0.3/venv_cerebras_pt/bin/activate\npip install -r ~/R_2.0.3/modelzoo/requirements.txt\n

        Then

        cd ~/R_2.0.3/modelzoo/modelzoo/transformers/pytorch/bert\ncp /software/cerebras/dataset/bert_large/bert_large_MSL128_sampleds.yaml configs/bert_large_MSL128_sampleds.yaml\nexport MODEL_DIR=model_dir_bert_large_pytorch\nif [ -d \"$MODEL_DIR\" ]; then rm -Rf $MODEL_DIR; fi\npython run.py CSX --job_labels name=bert_pt --params configs/bert_large_MSL128_sampleds.yaml --num_workers_per_csx=1 --mode train --model_dir $MODEL_DIR --mount_dirs /home/ /software/ --python_paths /home/$(whoami)/R_2.0.3/modelzoo/ --compile_dir $(whoami) |& tee mytest.log\n

        The last parts of the output should resemble the following, with messages about cuda that should be ignored and are not shown.

        2023-11-29 20:07:49,284 INFO:   Beginning appliance run\n2023-11-29 20:08:14,365 INFO:   | Train Device=CSX, Step=100, Loss=9.50000, Rate=4088.28 samples/sec, GlobalRate=4088.26 samples/sec\n2023-11-29 20:08:39,820 INFO:   | Train Device=CSX, Step=200, Loss=8.37500, Rate=4048.91 samples/sec, GlobalRate=4055.21 samples/sec\n2023-11-29 20:09:05,356 INFO:   | Train Device=CSX, Step=300, Loss=7.96875, Rate=4025.61 samples/sec, GlobalRate=4040.05 samples/sec\n2023-11-29 20:09:30,626 INFO:   | Train Device=CSX, Step=400, Loss=7.56250, Rate=4041.61 samples/sec, GlobalRate=4043.10 samples/sec\n2023-11-29 20:09:56,022 INFO:   | Train Device=CSX, Step=500, Loss=7.50000, Rate=4035.92 samples/sec, GlobalRate=4040.90 samples/sec\n2023-11-29 20:10:21,410 INFO:   | Train Device=CSX, Step=600, Loss=7.37500, Rate=4034.41 samples/sec, GlobalRate=4039.65 samples/sec\n2023-11-29 20:10:46,690 INFO:   | Train Device=CSX, Step=700, Loss=7.37500, Rate=4044.10 samples/sec, GlobalRate=4041.20 samples/sec\n2023-11-29 20:11:12,004 INFO:   | Train Device=CSX, Step=800, Loss=7.25000, Rate=4044.75 samples/sec, GlobalRate=4041.70 samples/sec\n2023-11-29 20:11:37,196 INFO:   | Train Device=CSX, Step=900, Loss=7.21875, Rate=4056.77 samples/sec, GlobalRate=4044.25 samples/sec\n2023-11-29 20:12:02,285 INFO:   | Train Device=CSX, Step=1000, Loss=7.12500, Rate=4071.60 samples/sec, GlobalRate=4047.95 samples/sec\n2023-11-29 20:12:02,286 INFO:   Saving checkpoint at step 1000\n2023-11-29 20:12:37,079 INFO:   Saved checkpoint model_dir_bert_large_pytorch/checkpoint_1000.mdl\n2023-11-29 20:13:25,683 INFO:   Heartbeat thread stopped for wsjob-gfi2baioyfduozkmgsc6a7.\n2023-11-29 20:13:25,691 INFO:   Training completed successfully!\n2023-11-29 20:13:25,691 INFO:   Processed 1024000 sample(s) in 336.373620536 seconds.\n
        "},{"location":"ai-testbed/cerebras/example-programs/#gpt-j-pytorch","title":"GPT-J PyTorch","text":"

        GPT-J [github] is an auto-regressive language model created by EleutherAI. This PyTorch GPT-J 6B parameter pretraining sample uses 2 CS2s.

        First, source a Cerebras PyTorch virtual environment and make sure that the requirements are installed:

        source ~/R_2.0.3/venv_cerebras_pt/bin/activate\npip install -r ~/R_2.0.3/modelzoo/requirements.txt\n

        Then

        cd ~/R_2.0.3/modelzoo/modelzoo/transformers/pytorch/gptj\ncp /software/cerebras/dataset/gptj/params_gptj_6B_sampleds.yaml configs/params_gptj_6B_sampleds.yaml\nexport MODEL_DIR=model_dir_gptj\nif [ -d \"$MODEL_DIR\" ]; then rm -Rf $MODEL_DIR; fi\npython run.py CSX --job_labels name=gptj_pt --params configs/params_gptj_6B_sampleds.yaml --num_csx=2 --mode train --model_dir $MODEL_DIR --mount_dirs /home/ /software --python_paths /home/$(whoami)/R_2.0.3/modelzoo/ --compile_dir $(whoami) |& tee mytest.log\n

        The last parts of the output should resemble the following:

        2023-11-29 20:59:19,223 INFO:   Beginning appliance run\n2023-11-29 21:03:53,875 INFO:   | Train Device=CSX, Step=100, Loss=8.43750, Rate=43.70 samples/sec, GlobalRate=43.70 samples/sec\n2023-11-29 21:08:28,779 INFO:   | Train Device=CSX, Step=200, Loss=8.12500, Rate=43.67 samples/sec, GlobalRate=43.67 samples/sec\n2023-11-29 21:08:28,781 INFO:   Saving checkpoint at step 200\n2023-11-29 21:13:56,695 INFO:   Saved checkpoint model_dir_gptj/checkpoint_200.mdl\n2023-11-29 21:14:30,135 INFO:   Heartbeat thread stopped for wsjob-kd4olqkhu6ya8qqzt88utd.\n2023-11-29 21:14:30,142 INFO:   Training completed successfully!\n2023-11-29 21:14:30,142 INFO:   Processed 24000 sample(s) in 910.883781998 seconds.\n
        "},{"location":"ai-testbed/cerebras/getting-started/","title":"Getting Started","text":""},{"location":"ai-testbed/cerebras/getting-started/#getting-started","title":"Getting Started","text":""},{"location":"ai-testbed/cerebras/getting-started/#connection-to-a-cs-2-node","title":"Connection to a CS-2 node","text":"

        Connection to one of the CS-2 cluster login nodes requires an MFA passcode for authentication - either an 8-digit passcode generated by an app on your mobile device (e.g. MobilePASS+) or a CRYPTOCard-generated passcode prefixed by a 4-digit pin. This is the same passcode used to authenticate into other ALCF systems, such as Theta and Cooley. In the examples below, replace ALCFUserID with your ALCF user id. To connect to a CS-2 login:

        1. ssh to a desired login node:
          ssh ALCFUserID@cer-login-01.ai.alcf.anl.gov\n
          or
          ssh ALCFUserID@cer-login-02.ai.alcf.anl.gov\n
          or
          ssh ALCFUserID@cer-login-03.ai.alcf.anl.gov\n
        2. Alternatively, ssh randomly to one of the above three login nodes:
          ssh ALCFUserID@cerebras.ai.alcf.anl.gov\n
        "},{"location":"ai-testbed/cerebras/job-queuing-and-submission/","title":"Job Queuing and Submission","text":"

        The CS-2 cluster has its own Kubernetes-based system for job submission and queuing.

        Jobs are started automatically through the Python framework in modelzoo.common.pytorch.run_utils. Continuous job status for a job is output to stdout/stderr; redirect the output, or consider using a persistent session started with screen or tmux (or both).

        Jobs that have not yet completed can be listed as shown. Note: this command can take over a minute to complete.

        (venv_cerebras_pt) $ csctl get jobs\nNAME                          AGE  DURATION  PHASE    SYSTEMS     USER     LABELS        DASHBOARD\nwsjob-thjj8zticwsylhppkbmjqe  13s  1s        RUNNING  cer-cs2-01  username name=unet_pt  https://grafana.cerebras1.lab.alcf.anl.gov/d/WebHNShVz/wsjob-dashboard?orgId=1&var-wsjob=wsjob-thjj8zticwsylhppkbmjqe&from=1691705374000&to=now\n(venv_cerebras_pt) $\n
        To view the Grafana dashboard for a job, follow the instructions at Grafana WsJob Dashboard for Cerebras jobs.

        Jobs can be canceled as shown:

        (venv_cerebras_pt) $ csctl cancel job wsjob-eyjapwgnycahq9tus4w7id\nJob canceled successfully\n(venv_cerebras_pt) $\n

        Jobs can be labeled in the command line that launches them, if they are written with Cerebras's Python framework for running appliance jobs, by adding a command line option of this form:

         --job_labels labelname=labelvalue\n

        Jobs can also be labeled after they have been started as shown:

        (venv_cerebras_pt) $ csctl label job wsjob-ez6dyfronnsg2rz7f7fqw4 testlabel=test\njob/wsjob-ez6dyfronnsg2rz7f7fqw4 was patched\n(venv_cerebras_pt) $\n

        Jobs with a particular label/label value can be listed as shown:

        (venv_cerebras_pt) $ csctl get jobs | grep \"testlabel=test\"\nwsjob-ez6dyfronnsg2rz7f7fqw4  19m SUCCEEDED  cer-cs2-02 username testlabel=test,user=username\n(venv_cerebras_pt) $\n

        See csctl -h for more options. Add -h to a command for help for that command, e.g. csctl get -h or csctl cancel -h.

        $ csctl -h\nCerebras cluster command line tool.\n\nUsage:\n  csctl [command]\n\nAvailable Commands:\n  cancel             Cancel job\n  clear-worker-cache Clear the worker cache\n  config             View csctl config files\n  get                Get resources\n  label              Label resources\n  log-export         Gather and download logs.\n  types              Display resource types\n\nFlags:\n  -d, --debug int          higher debug values will display more fields in output objects\n  -h, --help               help for csctl\n      --namespace string   configure csctl to talk to different user namespaces\n  -v, --version            version for csctl\n\nUse \"csctl [command] --help\" for more information about a command.\n
        "},{"location":"ai-testbed/cerebras/miscellaneous/","title":"Miscellaneous","text":""},{"location":"ai-testbed/cerebras/miscellaneous/#porting-applications-to-the-cs-2","title":"Porting applications to the CS-2","text":"

        Cerebras documentation for porting code to run on a Cerebras CS-2 system: Ways to port your model

        "},{"location":"ai-testbed/cerebras/miscellaneous/#grafana-wsjob-dashboard-for-cerebras-jobs","title":"Grafana WsJob Dashboard for Cerebras jobs","text":"

        A Grafana dashboard provides support for visualizing, querying, and exploring the CS-2 system\u2019s metrics, and provides access to system logs and traces. See the Cerebras documentation for the Job Information Dashboard.

        Here is a summary (tested to work on Ubuntu and MacOS)

        On your work machine with a web browser, e.g. your laptop, edit /etc/hosts, using your editor of choice

        sudo nano /etc/hosts\n
        Add this line
        127.0.0.1   grafana.cerebras1.lab.alcf.anl.gov\n
        Save, and exit the editor

        Download the Grafana certificate present on the Cerebras node at /opt/cerebras/certs/grafana_tls.crt to your local machine. To add this certificate to your browser keychain,

        1. In Chrome, go to Settings->Privacy and security->Security->Manage device certificates
        2. Select System under \"System Keychains\" on the left hand side of your screen. Also select the \"Certificate\" tab.
        3. Drag and drop the downloaded certificate. Once it is added, it is visible as \"lab.alcf.anl.gov\"
        4. Select the certificate, and ensure that the \"Trust\" section is set to \"Always Trust\"

        On your work machine with a web browser, e.g. your laptop, tunnel the Grafana HTTPS port on the Cerebras Grafana host through to localhost:

        ssh -L 8443:grafana.cerebras1.lab.alcf.anl.gov:443 ALCFUserID@cer-login-03.ai.alcf.anl.gov\n

        Point a browser at Grafana (tested with Firefox and Chrome/Brave). Open the browser to a job's Grafana URL shown in csctl get jobs, adding :8443 to the hostname, e.g.

        https://grafana.cerebras1.lab.alcf.anl.gov:8443/d/WebHNShVz/wsjob-dashboard?orgId=1&var-wsjob=wsjob-49b7uuojdelvtrcxu3cwbw&from=1684859330000&to=now\n

        Login to the dashboard with user admin, and password prom-operator

        "},{"location":"ai-testbed/cerebras/running-a-model-or-program/","title":"Running a Model/Program","text":""},{"location":"ai-testbed/cerebras/running-a-model-or-program/#getting-started","title":"Getting Started","text":""},{"location":"ai-testbed/cerebras/running-a-model-or-program/#job-submission-and-queuing","title":"Job submission and queuing","text":"

        Cerebras jobs are initiated and tracked automatically within the Python framework in modelzoo.common.pytorch.run_utils. This framework interacts with the Cerebras cluster management node.

        "},{"location":"ai-testbed/cerebras/running-a-model-or-program/#login-nodes","title":"Login nodes","text":"

        Jobs are launched from login nodes. For long-running jobs, or if you expect to lose your internet connection for any reason, we suggest logging into a specific login node and using either screen or tmux to create persistent command line sessions. For details, use:

        man screen\n# or\nman tmux\n
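        For example, a minimal persistent-session workflow (standard screen/tmux usage, not specific to Cerebras) might look like the following:

        # start a named session, launch your job inside it, then detach (Ctrl-a d for screen, Ctrl-b d for tmux)\nscreen -S mytraining\n# later, from a new login, reattach to the same session\nscreen -r mytraining\n# tmux equivalents\ntmux new -s mytraining\ntmux attach -t mytraining\n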
        "},{"location":"ai-testbed/cerebras/running-a-model-or-program/#running-jobs-on-the-wafer","title":"Running jobs on the wafer","text":"

        Follow these instructions to compile and train the fc_mnist PyTorch sample. This model is a couple of fully connected layers plus dropout and ReLU.

        "},{"location":"ai-testbed/cerebras/running-a-model-or-program/#cerebras-virtual-environments","title":"Cerebras virtual environments","text":"

        First, make a virtual environment for Cerebras for PyTorch. See Customizing Environments for the procedures for making PyTorch virtual environments for Cerebras. If an environment is made in ~/R_2.0.3/, it would be activated as follows:

        source ~/R_2.0.3/venv_cerebras_pt/bin/activate\n

        "},{"location":"ai-testbed/cerebras/running-a-model-or-program/#clone-the-cerebras-modelzoo","title":"Clone the Cerebras modelzoo","text":"
        mkdir ~/R_2.0.3\ncd ~/R_2.0.3\ngit clone https://github.com/Cerebras/modelzoo.git\ncd modelzoo\ngit tag\ngit checkout Release_2.0.3\n
        "},{"location":"ai-testbed/cerebras/running-a-model-or-program/#running-a-pytorch-sample","title":"Running a Pytorch sample","text":""},{"location":"ai-testbed/cerebras/running-a-model-or-program/#activate-your-pytorch-virtual-environment-install-modelzoo-requirements-and-change-to-the-working-directory","title":"Activate your PyTorch virtual environment, install modelzoo requirements, and change to the working directory","text":"
        source ~/R_2.0.3/venv_cerebras_pt/bin/activate\npip install -r ~/R_2.0.3/modelzoo/requirements.txt\ncd ~/R_2.0.3/modelzoo/modelzoo/fc_mnist/pytorch\n

        Next, edit configs/params.yaml, making the following changes:

         train_input:\n-    data_dir: \"./mnist\"\n+    data_dir: \"/software/cerebras/dataset/fc_mnist/data/mnist/train\"\n

        and

         eval_input:\n-    data_dir: \"./mnist\"\n+    data_dir: \"/software/cerebras/dataset/fc_mnist/data/mnist/train\"\n

        If you want to have the sample download the dataset, you will need to specify absolute paths for the \"data_dir\"s.

        "},{"location":"ai-testbed/cerebras/running-a-model-or-program/#running-a-sample-pytorch-training-job","title":"Running a sample PyTorch training job","text":"

        To run the sample:

        export MODEL_DIR=model_dir\n# deletion of the model_dir is only needed if sample has been previously run\nif [ -d \"$MODEL_DIR\" ]; then rm -Rf $MODEL_DIR; fi\npython run.py CSX --job_labels name=pt_smoketest --params configs/params.yaml --num_csx=1 --mode train --model_dir $MODEL_DIR --mount_dirs /home/ /software --python_paths /home/$(whoami)/R_2.0.3/modelzoo --compile_dir /$(whoami) |& tee mytest.log\n

        A successful fc_mnist PyTorch training run should finish with output resembling the following:

        2023-11-29 18:13:13,048 INFO:   | Train Device=CSX, Step=1950, Loss=2.28834, Rate=397.31 samples/sec, GlobalRate=433.98 samples/sec\n2023-11-29 18:13:13,555 INFO:   | Train Device=CSX, Step=2000, Loss=2.34778, Rate=395.69 samples/sec, GlobalRate=431.83 samples/sec\n2023-11-29 18:13:13,555 INFO:   Saving checkpoint at step 2000\n2023-11-29 18:13:17,242 INFO:   Saved checkpoint model_dir/checkpoint_2000.mdl\n2023-11-29 18:13:55,517 INFO:   Heartbeat thread stopped for wsjob-fpwqt7maq8a5mxvblwwzbu.\n2023-11-29 18:13:55,523 INFO:   Training completed successfully!\n2023-11-29 18:13:55,523 INFO:   Processed 4000 sample(s) in 51.230697212 seconds.\n
        "},{"location":"ai-testbed/cerebras/system-overview/","title":"System Overview","text":"

        The Cerebras CS-2 is a wafer-scale deep learning accelerator comprising 850,000 processing cores, each providing 48KB of dedicated SRAM memory for an on-chip total of 40GB and interconnected to optimize bandwidth and latency. Its software platform integrates the popular machine learning framework PyTorch.

        The ALCF CS-2 systems are configured as a Cerebras Wafer-Scale Cluster, designed to support large-scale models (up to and well beyond 1 billion parameters) and large-scale inputs. The cluster contains two CS-2 systems and can distribute jobs across one or both CS-2 systems in a data-parallel framework. The supporting CPU cluster consists of MemoryX, SwarmX, management, and input worker nodes. The Cerebras Wafer-Scale cluster is run as an appliance: a user submits a job to the appliance, and the appliance manages preprocessing and streaming of the data, IO, and device orchestration within the appliance. It provides programming via PyTorch, with data-parallel distribution when using more than one CS-2. This installation supports both Pipelined execution for models up to 1 billion parameters and Weight Streaming execution for models up to and above 1 billion parameters.

        The public Cerebras documentation is available here.

        A typical Cerebras Wafer-Scale Cluster is shown in the figure. Users connect (ssh) to one of the three login nodes. Either ssh to cerebras.ai.alcf.anl.gov, which randomly resolves to one of cer-login-0[1-3].ai.alcf.anl.gov, or ssh to a specific node, cer-login-01.ai.alcf.anl.gov, cer-login-02.ai.alcf.anl.gov, cer-login-03.ai.alcf.anl.gov. The rest of the nodes in the cluster infrastructure are not directly accessible, except by admins. The trees /home, /projects, and /software are shared across all three login nodes, the relevant cluster infrastructure nodes, and all ALCF AI testbed platforms.

        CS-2 cluster figure

        (Figure from https://docs.cerebras.net/en/latest/wsc/cerebras-basics/how-cerebras-works.html)

        As indicated in the figure, the CS-2 nodes on the right are responsible only for running and accelerating the computations for training and predictions with the model. The other work, including compilation, is performed by input nodes, and by MemoryX nodes, which are used for weight storage and broadcast, and SwarmX nodes, which are used for gradient accumulation. Some model verification work can be done on login nodes.

        "},{"location":"ai-testbed/cerebras/tunneling-and-forwarding-ports/","title":"Tunneling and Forwarding Ports","text":"

        See ALCF's Jupyter Instructions, and Tunneling and forwarding ports. The Cerebras login nodes are direct login; tunneling and port forwarding do not involve jump hosts.

        "},{"location":"ai-testbed/data-management/data-management-overview/","title":"Data Management for the AI Testbed","text":""},{"location":"ai-testbed/data-management/data-management-overview/#home-file-system-space","title":"Home File System Space","text":"

        Users have a home file system, /home, shared across the ALCF AI testbed systems, including the login and compute nodes. The default user quota is 1 TB of storage and 1,000,000 files. This space is backed up.

        "},{"location":"ai-testbed/data-management/data-management-overview/#project-file-system-space","title":"Project File System Space","text":"

        The team project/campaign file system /projects is intended to facilitate project collaboration and is accessible to the team members of your project that have an ALCF account. Default group storage quota is 2 TB and 2,000,000 files. Please note that this space isn't backed up. Our policy is that data will be purged from disk 6 months after project completion.

        "},{"location":"ai-testbed/data-management/data-management-overview/#data-transfer","title":"Data Transfer","text":"

        Users can transfer data to and from the AI testbed using Globus or tools such as scp or rsync.
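        For example, a small transfer with scp or rsync might look like the following sketch, where ALCFUserID, YourProject, and the choice of login node are placeholders:

        # copy a local file to your AI testbed project directory\nscp ./mydata.tar ALCFUserID@cer-login-01.ai.alcf.anl.gov:/projects/YourProject/\n# or synchronize a local directory with rsync\nrsync -avz ./mydataset/ ALCFUserID@cer-login-01.ai.alcf.anl.gov:/projects/YourProject/mydataset/\n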

        "},{"location":"ai-testbed/data-management/data-management-overview/#using-globus","title":"Using Globus","text":"

        We have a dedicated Globus endpoint for moving data to and from each of the /projects and /home file systems:

        • Use alcf#ai_testbed_projects for the /projects file system
        • Use alcf#ai_testbed_home for the /home file system

        Relevant information regarding using Globus can be found here.

        "},{"location":"ai-testbed/data-management/data-management-overview/#alcf-storage-policies","title":"ALCF Storage Policies","text":"

        ALCF data policies are available here.

        Please Note: The basic level of protection provided is UNIX file level permissions; it is the user's responsibility to ensure that file permissions and umasks are set to match their needs.

        "},{"location":"ai-testbed/files/notes/","title":"Notes","text":"
        git submodule init; git submodule update\n
        "},{"location":"ai-testbed/files/todo/","title":"TODO","text":""},{"location":"ai-testbed/files/todo/#cosmictagger-v1x","title":"CosmicTagger v1.x","text":"

        Note: Conversion of CT to the various machines is meant to be a tutorial as to how to convert a model.

        "},{"location":"ai-testbed/files/todo/#cerebras-ct","title":"Cerebras CT","text":"

        Cerebras cannot support CT and UNets in general as of 4/25/23.

        "},{"location":"ai-testbed/files/todo/#graphcore-ct","title":"Graphcore CT","text":"

        Alex has been very busy with conferences, etc.

        He ran CT, but it ran on the CPU. He has stated that it may need to be completely rewritten using either Poplar or PopArt (I can't remember which). If that is necessary, Venkat should make the call.

        "},{"location":"ai-testbed/files/todo/#groq-ct","title":"Groq CT","text":""},{"location":"ai-testbed/files/todo/#habana-ct","title":"Habana CT","text":"

        Repo: https://github.com/argonne-lcf/user-guides.git Branch: feature/Habana002-DNP File: docs/ai-testbed/habana/CosmicTagger-Conversion.md

        "},{"location":"ai-testbed/files/todo/#sambanova-ct","title":"SambaNova CT","text":"

        SN has a highly-engineered version of CT.

        They are working to support CT out-of-the-box (OOB).

        "},{"location":"ai-testbed/files/todo/#cerebras","title":"Cerebras","text":"

        Repo: https://github.com/argonne-lcf/user-guides.git Branch: Talk to Bill.

        "},{"location":"ai-testbed/files/todo/#graphcore","title":"Graphcore","text":"

        Repo: https://github.com/argonne-lcf/user-guides.git

        When you change back to 3.2, use virtual-environments.md from the commit a4ce3b5598f4d6feee7ca58accde1a6a0ea84244 \"virtual-environments.md with 3.2 edits.\"

        "},{"location":"ai-testbed/files/todo/#groq","title":"Groq","text":"

        Repo: https://github.com/argonne-lcf/user-guides.git Branch: feature/Groq001-DNP

        "},{"location":"ai-testbed/files/todo/#habana","title":"Habana","text":"

        Repo: https://github.com/argonne-lcf/user-guides.git Branch: feature/Habana002-DNP

        "},{"location":"ai-testbed/files/todo/#sambanova","title":"SambaNova","text":"

        Repo: https://github.com/argonne-lcf/user-guides.git

        "},{"location":"ai-testbed/graphcore/documentation/","title":"Documentation links","text":"

        • Poplar SDK
        • PyTorch for the IPU: User Guide
        • Targeting the IPU from TensorFlow 2
        • IPU programming guide
        • Examples
        • Examples GitHub Repo
        • POD systems
        • POD64 specs

        "},{"location":"ai-testbed/graphcore/example-programs/","title":"Example Programs","text":"

        Graphcore provides examples of some well-known AI applications in their repository at https://github.com/graphcore/examples.git. Clone the examples repository to your personal directory structure:

        mkdir ~/graphcore\ncd ~/graphcore\ngit clone https://github.com/graphcore/examples.git\n

        "},{"location":"ai-testbed/graphcore/example-programs/#mnist-poptorch","title":"MNIST - PopTorch","text":""},{"location":"ai-testbed/graphcore/example-programs/#activate-poptorch-environment","title":"Activate PopTorch Environment","text":"
        source ~/venvs/graphcore/poptorch33_env/bin/activate\n
        "},{"location":"ai-testbed/graphcore/example-programs/#install-requirements","title":"Install Requirements","text":"

        Change directory:

        cd ~/graphcore/examples/tutorials/simple_applications/pytorch/mnist\n

        "},{"location":"ai-testbed/graphcore/example-programs/#run-mnist","title":"Run MNIST","text":"

        Execute the command:

        /opt/slurm/bin/srun --ipus=1 python mnist_poptorch.py\n

        "},{"location":"ai-testbed/graphcore/example-programs/#output","title":"Output","text":"

        The expected output will resemble the following:

        srun: job 10671 queued and waiting for resources\nsrun: job 10671 has been allocated resources\nTrainingModelWithLoss(\n  (model): Network(\n    (layer1): Block(\n      (conv): Conv2d(1, 32, kernel_size=(3, 3), stride=(1, 1))\n      (pool): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)\n      (relu): ReLU()\n    )\n    (layer2): Block(\n      (conv): Conv2d(32, 64, kernel_size=(3, 3), stride=(1, 1))\n      (pool): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)\n      (relu): ReLU()\n    )\n    (layer3): Linear(in_features=1600, out_features=128, bias=True)\n    (layer3_act): ReLU()\n    (layer3_dropout): Dropout(p=0.5, inplace=False)\n    (layer4): Linear(in_features=128, out_features=10, bias=True)\n    (softmax): Softmax(dim=1)\n  )\n  (loss): CrossEntropyLoss()\n)\nEpochs:   0%|          | 0/10 [00:00<?,[23:27:06.753] [poptorch:cpp] [warning] [DISPATCHER] Type coerced from Long to Int for tensor id 10\nGraph compilation: 100%|\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588| 100/100 [00:00<00:00]\nEpochs: 100%|\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588| 10/10 [01:17<00:00,  7.71s/it]\nGraph compilation: 100%|\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588| 100/100 [00:00<00:00]                          \nAccuracy on test set: 96.85%\u2588\u2588\u2588\u2588\u2588\u2588| 100/100 [00:00<00:00]\n
        "},{"location":"ai-testbed/graphcore/example-programs/#mnist-tensorflow2","title":"MNIST - Tensorflow2","text":""},{"location":"ai-testbed/graphcore/example-programs/#activate-tensorflow2-environment","title":"Activate Tensorflow2 Environment","text":"

        Create a TensorFlow2 environment as explained in the tensorflow-2-environment-setup section and activate it.

        source ~/venvs/graphcore/tensorflow2_33_env/bin/activate\n

        "},{"location":"ai-testbed/graphcore/example-programs/#install-requirements_1","title":"Install Requirements","text":"

        Change directory:

        cd ~/graphcore/examples/tutorials/simple_applications/tensorflow2/mnist/\n

        "},{"location":"ai-testbed/graphcore/example-programs/#run-mnist-tensorflow","title":"Run MNIST - TensorFlow","text":"

        Execute the command:

        /opt/slurm/bin/srun --ipus=1 python mnist.py\n
        "},{"location":"ai-testbed/graphcore/example-programs/#output_1","title":"Output","text":"

        The expected output will resemble the following:

        srun: job 10672 queued and waiting for resources\nsrun: job 10672 has been allocated resources\n2023-08-22 23:35:02.925033: I tensorflow/compiler/plugin/poplar/driver/poplar_platform.cc:43] Poplar version: 3.3.0 (de1f8de2a7) Poplar package: b67b751185\n2023-08-22 23:35:06.119772: I tensorflow/compiler/plugin/poplar/driver/poplar_executor.cc:1619] TensorFlow device /device:IPU:0 attached to 1 IPU with Poplar device ID: 0\n2023-08-22 23:35:07.087287: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:185] None of the MLIR Optimization Passes are enabled (registered 2)\n2023-08-22 23:35:07.351132: I tensorflow/compiler/mlir/tensorflow/utils/dump_mlir_util.cc:210] disabling MLIR crash reproducer, set env var `MLIR_CRASH_REPRODUCER_DIRECTORY` to enable.\n2023-08-22T23:35:09.469066Z PL:POPOPS    3545299.3545299 W: createOutputForElementWiseOp 'while/sparse_categorical_crossentropy/SparseSoftmaxCrossEntropyWithLogits/SparseSoftmaxCrossEntropyWithLogits/fusion.3/Op/Equal/Out' ({32,10}): No suitable input found, creating new variable with linear tile mapping\n2023-08-22 23:35:18.532415: I tensorflow/compiler/jit/xla_compilation_cache.cc:376] Compiled cluster using XLA!  This line is logged at most once for the lifetime of the process.\nEpoch 1/4\n2000/2000 [==============================] - 13s 6ms/step - loss: 0.6220\nEpoch 2/4\n2000/2000 [==============================] - 1s 262us/step - loss: 0.3265\nEpoch 3/4\n2000/2000 [==============================] - 1s 273us/step - loss: 0.2781\nEpoch 4/4\n2000/2000 [==============================] - 1s 289us/step - loss: 0.2482\n
        "},{"location":"ai-testbed/graphcore/example-programs/#resnet50","title":"ResNet50","text":""},{"location":"ai-testbed/graphcore/example-programs/#activate-poptorch-environment_1","title":"Activate PopTorch Environment","text":"

        Create a fresh PopTorch environment poptorch33_resnet50_env as outlined in the virtual environment section, then activate it.

        source ~/venvs/graphcore/poptorch33_resnet50_env/bin/activate\n

        "},{"location":"ai-testbed/graphcore/example-programs/#install-requirements_2","title":"Install Requirements","text":"

        Change directory

        cd ~/graphcore/examples/vision/cnns/pytorch\nmake install \nmake install-turbojpeg\n

        "},{"location":"ai-testbed/graphcore/example-programs/#update-configsyml","title":"Update configs.yml","text":"

        Change directory:

        cd ~/graphcore/examples/vision/cnns/pytorch/train\n
        Open configs.yml with your favorite editor. In the resnet50 section, find
        use_bbox_info: true\n
        and change it to:
        use_bbox_info: false\n

        "},{"location":"ai-testbed/graphcore/example-programs/#run-resnet50","title":"Run ResNet50","text":"

        The scripts to train a ResNet50 PyTorch model on Pod4 are located at https://github.com/graphcore/examples/tree/master/vision/cnns/pytorch/train

        Set the following environmental variables.

        mkdir -p ~/graphcore/tmp/pt_cache/\nexport PYTORCH_CACHE_DIR=~/graphcore/tmp/pt_cache/\n
        To run 4 replicas (a total of 4 IPUs) of the ResNet50 model, make a script with the following contents, called poprun_unet.sh. This script tells poprun to use the partition ID of the partition created for the Slurm job used to run the script.
        #!/bin/bash\npoprun -vv --vipu-partition=slurm_${SLURM_JOBID} --num-instances=1 --num-replicas=4 --executable-cache-path=$PYTORCH_CACHE_DIR python3 /home/$USER/graphcore/examples/vision/cnns/pytorch/train/train.py --config resnet50-pod4 --imagenet-data-path /mnt/localdata/datasets/imagenet-raw-dataset --epoch 2 --validation-mode none --dataloader-worker 14 --dataloader-rebatch-size 256\n
        Then
        chmod +x poprun_unet.sh\n/opt/slurm/bin/srun --ipus=4 poprun_unet.sh\n

        This model is run with the imagenet dataset.

        "},{"location":"ai-testbed/graphcore/example-programs/#output_2","title":"Output","text":"

        The expected output starts with this:

        srun: job 10675 queued and waiting for resources\nsrun: job 10675 has been allocated resources\n23:48:29.160 3555537 POPRUN [I] V-IPU server address picked up from 'vipu': 10.1.3.101:8090\n23:48:29.160 3555537 POPRUN [D] Connecting to 10.1.3.101:8090\n23:48:29.162 3555537 POPRUN [D] Status for partition slurm_10673: OK (error 0)\n23:48:29.162 3555537 POPRUN [I] Partition slurm_10673 already exists and is in state: PS_ACTIVE\n23:48:29.163 3555537 POPRUN [D] The reconfigurable partition slurm_10673 is OK\n ===========================\n|      poprun topology      |\n|===========================|\n| hosts     | gc-poplar-02  |\n|-----------|---------------|\n| ILDs      |       0       |\n|-----------|---------------|\n| instances |       0       |\n|-----------|---------------|\n| replicas  | 0 | 1 | 2 | 3 |\n ---------------------------\n23:48:29.163 3555537 POPRUN [D] Target options from environment: {}\n23:48:29.163 3555537 POPRUN [D] Target options from V-IPU partition: {\"ipuLinkDomainSize\":\"4\",\"ipuLinkConfiguration\":\"default\",\"ipuLinkTopology\":\"mesh\",\"gatewayMode\":\"true\",\"instanceSize\":\"4\"}\n23:48:29.207 3555537 POPRUN [D] Found 1 devices with 4 IPUs\n23:48:29.777 3555537 POPRUN [D] Attached to device 6\n23:48:29.777 3555537 POPRUN [I] Preparing parent device 6\n23:48:29.777 3555537 POPRUN [D] Device 6 ipuLinkDomainSize=64, ipuLinkConfiguration=Default, ipuLinkTopology=Mesh, gatewayMode=true, instanceSize=4\n23:48:33.631 3555537 POPRUN [D] Target options from Poplar device: {\"ipuLinkDomainSize\":\"64\",\"ipuLinkConfiguration\":\"default\",\"ipuLinkTopology\":\"mesh\",\"gatewayMode\":\"true\",\"instanceSize\":\"4\"}\n23:48:33.631 3555537 POPRUN [D] Using target options: {\"ipuLinkDomainSize\":\"4\",\"ipuLinkConfiguration\":\"default\",\"ipuLinkTopology\":\"mesh\",\"gatewayMode\":\"true\",\"instanceSize\":\"4\"}\n
        Expected output ends with this:
        Graph compilation: 100%|\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588| 100/100 [00:04<00:00][1,0]<stderr>:2023-08-22T23:49:40.103248Z PO:ENGINE   3556102.3556102 W: WARNING: The compile time engine option debug.branchRecordTile is set to \"5887\" when creating the Engine. (At compile time it was set to 1471)\n[1,0]<stderr>:\nLoss:6.7539 [1,0]<stdout>:[INFO] Epoch 1\u2588\u2588\u2588\u2588\u258c| 75/78 [02:42<00:06,  2.05s/it][1,0]<stderr>:\n[1,0]<stdout>:[INFO] loss: 6.7462,\n[1,0]<stdout>:[INFO] accuracy: 0.62 %\n[1,0]<stdout>:[INFO] throughput: 7599.7 samples/sec\n[1,0]<stdout>:[INFO] Epoch 2/2\nLoss:6.7462 | Accuracy:0.62%: 100%|\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588| 78/78 [02:48<00:00,  2.16s/it][1,0]<stderr>:\nLoss:6.2821 | Accuracy:2.42%:  96%|\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u258c| 75/7[1,0]<stdout>:[INFO] Epoch 2,0]<stderr>:\n[1,0]<stdout>:[INFO] loss: 6.2720,\n[1,0]<stdout>:[INFO] accuracy: 2.48 %\n[1,0]<stdout>:[INFO] throughput: 8125.8 samples/sec\n[1,0]<stdout>:[INFO] Finished training. Time: 2023-08-22 23:54:57.853508. It took: 0:05:26.090631\nLoss:6.2720 | Accuracy:2.48%: 100%|\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588| 78/78 [02:37<00:00,  2.02s/it][1,0]<stderr>:\n[1,0]<stderr>:/usr/lib/python3.8/multiprocessing/resource_tracker.py:216: UserWarning: resource_tracker: There appear to be 14 leaked semaphore objects to clean up at shutdown\n[1,0]<stderr>:  warnings.warn('resource_tracker: There appear to be %d '\n23:55:02.722 3555537 POPRUN [I] mpirun (PID 3556098) terminated with exit code 0\n

        "},{"location":"ai-testbed/graphcore/example-programs/#gpt-2-pytorch-pod16-run","title":"GPT-2 PyTorch - POD16 run","text":"

        The scripts to train a GPT-2 PyTorch model on the POD16 are located at https://github.com/graphcore/examples/tree/master/nlp/gpt2/pytorch

        In order to run the GPT-2 PyTorch model, create a new PopTorch virtual environment poptorch33_gpt2 as described in the virtual environment section and activate it.

        source ~/venvs/graphcore/poptorch33_gpt2/bin/activate\n
        "},{"location":"ai-testbed/graphcore/example-programs/#install-requirements_3","title":"Install Requirements","text":"

        Change directory:

        cd ~/graphcore/examples/nlp/gpt2/pytorch\npip3 install -r requirements.txt\n

        "},{"location":"ai-testbed/graphcore/example-programs/#run-gpt2-on-16-ipus","title":"Run GPT2 on 16 IPUs","text":"

        The command for the GPT2 model is as follows.

        /opt/slurm/bin/srun --ipus=16 python /home/$USER/graphcore/examples/nlp/gpt2/pytorch/train_gpt2.py --model gpt2 --ipus-per-replica 4 --replication-factor 4 --gradient-accumulation 2048 --device-iterations 8 --batch-size 1 --layers-per-ipu 0 4 4 4 --matmul-proportion 0.15 0.15 0.15 0.15 --max-len 1024 --optimizer AdamW --learning-rate 0.00015 --lr-schedule cosine --lr-warmup 0.01 --remap-logit True --enable-sequence-serialized True --embedding-serialization-factor 4 --recompute-checkpoint-every-layer True --enable-half-partials True --replicated-tensor-sharding True --dataset 'generated' --epochs 1\n
        It runs a GPT2 model that fits on 4 IPUs, as indicated by --ipus-per-replica. The --replication-factor indicates how many times the model is replicated in a data-parallel manner (4 in the above example). Hence, the total number of IPUs used in this example is 16.

        The effective global batch size in this example is (micro) batch-size * gradient-accumulation * replication-factor = 1 x 2048 x 4 = 8192. The device iterations value indicates the total number of samples loaded in one training step = global batch size * device iterations = 8192 x 8 = 65536. To learn more about these parameters, and about batching on IPUs in general, refer to IPU batching.
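        As a quick sanity check, this arithmetic can be reproduced in the shell; the values correspond to the --batch-size, --gradient-accumulation, --replication-factor, and --device-iterations flags in the srun command above:

        # effective global batch size = micro-batch-size * gradient-accumulation * replication-factor\necho $((1 * 2048 * 4))    # 8192\n# samples consumed per training step = global batch size * device iterations\necho $((8192 * 8))        # 65536\n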

        The above example is running with generated or synthetic data. To use the same example with a real world dataset, refer to data setup.

        "},{"location":"ai-testbed/graphcore/example-programs/#output_3","title":"Output","text":"

        Expected output starts with the following:

        srun: job 10697 queued and waiting for resources\nsrun: job 10697 has been allocated resources\nBuilding (if necessary) and loading remap_tensor_ce.\nFailed to find compiled extension; rebuilding.\nBuilding (if necessary) and loading residual_add_inplace_pattern.\nModel initializing\n-------------------- Device Allocation --------------------\nEmbedding  --> IPU 0\nLayer 0  --> IPU 1\nLayer 1  --> IPU 1\nLayer 2  --> IPU 1\nLayer 3  --> IPU 1\nLayer 4  --> IPU 2\nLayer 5  --> IPU 2\nLayer 6  --> IPU 2\nLayer 7  --> IPU 2\nLayer 8  --> IPU 3\nLayer 9  --> IPU 3\nLayer 10 --> IPU 3\nLayer 11 --> IPU 3\nLM_head --> IPU 0\n
        Expected output ends with the following:
        step 0 of epoch 0, loss: 10.913220405578613, acc: 2.0071864128112793e-05, lr: 0.00012803300858899104, throughput: 646.8439205981404 samples/sec\nstep 1 of epoch 0, loss: 10.836345672607422, acc: 1.9788742065429688e-05, lr: 7.5e-05, throughput: 1058.0979097185766 samples/sec\nstep 2 of epoch 0, loss: 10.831247329711914, acc: 2.0518898963928223e-05, lr: 2.1966991411008938e-05, throughput: 1058.7595523807183 samples/sec\nstep 3 of epoch 0, loss: 10.829034805297852, acc: 1.990795135498047e-05, lr: 0.0, throughput: 1059.6762623043378 samples/sec\n

        Note: The graph compilation for a large model like GPT-2 takes about half an hour.

        "},{"location":"ai-testbed/graphcore/getting-started/","title":"Getting Started","text":"

        Connection to a Graphcore node is a two-step process.

        The first step is to ssh from a local machine to the login node.

        The second step is to log in to a Graphcore node from the login node.

        "},{"location":"ai-testbed/graphcore/getting-started/#log-in-to-login-node","title":"Log in to Login Node","text":"

        Log in to a Graphcore login node from your local machine using the command below. Use your ALCF account ID and the password generated by MobilePASS+.

        Note: In the examples below, replace ALCFUserID with your ALCF user id.

        ssh ALCFUserID@gc-login-01.ai.alcf.anl.gov\n# or\nssh ALCFUserID@gc-login-02.ai.alcf.anl.gov\n

        Note: Use the ssh \"-v\" option in order to debug any ssh problems.

        "},{"location":"ai-testbed/graphcore/getting-started/#log-in-to-a-graphcore-node","title":"Log in to a Graphcore Node","text":"

        Once you are on the login node, ssh to one of the Graphcore nodes.

        ssh gc-poplar-02.ai.alcf.anl.gov\n# or\nssh gc-poplar-03.ai.alcf.anl.gov\n# or\nssh gc-poplar-04.ai.alcf.anl.gov\n

        Note: gc-poplar-01.ai.alcf.anl.gov is not accessible to users via ssh; however, its IPU resources are assigned to Slurm tasks.

        "},{"location":"ai-testbed/graphcore/job-queuing-and-submission/","title":"Job Queueing and Submission","text":""},{"location":"ai-testbed/graphcore/job-queuing-and-submission/#introduction","title":"Introduction","text":"

        ALCF's Graphcore POD64 system uses Slurm for job submission and queueing. Below are some of the important commands for using Slurm. For more information refer to Slurm Documentation.

        NOTE: Jobs that require IPUs will fail unless launched with srun or sbatch. NOTE: There is a single Slurm scheduler for the Graphcore POD64.

        "},{"location":"ai-testbed/graphcore/job-queuing-and-submission/#srun","title":"SRun","text":"

        The Slurm command srun can be used to run individual Python scripts (or other programs) in parallel with other scripts on a cluster managed by Slurm. An example of srun usage is shown below. Use the --ipus= option to specify the number of IPUs required for the run.

        Example:

        srun --ipus=1 python mnist_poptorch.py\n
        "},{"location":"ai-testbed/graphcore/job-queuing-and-submission/#sbatch","title":"SBatch","text":"

        Alternatively, these jobs can be submitted to the Slurm workload manager through a batch script by using the sbatch command. To do this, create a bash script (submit-mnist-poptorch-job.sh here as an example) with the commands that you want to execute.

        #!/bin/sh\n\npython mnist_poptorch.py\n

        Then pass the bash script as an input to the sbatch command as shown below, requesting the number of IPUs required:

        sbatch --ipus=1 --output=mnist-poptorch-output.log submit-mnist-poptorch-job.sh\n
        "},{"location":"ai-testbed/graphcore/job-queuing-and-submission/#squeue","title":"SQueue","text":"

        The squeue command provides information about jobs located in the Slurm scheduling queue.

        $ squeue\n             JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)\n              2572       p64 Graphcor username  R       1:12      1 gc-poplar-02\n
        "},{"location":"ai-testbed/graphcore/job-queuing-and-submission/#sinfo","title":"SInfo","text":"

        SInfo is used to view partition and node information for a system running Slurm.

        $ sinfo\nPARTITION AVAIL  TIMELIMIT  NODES  STATE NODELIST\np64*         up   infinite      3   idle gc-poplar-[02-04]\n

        For more information, see SInfo.

        "},{"location":"ai-testbed/graphcore/job-queuing-and-submission/#scancel","title":"SCancel","text":"

        SCancel is used to signal or cancel jobs, job arrays, or job steps.

        scancel job_id\n
        "},{"location":"ai-testbed/graphcore/miscellaneous/","title":"Miscellaneous","text":""},{"location":"ai-testbed/graphcore/miscellaneous/#status","title":"Status","text":""},{"location":"ai-testbed/graphcore/miscellaneous/#gc-monitor","title":"GC-Monitor","text":"

        The command gc-monitor is Graphcore's device usage monitor. Run it as follows for ordinary monitoring. See gc-monitor --help for other options.

        export IPUOF_VIPU_API_HOST=10.1.3.101\ngc-monitor --no-card-info --all-partitions\n# or watch gc-monitor --no-card-info --all-partitions\n
        The IPUOF_VIPU_API_HOST environment variable can conflict with running PopTorch programs. The Graphcore nodes have a convenience script that temporarily sets this environment variable.
        wrapped_gc_monitor.sh --no-card-info --all-partitions\n

        Note: if no partitions are active, gc-monitor will crash with a segmentation fault: Segmentation fault (core dumped)

        The output will look something like:

        +--------------------------------------------------------------+-----------------------+\n|      IPUs in slurm_2616 attached from other namespaces       |         Board         |\n+----+------------------------------+--------------+-----------+-----------+-----------+\n| ID |       Application host       |    Clock     |   Temp    |   Temp    |   Power   |\n+----+------------------------------+--------------+-----------+-----------+-----------+\n| 0  |         gc-poplar-02         |   1850MHz    |  24.2 C   |  21.1 C   |  92.3 W   |\n+----+------------------------------+--------------+-----------+-----------+-----------+\n
        "},{"location":"ai-testbed/graphcore/miscellaneous/#gc-info","title":"GC-Info","text":"

        The command gc-info is used to display device information. See gc-info --help for more options.

        To list devices,

        gc-info -l\n

        The gc-info command lists the partition and the individual IPU IDs, along with the multi-IPU configuration IDs.

        -+- Id:  [0], target: [Fabric], IPU-M host:  [10.1.5.1], IPU#: [3]\n-+- Id:  [1], target: [Fabric], IPU-M host:  [10.1.5.1], IPU#: [2]\n-+- Id:  [2], target: [Fabric], IPU-M host:  [10.1.5.1], IPU#: [1]\n

        One may also display detailed information for a specific device. The devices are numbered 0-63. For example,

        gc-info --device-id 0 --device-info\n

        See gc-info --help for more information.

        "},{"location":"ai-testbed/graphcore/miscellaneous/#how-busy-is-the-system","title":"How busy is the system?","text":"

        Use one of

        top\nhtop\n
        "},{"location":"ai-testbed/graphcore/running-a-model-or-program/","title":"Steps to Run a Model/Program","text":"

        Note: Please be mindful of how you are using the system. For example, consider running larger jobs in the evening or on weekends.

        Running any model or application involves compiling the model graph, which is then deployed on the IPUs. Below is a description of training a neural network for classification on the MNIST dataset using PopTorch (a PyTorch framework optimized for the IPU).

        "},{"location":"ai-testbed/graphcore/running-a-model-or-program/#examples-repo","title":"Examples Repo","text":"

        Graphcore provides examples of some well-known AI applications in their repository at https://github.com/graphcore/examples.git.

        Clone the examples repository to your personal directory structure, and checkout the v3.3.0 release:

        mkdir ~/graphcore\ncd ~/graphcore\ngit clone https://github.com/graphcore/examples.git\ncd examples\n
        "},{"location":"ai-testbed/graphcore/running-a-model-or-program/#mnist","title":"MNIST","text":""},{"location":"ai-testbed/graphcore/running-a-model-or-program/#activate-poptorch-environment","title":"Activate PopTorch Environment","text":"

        Follow the steps in PopTorch Environment Setup to enable the Poplar SDK.

        source ~/venvs/graphcore/poptorch33_env/bin/activate\n
        "},{"location":"ai-testbed/graphcore/running-a-model-or-program/#install-requirements","title":"Install Requirements","text":"

        Change directory and install packages specific to the MNIST model:

        cd ~/graphcore/examples/tutorials/simple_applications/pytorch/mnist\n
        "},{"location":"ai-testbed/graphcore/running-a-model-or-program/#run-mnist","title":"Run MNIST","text":"

        Execute the command:

        /opt/slurm/bin/srun --ipus=1 python mnist_poptorch.py\n

        All models are run using Slurm, with --ipus indicating how many IPUs need to be allocated for the model being run. This example uses a batch size of 8 and runs for 10 epochs. It also sets the device iterations to 50, which is the number of iterations the device runs over the data before returning control to the user. The dataset used in the example comes from TorchVision, and the PopTorch dataloader is used to load the data required for the 50 device iterations from the host to the device in a single step.

        The model used here is a simple CNN with a classifier output (softmax layer). A plain PyTorch model is translated to a PopTorch model using poptorch.Options(); poptorch.trainingModel is the wrapping function applied to the PyTorch model. The first call to trainingModel compiles the model for the IPU, and you can observe the compilation process in the output of the above command. A minimal sketch of this wrapping pattern is shown below, after the note on caching.

        Graph compilation:   3%|\u258e         | 3/100 [00:00<00:03]2023-04-26T16:53:21.225944Z PL:POPLIN    3680893.3680893 W: poplin::preplanMatMuls() is deprecated! Use poplin::preplan() instead\nGraph compilation: 100%|\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588| 100/100 [00:20<00:00]2023-04-26T16:53:38.241395Z popart:session 3680893.3680893\n

        The artifacts from graph compilation are cached in the location set by the POPTORCH_CACHE_DIR environment variable, where the .popef file corresponding to the model under consideration is stored.
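
        Below is a minimal sketch of the wrapping pattern described above, assuming a PopTorch virtual environment is active: a loss-wrapping module, poptorch.Options() with deviceIterations(50), a poptorch.DataLoader with a batch size of 8, and the poptorch.trainingModel wrapper whose first call triggers graph compilation. The network definition, dataset path, and learning rate are illustrative placeholders and differ from the actual mnist_poptorch.py script.

        import torch\nimport torchvision\nimport poptorch\n\n# Wrapper module: computing the loss inside forward() lets the loss run on the IPU.\nclass TrainingModelWithLoss(torch.nn.Module):\n    def __init__(self, net):\n        super().__init__()\n        self.net = net\n        self.loss = torch.nn.CrossEntropyLoss()\n\n    def forward(self, data, labels):\n        output = self.net(data)\n        return output, self.loss(output, labels)\n\n# Illustrative network only; the real example uses a small CNN with a softmax classifier.\nnet = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(28 * 28, 10))\n\nopts = poptorch.Options()\nopts.deviceIterations(50)  # run 50 iterations on the device per host-side step\n\ndataset = torchvision.datasets.MNIST(\"mnist-data\", train=True, download=True, transform=torchvision.transforms.ToTensor())\ntrain_loader = poptorch.DataLoader(opts, dataset, batch_size=8, shuffle=True)\n\noptimizer = torch.optim.SGD(net.parameters(), lr=0.05)\ntraining_model = poptorch.trainingModel(TrainingModelWithLoss(net), opts, optimizer=optimizer)\n\nfor epoch in range(10):\n    for data, labels in train_loader:\n        output, loss = training_model(data, labels)  # the first call compiles the graph\n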

        "},{"location":"ai-testbed/graphcore/running-a-model-or-program/#output","title":"Output","text":"

        The expected output starts with the dataset download, after which you can observe the structure of the model being used, the progress bar of the compilation process, and the training progress bar.

        srun: job 10671 queued and waiting for resources\nsrun: job 10671 has been allocated resources\nTrainingModelWithLoss(\n  (model): Network(\n    (layer1): Block(\n      (conv): Conv2d(1, 32, kernel_size=(3, 3), stride=(1, 1))\n      (pool): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)\n      (relu): ReLU()\n    )\n    (layer2): Block(\n      (conv): Conv2d(32, 64, kernel_size=(3, 3), stride=(1, 1))\n      (pool): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)\n      (relu): ReLU()\n    )\n    (layer3): Linear(in_features=1600, out_features=128, bias=True)\n    (layer3_act): ReLU()\n    (layer3_dropout): Dropout(p=0.5, inplace=False)\n    (layer4): Linear(in_features=128, out_features=10, bias=True)\n    (softmax): Softmax(dim=1)\n  )\n  (loss): CrossEntropyLoss()\n)\nEpochs:   0%|          | 0/10 [00:00<?,[23:27:06.753] [poptorch:cpp] [warning] [DISPATCHER] Type coerced from Long to Int for tensor id 10\nGraph compilation: 100%|\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588| 100/100 [00:00<00:00]\nEpochs: 100%|\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588| 10/10 [01:17<00:00,  7.71s/it]\nGraph compilation: 100%|\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588| 100/100 [00:00<00:00]                          \nAccuracy on test set: 96.85%\u2588\u2588\u2588\u2588\u2588\u2588| 100/100 [00:00<00:00]\n

        Refer to the script to learn more about this example.

        Example Programs lists the different example applications with corresponding commands for each of the above steps.

        "},{"location":"ai-testbed/graphcore/system-overview/","title":"System Overview","text":"

        The Graphcore Bow-Pod64 system is the latest-generation AI accelerator from Graphcore. It is a one-rack system consisting of 64 Bow-class Intelligence Processing Units (IPUs) with a custom interconnect. The system provides an aggregate 22 petaFLOPS of performance in half precision. It has a total of 57.6 GB of In-Processor-Memory and a total of 94,208 IPU cores. The system includes four servers for data processing.

        For more details refer to the POD64 spec

        (Figure from https://www.graphcore.ai/products/poplar)

        The Graphcore software stack includes support for TensorFlow and PyTorch through the Poplar SDK. The Poplar\u00ae SDK is the toolchain specifically designed for creating graph software for ML applications. It integrates with traditional ML frameworks such as PyTorch and TensorFlow, allowing users to port their existing code to IPU hardware-specific code. The various components of the Poplar SDK stack are shown in the figure. The stack includes the PopTorch framework, a wrapper over PyTorch optimized for the IPU hardware, and the supported PopLibs libraries, which enable constructing graphs, defining tensor data, and controlling how code and data are mapped onto the IPU for execution.

        "},{"location":"ai-testbed/graphcore/virtual-environments/","title":"Virtual Environments","text":""},{"location":"ai-testbed/graphcore/virtual-environments/#poplar-sdk-setup","title":"Poplar SDK Setup","text":"

        The Poplar SDK is installed on the Graphcore systems at the /software/graphcore/poplar_sdk/ location. The default Poplar version (3.3.0) is enabled automatically upon logging in to a Graphcore node.

        Check that Poplar is set up correctly:

        popc --version\n

        One should see:

        POPLAR version 3.3.0 (de1f8de2a7)\nclang version 16.0.0 (2fce0648f3c328b23a6cbc664fc0dd0630122212)\n

        If the Poplar SDK is not enabled, it can be enabled with

        source /software/graphcore/poplar_sdk/3.3.0/enable\n

        To disable the current Poplar SDK, e.g., if one wants to use a different Poplar SDK, follow the steps below. (Otherwise, skip to the Miscellaneous Environment Variables section.) This example assumes that the currently installed SDK is 3.1.0 and you want to move to 3.3.0.

        1. Check the current version
           $ popc --version\n POPLAR version 3.1.0 (e12d5f9f01)\n clang version 15.0.0 (bab932b4fc4cdb58bb009370384b2c41579bd9d9)\n
        2. Unset the current version
          unset POPLAR_SDK_ENABLED\n
        3. Enable poplar and popart
          source /software/graphcore/poplar_sdk/3.3.0/poplar-ubuntu_20_04-3.3.0+7857-b67b751185/enable.sh \nsource /software/graphcore/poplar_sdk/3.3.0/popart-ubuntu_20_04-3.3.0+7857-b67b751185/enable.sh \n
        4. Recheck for the new version.
          $popc --version\nPOPLAR version 3.3.0 (de1f8de2a7)\nclang version 16.0.0 (2fce0648f3c328b23a6cbc664fc0dd0630122212)\n
        5. Set SDK env variable

          POPLAR_SDK_ROOT=/software/graphcore/poplar_sdk/3.3.0/\nexport POPLAR_SDK_ROOT=$POPLAR_SDK_ROOT\n

        6. Create a new virtual environment with this SDK and install popTorch and or other frameworks as needed.

          virtualenv ~/Graphcore/workspace/poptorch33_env\nsource ~/Graphcore/workspace/poptorch33_env/bin/activate\npip install $POPLAR_SDK_ROOT/poptorch-3.3.0+113432_960e9c294b_ubuntu_20_04-cp38-cp38-linux_x86_64.whl\nexport PYTHONPATH=$POPLAR_SDK_ROOT/python:$PYTHONPATH\n

        "},{"location":"ai-testbed/graphcore/virtual-environments/#miscellaneous-environment-variables","title":"Miscellaneous Environment Variables","text":"
        mkdir ~/tmp\nexport TF_POPLAR_FLAGS=--executable_cache_path=~/tmp\nexport POPTORCH_CACHE_DIR=~/tmp\n\nexport POPART_LOG_LEVEL=WARN\nexport POPLAR_LOG_LEVEL=WARN\nexport POPLIBS_LOG_LEVEL=WARN\n\nexport PYTHONPATH=/software/graphcore/poplar_sdk/3.3.0/poplar-ubuntu_20_04-3.3.0+7857-b67b751185/python:$PYTHONPATH\n
        "},{"location":"ai-testbed/graphcore/virtual-environments/#poptorch-environment-setup","title":"PopTorch Environment Setup","text":"

        PopTorch is an extension of the Pytorch framework that is optimized for the IPU specific functionality. To activate the PopTorch environment, first create a virtual environment and activate it.

        mkdir -p ~/venvs/graphcore\nvirtualenv ~/venvs/graphcore/poptorch33_env\nsource ~/venvs/graphcore/poptorch33_env/bin/activate\n

        Use the following commands to install the PopTorch environment.

        POPLAR_SDK_ROOT=/software/graphcore/poplar_sdk/3.3.0\nexport POPLAR_SDK_ROOT=$POPLAR_SDK_ROOT\npip install $POPLAR_SDK_ROOT/poptorch-3.3.0+113432_960e9c294b_ubuntu_20_04-cp38-cp38-linux_x86_64.whl\n
        "},{"location":"ai-testbed/graphcore/virtual-environments/#tensorflow-2-environment-setup","title":"TensorFlow 2 Environment Setup","text":"

        The Poplar SDK provides TensorFlow and Keras wheels built on version 2.6 that include the IPU-specific functionality and are optimized for AMD processors. They can be installed as follows.

        Create virtual environment.

        virtualenv ~/venvs/graphcore/tensorflow2_33_env\nsource ~/venvs/graphcore/tensorflow2_33_env/bin/activate\n

        Install the TensorFlow and Keras wheels.

        POPLAR_SDK_ROOT=/software/graphcore/poplar_sdk/3.3.0\nexport POPLAR_SDK_ROOT=$POPLAR_SDK_ROOT\npip install $POPLAR_SDK_ROOT/tensorflow-2.6.3+gc3.3.0+251580+08d96978c7f+amd_znver1-cp38-cp38-linux_x86_64.whl\npip install $POPLAR_SDK_ROOT/keras-2.6.0+gc3.3.0+251582+a3785372-py2.py3-none-any.whl\n
        "},{"location":"ai-testbed/graphcore/virtual-environments/#verify-installation","title":"Verify Installation","text":"
        python -c \"from tensorflow.python import ipu\"\n

        You should see:

        2023-08-22 21:53:26.109934: I tensorflow/compiler/plugin/poplar/driver/poplar_platform.cc:43] Poplar version: 3.3.0 (de1f8de2a7) Poplar package: b67b751185\n
        "},{"location":"ai-testbed/graphcore/virtual-environments/#installing-packages","title":"Installing Packages","text":"

        Install packages in the normal manner such as:

        python3 -m pip install \"some_package\"\n

        For more details see Use pip for installing.

        To install a different version of a package that is already installed in one's environment, one can use:

        pip install --ignore-installed  ... # or -I\n

        Note: Conda is not supported on the Graphcore system.

        "},{"location":"ai-testbed/graphcore/unused/Scaling-ResNet50/","title":"Scaling ResNet50","text":"

        Follow all the instructions in Getting Started to log into a Graphcore node.

        "},{"location":"ai-testbed/graphcore/unused/Scaling-ResNet50/#examples-repo","title":"Examples Repo","text":"

        Graphcore provides examples of some well-known AI applications in their repository at https://github.com/graphcore/examples.git.

        Clone the examples repository to your personal directory structure:

        mkdir ~/graphcore\ncd ~/graphcore\ngit clone https://github.com/graphcore/examples.git\n
        "},{"location":"ai-testbed/graphcore/unused/Scaling-ResNet50/#environment-setup","title":"Environment Setup","text":"

        Establish a virtual environment.

        mkdir -p ~/venvs/graphcore\nrm -rf ~/venvs/graphcore/poptorch31_rn50_env\nvirtualenv ~/venvs/graphcore/poptorch31_rn50_env\nsource ~/venvs/graphcore/poptorch31_rn50_env/bin/activate\n
        "},{"location":"ai-testbed/graphcore/unused/Scaling-ResNet50/#install-poptorch","title":"Install PopTorch","text":"

        Install PopTorch.

        POPLAR_SDK_ROOT=/software/graphcore/poplar_sdk/3.1.0\nexport POPLAR_SDK_ROOT=$POPLAR_SDK_ROOT\npip install $POPLAR_SDK_ROOT/poptorch-3.1.0+98660_0a383de63f_ubuntu_20_04-cp38-cp38-linux_x86_64.whl\n
        "},{"location":"ai-testbed/graphcore/unused/Scaling-ResNet50/#environment-variables","title":"Environment Variables","text":"

        Establish the following environment variables.

        mkdir ${HOME}/tmp\nexport TF_POPLAR_FLAGS=--executable_cache_path=${HOME}/tmp\nexport POPTORCH_CACHE_DIR=${HOME}/tmp\nexport POPART_LOG_LEVEL=WARN\nexport POPLAR_LOG_LEVEL=WARN\nexport POPLIBS_LOG_LEVEL=WARN\nexport PYTHONPATH=/software/graphcore/poplar_sdk/3.1.0/poplar-ubuntu_20_04-3.1.0+6824-9c103dc348/python:$PYTHONPATH\n
        "},{"location":"ai-testbed/graphcore/unused/Scaling-ResNet50/#install-requirements","title":"Install Requirements","text":"
        cd ${HOME}/graphcore/examples/vision/cnns/pytorch/\nmake install\nmake install-turbojpeg\n
        "},{"location":"ai-testbed/graphcore/unused/Scaling-ResNet50/#one-time-per-user-ssh-key-set-up","title":"One-time per user ssh key set up","text":"

        Set up the ssh key on gc-poplar-01.

        "},{"location":"ai-testbed/graphcore/unused/Scaling-ResNet50/#gc-poplar-01","title":"Gc-poplar-01","text":"

        On gc-poplar-01:

        mkdir ~/.ssh\ncd ~/.ssh\nssh-keygen -t rsa -b 4096\n#Accept default filename of id_rsa\n#Enter passphrase (empty for no passphrase):\n#Enter same passphrase again:\ncat id_rsa.pub >> authorized_keys\n
        ssh-keyscan -H gc-poplar-01 >> ~/.ssh/known_hosts\n

        You should see:

        # gc-poplar-01:22 SSH-2.0-OpenSSH_8.2p1 Ubuntu-4ubuntu0.5\n# gc-poplar-01:22 SSH-2.0-OpenSSH_8.2p1 Ubuntu-4ubuntu0.5\n# gc-poplar-01:22 SSH-2.0-OpenSSH_8.2p1 Ubuntu-4ubuntu0.5\n# gc-poplar-01:22 SSH-2.0-OpenSSH_8.2p1 Ubuntu-4ubuntu0.5\n# gc-poplar-01:22 SSH-2.0-OpenSSH_8.2p1 Ubuntu-4ubuntu0.5\n
        ssh-keyscan -H gc-poplar-02 >> ~/.ssh/known_hosts\n

        You should see:

        # gc-poplar-02:22 SSH-2.0-OpenSSH_8.2p1 Ubuntu-4ubuntu0.5\n# gc-poplar-02:22 SSH-2.0-OpenSSH_8.2p1 Ubuntu-4ubuntu0.5\n# gc-poplar-02:22 SSH-2.0-OpenSSH_8.2p1 Ubuntu-4ubuntu0.5\n# gc-poplar-02:22 SSH-2.0-OpenSSH_8.2p1 Ubuntu-4ubuntu0.5\n# gc-poplar-02:22 SSH-2.0-OpenSSH_8.2p1 Ubuntu-4ubuntu0.5\n
        ssh-keyscan -H gc-poplar-03 >> ~/.ssh/known_hosts\n

        You should see:

        # gc-poplar-03:22 SSH-2.0-OpenSSH_8.2p1 Ubuntu-4ubuntu0.5\n# gc-poplar-03:22 SSH-2.0-OpenSSH_8.2p1 Ubuntu-4ubuntu0.5\n# gc-poplar-03:22 SSH-2.0-OpenSSH_8.2p1 Ubuntu-4ubuntu0.5\n# gc-poplar-03:22 SSH-2.0-OpenSSH_8.2p1 Ubuntu-4ubuntu0.5\n# gc-poplar-03:22 SSH-2.0-OpenSSH_8.2p1 Ubuntu-4ubuntu0.5\n
        ssh-keyscan -H gc-poplar-04 >> ~/.ssh/known_hosts\n

        You should see:

        # gc-poplar-04:22 SSH-2.0-OpenSSH_8.2p1 Ubuntu-4ubuntu0.5\n# gc-poplar-04:22 SSH-2.0-OpenSSH_8.2p1 Ubuntu-4ubuntu0.5\n# gc-poplar-04:22 SSH-2.0-OpenSSH_8.2p1 Ubuntu-4ubuntu0.5\n# gc-poplar-04:22 SSH-2.0-OpenSSH_8.2p1 Ubuntu-4ubuntu0.5\n# gc-poplar-04:22 SSH-2.0-OpenSSH_8.2p1 Ubuntu-4ubuntu0.5\n
        "},{"location":"ai-testbed/graphcore/unused/Scaling-ResNet50/#benchmarksyml","title":"benchmarks.yml","text":"

        Update ${HOME}/graphcore/examples/vision/cnns/pytorch/train/benchmarks.yml with your favorite editor to match benchmarks.yml.

        "},{"location":"ai-testbed/graphcore/unused/Scaling-ResNet50/#configsyml","title":"configs.yml","text":"

        Update ${HOME}/graphcore/examples/vision/cnns/pytorch/train/configs.yml with your favorite editor. At about line 30, change use_bbox_info: true to use_bbox_info: false.

        "},{"location":"ai-testbed/graphcore/unused/Scaling-ResNet50/#scale-resnet50","title":"Scale ResNet50","text":"

        Scale and benchmark ResNet50.

        Note: The number at the end of each line indicates the number of IPUs.

        Note: Use screen because every run is long.

        \"PopRun exposes this control with the --process-placement flag and provides multiple pre-defined strategies. By default (and with --process-placement spreadnuma), PopRun is designed to be NUMA-aware. On each host, all the available NUMA nodes are divided among the instances. This means that each instance is bound to execute on and allocate memory from its assigned NUMA nodes, ensuring memory access locality. This strategy maximises memory bandwidth and is likely to yield optimal performance for most of the data loading workloads in machine learning.\" [Multi-Instance Multi-Host(https://docs.graphcore.ai/projects/poprun-user-guide/en/latest/launching.html#multi-instance-multi-host)

        "},{"location":"ai-testbed/graphcore/unused/Scaling-ResNet50/#setup","title":"Setup","text":"

        Move to the correct directory and establish the datasets directory.

        cd ${HOME}/graphcore/examples/vision/cnns/pytorch/train\nexport DATASETS_DIR=/mnt/localdata/datasets/\n
        "},{"location":"ai-testbed/graphcore/unused/Scaling-ResNet50/#scaling-to-16-ipus","title":"Scaling to 16 IPUs","text":"

        One may use any of the following commands to run ResNet50 on one to sixteen IPUs.

        python3 -m examples_utils benchmark --spec benchmarks.yml --benchmark pytorch_resnet50_train_real_1\npython3 -m examples_utils benchmark --spec benchmarks.yml --benchmark pytorch_resnet50_train_real_2\npython3 -m examples_utils benchmark --spec benchmarks.yml --benchmark pytorch_resnet50_train_real_4\npython3 -m examples_utils benchmark --spec benchmarks.yml --benchmark pytorch_resnet50_train_real_8\npython3 -m examples_utils benchmark --spec benchmarks.yml --benchmark pytorch_resnet50_train_real_pod16\n
        "},{"location":"ai-testbed/graphcore/unused/Scaling-ResNet50/#scaling-to-64-ipus","title":"Scaling to 64 IPUs","text":"

        Note: One must complete the instructions on Multi-node Setup before running this example.

        "},{"location":"ai-testbed/graphcore/unused/Scaling-ResNet50/#establish-environment-variables","title":"Establish Environment Variables","text":"
        HOST1=`ifconfig eno1 | grep \"inet \" | grep -o '[0-9]\\{1,3\\}\\.[0-9]\\{1,3\\}\\.[0-9]\\{1,3\\}\\.[0-9]\\{1,3\\}' | head -1`\nOCT123=`echo \"$HOST1\" | cut -d \".\" -f 1,2,3`\nOCT4=`echo \"$HOST1\" | cut -d \".\" -f 4`\nHOST2=$OCT123.`expr $OCT4 + 1`\nHOST3=$OCT123.`expr $OCT4 + 2`\nHOST4=$OCT123.`expr $OCT4 + 3`\nexport HOSTS=$HOST1,$HOST2,$HOST3,$HOST4\nexport CLUSTER=c16\nexport IPUOF_VIPU_API_PARTITION_ID=p64\nexport TCP_IF_INCLUDE=$OCT123.0/8\nexport IPUOF_VIPU_API_HOST=$HOST1\n
        "},{"location":"ai-testbed/graphcore/unused/Scaling-ResNet50/#64-ipu-run","title":"64 IPU Run","text":"

        This runs to convergence. It uses all 64 IPUs for more than 12 hours.

        Note: This should only be used if absolutely required.

        Execute:

        python3 -m examples_utils benchmark --spec benchmarks.yml --benchmark pytorch_resnet50_train_real_pod64\npython3 -m examples_utils benchmark --spec benchmarks.yml --benchmark pytorch_resnet50_train_real_pod64_conv\n
        "},{"location":"ai-testbed/graphcore/unused/Scaling-ResNet50/#benchmark-results","title":"Benchmark Results","text":""},{"location":"ai-testbed/graphcore/unused/Scaling-ResNet50/#one-ipu","title":"One IPU","text":"
        [INFO] 2022-12-16 17:07:32: Total runtime: 3956.836479 seconds\n[INFO] 2022-12-16 17:07:32:    throughput = '7527.626315789474'\n[INFO] 2022-12-16 17:07:32:    accuracy = '57.41'\n[INFO] 2022-12-16 17:07:32:    loss = '2.8153'\n[INFO] 2022-12-16 17:07:33:    Total compile time: 429.59 seconds\n
        "},{"location":"ai-testbed/graphcore/unused/Scaling-ResNet50/#two-ipus","title":"Two IPUs","text":"
        [INFO] 2022-12-16 15:56:23: Total runtime: 5866.494071 seconds\n[INFO] 2022-12-16 15:56:23:    throughput = '4798.778947368421'\n[INFO] 2022-12-16 15:56:23:    accuracy = '68.23'\n[INFO] 2022-12-16 15:56:23:    loss = '2.3148'\n[INFO] 2022-12-16 15:56:24:    Total compile time: 418.75 seconds\n
        "},{"location":"ai-testbed/graphcore/unused/Scaling-ResNet50/#four-ipus","title":"Four IPUs","text":"
        [INFO] 2022-12-16 04:05:28: Total runtime: 3070.994553 seconds\n[INFO] 2022-12-16 04:05:28:    throughput = '9959.821052631578'\n[INFO] 2022-12-16 04:05:28:    accuracy = '67.76'\n[INFO] 2022-12-16 04:05:28:    loss = '2.338'\n[INFO] 2022-12-16 04:05:29:    Total compile time: 377.4 seconds\n
        "},{"location":"ai-testbed/graphcore/unused/Scaling-ResNet50/#eight-ipus","title":"Eight IPUs","text":"
        [INFO] 2022-12-16 02:46:45: Total runtime: 1831.437598 seconds\n[INFO] 2022-12-16 02:46:45:    throughput = '19865.263157894733'\n[INFO] 2022-12-16 02:46:45:    accuracy = '64.94'\n[INFO] 2022-12-16 02:46:45:    loss = '2.4649'\n[INFO] 2022-12-16 02:46:46:    Total compile time: 386.27 seconds\n
        "},{"location":"ai-testbed/graphcore/unused/Scaling-ResNet50/#sixteen-ipus","title":"Sixteen IPUs","text":"

        Epochs: 20

        [INFO] 2022-12-15 22:01:14: Total runtime: 1297.274336 seconds\n[INFO] 2022-12-15 22:01:14:    throughput = '39057.447368421046'\n[INFO] 2022-12-15 22:01:14:    accuracy = '57.43'\n[INFO] 2022-12-15 22:01:14:    loss = '2.8162'\n[INFO] 2022-12-15 22:01:16:    Total compile time: 397.08 seconds\n
        "},{"location":"ai-testbed/graphcore/unused/Scaling-ResNet50/#sixty-four-ipus","title":"Sixty-Four IPUs","text":"
        [1,0]<stdout>:[INFO] loss: 4.8367,\n[1,0]<stdout>:[INFO] accuracy: 18.83 %\n[1,0]<stdout>:[INFO] throughput: 51368.5 samples/sec\n
        "},{"location":"ai-testbed/graphcore/unused/cosmictagger-conversion/","title":"CosmicTagger Conversion","text":"

        The intent of this page is to show conceptually how to convert a model to run on the Graphcore system. It is not necessary to convert CosmicTagger because it has already been converted and is located at CosmicTagger on the Graphcore branch. The original is located at CosmicTagger.

        "},{"location":"ai-testbed/graphcore/unused/cosmictagger-conversion/#run-model-on-cpu","title":"Run Model on CPU","text":"

        The first step to converting a model is to verify that it runs on the CPU. This step has been verified for CosmicTagger.

        "},{"location":"ai-testbed/graphcore/unused/cosmictagger-conversion/#configpy","title":"Config.py","text":"

        CosmicTagger can run on multiple machines. As such, it is necessary to specify the architecture that one is using. For example, CPU or GPU. The architecture is stored in the ComputeMode class.

        Edit src/config/config.py. Add IPU to the ComputeMode class.

        class ComputeMode(Enum):\n    CPU   = 0\n    #...\n    IPU   = 5\n
        "},{"location":"ai-testbed/graphcore/unused/cosmictagger-conversion/#trainerpy","title":"Trainer.py","text":"

        Edit src/utils/torch/trainer.py.

        "},{"location":"ai-testbed/graphcore/unused/cosmictagger-conversion/#import-poptorch","title":"Import PopTorch","text":"

        PopTorch is Graphcore's extension of PyTorch.

        Import poptorch at the top of the file.

        import poptorch\n
        "},{"location":"ai-testbed/graphcore/unused/cosmictagger-conversion/#wrap-model","title":"Wrap Model","text":"

        Wrap the model using poptorch.trainingModel() so that it may be run on IPUs for training.

        Wrap the model using poptorch.inferenceModel() when not training.

        Find the following code around line 90 in the init_network method.

                # Foregoing any fusions as to not disturb the existing ingestion pipeline\n        if self.is_training() and self.args.mode.quantization_aware:\n            self._raw_net.qconfig = torch.quantization.get_default_qat_qconfig('fbgemm')\n            self._net = torch.quantization.prepare_qat(self._raw_net)\n        else:\n            self._net = self._raw_net\n

        After the above code, add:

                if self.args.run.compute_mode == ComputeMode.IPU:\n            if self.is_training():\n                opts = poptorch.Options()\n                self._net = poptorch.trainingModel(self._net, opts, optimizer=torch.optim.SGD(self._net.parameters(), lr=1e-3))\n            else:\n                self._net = poptorch.inferenceModel(self._net)\n

        See poptorch.trainingModel() and poptorch.inferenceModel() for more information.

        There is also a Build the Model tutorial.

        "},{"location":"ai-testbed/graphcore/unused/cosmictagger-conversion/#update-optimizer","title":"Update Optimizer","text":"

        Update init_optimizer() to use the poptorch class instead of the torch class as needed.

        Change:

                if self.args.mode.optimizer.name == OptimizerKind.rmsprop:\n            self._opt = torch.optim.RMSprop(self._net.parameters(), 1.0, eps=1e-4)\n        else:\n            self._opt = torch.optim.Adam(self._net.parameters(), 1.0)\n

        to:

                if self.args.mode.optimizer.name == OptimizerKind.rmsprop:\n            if self.args.run.compute_mode == ComputeMode.IPU:\n                self._opt = poptorch.optim.RMSprop(self._net.parameters(), 1.0, eps=1e-4)\n            else:\n                self._opt = torch.optim.RMSprop(self._net.parameters(), 1.0, eps=1e-4)\n        else:\n            if self.args.run.compute_mode == ComputeMode.IPU:\n                self._opt = poptorch.optim.Adam(self._net.parameters(), 1.0)\n            else:\n                self._opt = torch.optim.Adam(self._net.parameters(), 1.0)\n
        "},{"location":"ai-testbed/graphcore/unused/cosmictagger-conversion/#update-the-forward-pass","title":"Update the Forward Pass","text":"

        Putting the loss calculation in forward_pass() allows the loss computation to be performed on the IPUs. This will be faster because the data will not need to be transferred round-trip to the CPU.

        Change forward_pass():

        "},{"location":"ai-testbed/graphcore/unused/cosmictagger-conversion/#original","title":"Original","text":"
                    if net is None:\n                logits_image = self._net(minibatch_data['image'])\n            else:\n                logits_image = net(minibatch_data['image'])\n
        "},{"location":"ai-testbed/graphcore/unused/cosmictagger-conversion/#updated","title":"Updated","text":"

        The following code changes pass the loss function, i.e., self.loss_calculator, and the image labels, i.e., labels_image, to the model's forward method. Additionally, the calculated loss is returned from the forward_pass method.

                    if net is None:\n                if self.args.run.compute_mode == ComputeMode.IPU:\n                    logits_image, labels_image, loss = self._net(minibatch_data['image'], self.loss_calculator, labels_image)\n                    return logits_image, labels_image, loss\n                else:\n                    logits_image = self._net(minibatch_data['image'])\n            else:\n                if self.args.run.compute_mode == ComputeMode.IPU and self.args.mode.name != ModeKind.inference:\n                    logits_image, labels_image, loss = net(minibatch_data['image'], self.loss_calculator, labels_image)\n                    return logits_image, labels_image, loss\n                else:\n                    logits_image = net(minibatch_data['image'])\n
        "},{"location":"ai-testbed/graphcore/unused/cosmictagger-conversion/#update-the-training-step","title":"Update the Training Step","text":"

        Receive the extra loss variable from the forward_pass method.

        Update the train_step method.

        "},{"location":"ai-testbed/graphcore/unused/cosmictagger-conversion/#original-training-step","title":"Original Training Step","text":"
                            with self.timing_context(\"forward\"):\n                        if self.args.run.precision == Precision.mixed and self.args.run.compute_mode == ComputeMode.GPU:\n                            with torch.cuda.amp.autocast():\n                                logits_image, labels_image = self.forward_pass(minibatch_data)\n                        else:\n                            logits_image, labels_image = self.forward_pass(minibatch_data)\n\n                    verbose = False\n\n                    # Compute the loss based on the logits\n                    with self.timing_context(\"loss\"):\n                        loss = self.loss_calculator(labels_image, logits_image)\n
        "},{"location":"ai-testbed/graphcore/unused/cosmictagger-conversion/#updated-training-step","title":"Updated Training Step","text":"

        The forward_pass() method was changed in the previous section to return the extra variable loss. It is now received conditionally when running on IPUs.

        In the with self.timing_context(\"loss\"): section, the loss is only calculated when not running on IPUs.

                            with self.timing_context(\"forward\"):\n                        if self.args.run.precision == Precision.mixed and self.args.run.compute_mode == ComputeMode.GPU:\n                            with torch.cuda.amp.autocast():\n                                logits_image, labels_image = self.forward_pass(minibatch_data)\n                        else:\n                            if self.args.run.compute_mode == ComputeMode.IPU:\n                                logits_image, labels_image, loss = self.forward_pass(minibatch_data)\n                            else:\n                                logits_image, labels_image = self.forward_pass(minibatch_data)\n\n                    verbose = False\n\n\n                    # Compute the loss based on the logits\n                    with self.timing_context(\"loss\"):\n                        if self.args.run.compute_mode == ComputeMode.IPU:\n                            loss = loss\n                        else:\n                            loss = self.loss_calculator(labels_image, logits_image)\n
        "},{"location":"ai-testbed/graphcore/unused/cosmictagger-conversion/#update-validation-step","title":"Update Validation Step","text":"

        Update the val_step method.

        "},{"location":"ai-testbed/graphcore/unused/cosmictagger-conversion/#original-validation-step-code","title":"Original Validation Step Code","text":"

        Find this code.

                    if self.args.run.precision == Precision.mixed and self.args.run.compute_mode == ComputeMode.GPU:\n                with torch.cuda.amp.autocast():\n                    logits_image, labels_image = self.forward_pass(minibatch_data, net=val_net)\n            else:\n                logits_image, labels_image = self.forward_pass(minibatch_data, net=val_net)\n\n            # Compute the loss based on the logits\n            loss = self.loss_calculator(labels_image, logits_image)\n
        "},{"location":"ai-testbed/graphcore/unused/cosmictagger-conversion/#updated-validation-step-code","title":"Updated Validation Step Code","text":"

        Change the code to the following.

                    if self.args.run.precision == Precision.mixed and self.args.run.compute_mode == ComputeMode.GPU:\n                with torch.cuda.amp.autocast():\n                    logits_image, labels_image = self.forward_pass(minibatch_data, net=val_net)\n\n                    # Compute the loss based on the logits\n                    loss = self.loss_calculator(labels_image, logits_image)\n            else:\n                if self.args.run.compute_mode == ComputeMode.IPU:\n                    logits_image, labels_image, loss = self.forward_pass(minibatch_data, net=val_net)\n                else:\n                    logits_image, labels_image = self.forward_pass(minibatch_data, net=val_net)\n\n                    # Compute the loss based on the logits\n                    loss = self.loss_calculator(labels_image, logits_image)\n
        "},{"location":"ai-testbed/graphcore/unused/cosmictagger-conversion/#uresnet2d-model","title":"UResNet2D Model","text":""},{"location":"ai-testbed/graphcore/unused/cosmictagger-conversion/#update-model","title":"Update Model","text":"

        The Graphcore system is more computationally efficient if the loss function is on the IPU. This is accomplished by using the loss function within the model's forward method.

        Edit src/networks/torch/uresnet2D.py.

        "},{"location":"ai-testbed/graphcore/unused/cosmictagger-conversion/#update-the-forward-declaration","title":"Update the Forward Declaration","text":"

        Find the forward method.

        def forward(self, input_tensor):\n

        Update the argument list to include the loss function, i.e., loss_calculator and the image labels, i.e., labels_image.

        def forward(self, input_tensor, loss_calculator=None, labels_image=None):\n
        "},{"location":"ai-testbed/graphcore/unused/cosmictagger-conversion/#add-loss-calculation","title":"Add Loss Calculation","text":"

        Add the loss calculation just before the forward method returns.

                if loss_calculator is not None:\n\n            labels_image = labels_image.long()\n            labels_image = torch.chunk(labels_image, chunks=3, dim=1)\n            shape =  labels_image[0].shape\n            labels_image = [ _label.view([shape[0], shape[-2], shape[-1]]) for _label in labels_image ]\n\n            loss = loss_calculator(labels_image, x)\n            import poptorch\n            loss = poptorch.identity_loss(loss , reduction=\"mean\")\n            return x, labels_image, loss\n\n        # This return already exists.\n        return x\n

        The poptorch.identity_loss method takes a single PyTorch tensor and will backpropagate a gradient of ones through it. You may find an example here.

        "},{"location":"ai-testbed/graphcore/unused/cosmictagger-conversion/#binexecpy","title":"bin/exec.py","text":"

        The following is included for completeness. One is not likely to find this in other code.

        Open bin/exec.py in your favorite editor. Change:

        @hydra.main(version_base=None, config_path=\"../src/config\", config_name=\"config\")\n

        to

        @hydra.main(config_path=\"../src/config\", config_name=\"config\")\n
        "},{"location":"ai-testbed/graphcore/unused/cosmictagger-ddp/","title":"CosmicTagger Conversion","text":"

        The intent of this page is to show conceptually how to convert a Graphcore model to run on Distributed Data Parallel using PopDist. It is not necessary to convert CosmicTagger because it has already been converted and is located at CosmicTagger on the GraphcoreDDP branch. The original is located at CosmicTagger.

        "},{"location":"ai-testbed/graphcore/unused/cosmictagger-ddp/#run-model-on-cpu","title":"Run Model on CPU","text":"

        The first step to converting a model is to verify that it runs on the CPU. This step has been verified for CosmicTagger.

        "},{"location":"ai-testbed/graphcore/unused/cosmictagger-ddp/#starter-code","title":"Starter Code","text":"

        You may use the code at CosmicTagger on the Graphcore branch.

        "},{"location":"ai-testbed/graphcore/unused/cosmictagger-ddp/#trainerpy","title":"Trainer.py","text":"

        Edit src/utils/torch/trainer.py.

        "},{"location":"ai-testbed/graphcore/unused/cosmictagger-ddp/#import-poplar-packages","title":"Import Poplar Packages","text":"

        PopTorch is Graphcore's extension of PyTorch.

        PopDist is Graphcore's distributed processing package.

        Import poptorch and popdist at the top of the file.

        try:\n    import poptorch\n    import popdist\n    import popdist.poptorch\nexcept:\n    pass\n
        "},{"location":"ai-testbed/graphcore/unused/cosmictagger-ddp/#initialization","title":"Initialization","text":"

        Initialize popdist for distributed computing.

        Establish a class variable named _instance. This is used to differentiate between the different model instances that will be saved.

        Add the following lines at the bottom of init().

                if self.args.run.compute_mode == ComputeMode.IPU and popdist.isPopdistEnvSet():\n            popdist.init()\n            self._instance = popdist.getInstanceIndex()\n        else:\n            self._instance = 0\n
        "},{"location":"ai-testbed/graphcore/unused/cosmictagger-ddp/#use-instance-variable","title":"Use Instance Variable","text":"

        Use the instance variable for the model file name.

        Find def get_model_filepath.

        Change:

                name = file_path + 'model-{}.ckpt'.format(self._global_step)\n

        To:

                name = file_path + f'model-{self._global_step}-{self._instance}.ckpt'\n
        "},{"location":"ai-testbed/graphcore/unused/cosmictagger-ddp/#establish-logging-method","title":"Establish Logging Method","text":"

        Add a helper function to log data at the bottom of the file.

            def log_in_single_instance(self, string):\n        if self.args.run.compute_mode == ComputeMode.IPU:\n            if not popdist.isPopdistEnvSet() or popdist.getInstanceIndex() == 0:\n                logging.info(string)\n        else:\n            logging.info(string)\n
        "},{"location":"ai-testbed/graphcore/unused/cosmictagger-ddp/#update-init_network","title":"Update Init_network()","text":"

        PopTorch has an Options() method, which returns an options object that gets passed to poptorch.trainingModel. The returned object is stored in opts in this example.

        Find:

                if self.args.run.compute_mode == ComputeMode.IPU:\n            if self.is_training():\n                opts = poptorch.Options()\n                self._net = poptorch.trainingModel(self._net, opts, optimizer=torch.optim.SGD(self._net.parameters(), lr=1e-3))\n            else:\n                self._net = poptorch.inferenceModel(self._net)\n

        Replace it with:

                if self.args.run.compute_mode == ComputeMode.IPU:\n            if popdist.isPopdistEnvSet():\n                opts = popdist.poptorch.Options()\n                # When using the dataloader with 'auto_distributed_partitioning=True'\n                # and 'shuffle=True' we must set the random seed to ensure that tensors\n                # are in the same order in all processes.\n                opts.randomSeed(42)\n                # Replication factor is already set via PopRun so\n                # we ignore 'args.num_replicas'.\n                logging.info(f\"Num of local replicas: {popdist.getNumLocalReplicas()}\")\n            else:\n                opts = poptorch.Options()\n                opts.replicationFactor(self.args.num_replicas)\n\n            if self.is_training():\n                self._net = poptorch.trainingModel(self._net, opts, optimizer=torch.optim.SGD(self._net.parameters(), lr=1e-3))\n            else:\n                self._net = poptorch.inferenceModel(self._net)\n
        "},{"location":"ai-testbed/graphcore/unused/cosmictagger-ddp/#run-the-code","title":"Run The Code","text":"

        See instructions in README_GRAPHCORE.md.

        "},{"location":"ai-testbed/graphcore/unused/multi-node-setup/","title":"Multi-node Setup","text":"

        These steps only need to be executed once per user.

        Running on multiple nodes is a three step process.

        1. Create a Key

          cd ~/.ssh\nssh-keygen -t rsa -b 4096\n
        2. Put Key into Authorized_keys File

          cat id_rsa.pub >> authorized_keys\n
        3. Add Node IP Addresses to Known_hosts File

          ssh-keyscan -H 10.1.3.101 >> ~/.ssh/known_hosts\nssh-keyscan -H 10.1.3.102 >> ~/.ssh/known_hosts\nssh-keyscan -H 10.1.3.103 >> ~/.ssh/known_hosts\nssh-keyscan -H 10.1.3.104 >> ~/.ssh/known_hosts\n
        "},{"location":"ai-testbed/graphcore/unused/profiling-mnist/","title":"Profiling MNIST","text":"

        Follow all the instructions in Getting Started to log into a Graphcore node.

        Follow the instructions in Virtual Environments up to and including PopART Environment Setup.

        Follow the instructions in Example Programs up to and including MNIST, Install Requirements.

        "},{"location":"ai-testbed/graphcore/unused/profiling-mnist/#change-directory","title":"Change Directory","text":"
        cd ~/graphcore/tutorials/simple_applications/pytorch/mnist\n
        "},{"location":"ai-testbed/graphcore/unused/profiling-mnist/#set-poplar-options","title":"Set Poplar Options","text":"

        Set the option to generate all reports, i.e., \"autoReport.all\":\"true\".

        Set the reports directory, i.e., \"autoReport.directory\":\"./reports\".

        Do so by running the following commands:

        export POPLAR_ENGINE_OPTIONS='{\"autoReport.all\":\"true\", \"autoReport.directory\":\"./reports\"}'\n
        "},{"location":"ai-testbed/graphcore/unused/profiling-mnist/#run-mnist","title":"Run MNIST","text":"

        Do so by running the following command:

        python mnist_poptorch.py\n

        When MNIST has finished running, see Profiling to use Graph Analyser.

        "},{"location":"ai-testbed/graphcore/unused/profiling-resnet50/","title":"Profiling ResNet50","text":"

        Follow all the instructions in Getting Started to log into a Graphcore node.

        Follow the instructions in Virtual Environments up to and including PopART Environment Setup.

        "},{"location":"ai-testbed/graphcore/unused/profiling-resnet50/#examples-repo","title":"Examples Repo","text":"

        Graphcore provides examples of some well-known AI applications in their repository at https://github.com/graphcore/examples.git.

        Clone the examples repository to your personal directory structure:

        mkdir ~/graphcore\ncd ~/graphcore\ngit clone https://github.com/graphcore/examples.git\n
        "},{"location":"ai-testbed/graphcore/unused/profiling-resnet50/#install-requirements","title":"Install Requirements","text":"

        Change directory

        cd ~/graphcore/examples/vision/cnns/pytorch\npython -m pip install -r requirements.txt\n
        "},{"location":"ai-testbed/graphcore/unused/profiling-resnet50/#export-variables","title":"Export Variables","text":"

        Export the Poplar engine options, the datasets directory, and the cluster-related environment variables.

        export POPLAR_ENGINE_OPTIONS='{\"autoReport.all\":\"true\", \"autoReport.directory\":\"./reports\"}'\nexport DATASETS_DIR=/software/datasets\nHOST1=`ifconfig eno1 | grep \"inet \" | grep -o '[0-9]\\{1,3\\}\\.[0-9]\\{1,3\\}\\.[0-9]\\{1,3\\}\\.[0-9]\\{1,3\\}' | head -1`\nOCT123=`echo \"$HOST1\" | cut -d \".\" -f 1,2,3`\nOCT4=`echo \"$HOST1\" | cut -d \".\" -f 4`\nHOST2=$OCT123.`expr $OCT4 + 1`\nHOST3=$OCT123.`expr $OCT4 + 2`\nHOST4=$OCT123.`expr $OCT4 + 3`\nexport HOSTS=$HOST1,$HOST2,$HOST3,$HOST4\nexport CLUSTER=c16\nVIPU_SERVER=${VIPU_SERVER:=$HOST1}\nFIRST_PARTITION=`vipu-admin list partitions --api-host $VIPU_SERVER| grep ACTIVE | cut -d '|' -f 3 | cut -d ' ' -f 2 | head -1`\nPARTITON=${PARTITION:=$FIRST_PARTITION}\n
        "},{"location":"ai-testbed/graphcore/unused/profiling-resnet50/#profile-resnet50","title":"Profile ResNet50","text":"

        Profile ResNet50.

        Note: Use screen because every run is long.

        cd train\npython3 -m examples_utils benchmark --spec benchmarks.yml --benchmark pytorch_resnet50_train_real_pod16\n
        "},{"location":"ai-testbed/graphcore/unused/profiling-resnet50/#profile-results","title":"Profile Results","text":"

        When ResNet50 has finished running, see Profiling to use Graph Analyser.

        "},{"location":"ai-testbed/graphcore/unused/profiling/","title":"Profiling","text":"

        This is an adaptation of Capturing IPU Reports.

        "},{"location":"ai-testbed/graphcore/unused/profiling/#reports","title":"Reports","text":""},{"location":"ai-testbed/graphcore/unused/profiling/#capturing-ipu-reports","title":"Capturing IPU Reports","text":"

        See Capturing IPU Reports for more information.

        This section describes how to generate the files that the Graph Analyser can analyze. The Graph Analyser uses report files generated during compilation and execution by the Poplar SDK.

        "},{"location":"ai-testbed/graphcore/unused/profiling/#ipu-memory-overhead","title":"IPU Memory Overhead","text":"

        Because profiling adds extra memory requirements, a model with high memory consumption may run out of memory when profiling is enabled. Depending on the model, you can adjust its parameters to leave space for the instrumentation; for example, you can try decreasing the batch size. In TensorFlow BERT you can adjust the micro batch size.

        "},{"location":"ai-testbed/graphcore/unused/profiling/#host-computing-overhead","title":"Host Computing Overhead","text":"

        It is essential that you also try to reduce the iterations on each run. For instance, by reducing the number of steps or the number of batches per step you can get a lighter execution profile. This will not only reduce the host computation overhead but will also speed up visualization in the Graph Analyser.

        "},{"location":"ai-testbed/graphcore/unused/profiling/#download-popvision","title":"Download PopVision","text":"
        1. Download PopVision Tools.

        2. Click Download Now button.

        3. In the Graph Analyser section, select your operating system.

        4. Install it for your selected operating system.

        "},{"location":"ai-testbed/graphcore/unused/profiling/#create-ssh-session","title":"Create SSH Session","text":"

        Use ssh from your development system.

        The ssh command will use a jumphost and port forwarding. The format is as follows:

        ssh -J ALCFUserID@gc-login-dd.ai.alcf.anl.gov ALCFUserID@gc-poplar-DD -L 8090:127.0.0.1:22\nssh -J wilsonb@gc-login-01.ai.alcf.anl.gov wilsonb@gc-poplar-02.ai.alcf.anl.gov -L 8090:127.0.0.1:22\n

        Where:

        Argument | Help
        -------- | ----
        ALCFUserID | Your ALCF user identification.
        dd | The Graphcore login node to use, i.e., 01 or 02.
        DD | The Graphcore node to use, i.e., 01, 02, 03, or 04.
        8090 | The port on your local machine.
        127.0.0.1:22 | The local IP address and port on the remote machine.

        You will receive a prompt.

        "},{"location":"ai-testbed/graphcore/unused/profiling/#launch-graph-analyser","title":"Launch Graph Analyser","text":"

        Continue on your development machine.

        "},{"location":"ai-testbed/graphcore/unused/profiling/#operating-system","title":"Operating System","text":""},{"location":"ai-testbed/graphcore/unused/profiling/#ubuntu","title":"Ubuntu","text":"
        cd /path/to/graph/analyser/directory\n./popvision-graph-analyser-3.11.6.AppImage\n
        "},{"location":"ai-testbed/graphcore/unused/profiling/#user-interface","title":"User Interface","text":"
        1. Click Open a report...;
        2. Click the remote tab;
        3. Enter your ALCFUserID for the remote machine;
        4. Enter the Hostname of your local machine, i.e., 127.0.0.1;
        5. Enter your Port address used in the ssh command, e.g., 8090;
        6. Click Connect;
        7. Navigate to your reports directory;
        8. Select the training directory;
        9. Select the archive.a file; and
        10. Click Open button.

        The Summary Report will be displayed.

        "},{"location":"ai-testbed/groq/getting-started/","title":"Getting Started","text":""},{"location":"ai-testbed/groq/getting-started/#allocations","title":"Allocations","text":"

        If you do not already have an allocation, you will need to request one here: Discretionary Allocation Request (New & Renewal)

        "},{"location":"ai-testbed/groq/getting-started/#accounts","title":"Accounts","text":"

        If you do not have an ALCF account (but have an allocation), request one here: ALCF Account and Project Management

        "},{"location":"ai-testbed/groq/getting-started/#setup","title":"Setup","text":"

        Connection to a GroqRack node is a two-step process.

        The first step is to ssh from a local machine to a login node. The second, optional step is to ssh from a login node to a GroqRack node. Jobs may also be started and tracked from login nodes.

        "},{"location":"ai-testbed/groq/getting-started/#log-in-to-a-login-node","title":"Log in to a login node","text":"

        Connect to a Groq login node, editing this command line to use your ALCF user ID. You will be prompted for a password; use the 8-digit code provided by MobilePASS+.

        ssh ALCFUserID@groq.ai.alcf.anl.gov\n
        This randomly selects one of the login nodes, namely groq-login-01.ai.alcf.anl.gov or groq-login-02.ai.alcf.anl.gov. You can alternatively ssh to the specific login nodes directly.

        "},{"location":"ai-testbed/groq/getting-started/#log-in-to-a-groqrack-node","title":"Log in to a GroqRack node","text":"

        Once you are on a login node, optionally ssh to one of the GroqRack nodes, which are numbered 1-9.

        ssh groq-r01-gn-01.ai.alcf.anl.gov\n# or\nssh groq-r01-gn-09.ai.alcf.anl.gov\n# or any node with hostname of form groq-r01-gn-0[1-9].ai.alcf.anl.gov\n
        "},{"location":"ai-testbed/groq/job-queuing-and-submission/","title":"Job Queueing and Submission","text":"

        Groq jobs on the AI Testbed's GroqRack are managed by the PBS job scheduler. For an overview, see PBS; for additional information, see https://docs.alcf.anl.gov/running-jobs/job-and-queue-scheduling/. Man pages are also available. These are the key commands:

        # qsub - to submit a batch job using a script\nman qsub\n# qstat - to display queue information\nman qstat\n# qdel - to delete (cancel) a job:\nman qdel\n# qhold - to hold a job\nman qhold\n

        "},{"location":"ai-testbed/groq/running-a-model-or-program/","title":"Running a Model/Program","text":"

        Jobs are launched from any GroqRack node or from the login nodes. For long-running jobs, or if you expect to lose your internet connection for any reason, we suggest logging into a specific node and using either screen or tmux to create a persistent command-line session. For details, use:

        man screen\n# or\nman tmux\n
        or online man pages: screen, tmux
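        For example, a typical tmux workflow for a long-running job looks like the following (a sketch; the session name is arbitrary):

        ssh groq-r01-gn-01.ai.alcf.anl.gov\ntmux new -s myjob\n# ... start your long-running job inside the tmux session ...\n# detach with Ctrl-b d, log out, and later reattach with:\ntmux attach -t myjob\n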

        "},{"location":"ai-testbed/groq/running-a-model-or-program/#running-jobs-on-groq-nodes","title":"Running jobs on Groq nodes","text":""},{"location":"ai-testbed/groq/running-a-model-or-program/#groqflow","title":"GroqFlow","text":"

        GroqFlow is the simplest way to port inference applications to Groq. The GroqFlow GitHub repo includes many sample applications. See GroqFlow.

        "},{"location":"ai-testbed/groq/running-a-model-or-program/#clone-the-groqflow-github-repo","title":"Clone the GroqFlow github repo","text":"

        Clone the GroqFlow GitHub repo and change your current directory to the clone:

        cd ~/\ngit clone https://github.com/groq/groqflow.git\ncd groqflow\n

        "},{"location":"ai-testbed/groq/running-a-model-or-program/#groqflow-conda-environments","title":"GroqFlow conda environments","text":"

        Create a groqflow conda environment and activate it, following the instructions in the Virtual Environments section. Note: Similar install instructions are in ~/groqflow/docs/install.md and in the GroqFlow™ Installation Guide. The conda environment should be reinstalled whenever new groqflow code is pulled from the groqflow GitHub repo; with the groqflow conda environment activated, redo just the pip install steps, as shown below.
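        For example, after pulling new GroqFlow code, the refresh amounts to redoing the pip installs inside the activated environment (a sketch based on the install steps in the Virtual Environments section):

        conda activate groqflow\ncd ~/groqflow\ngit pull\npip install -e .\ncd demo_helpers\npip install -e .\n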

        "},{"location":"ai-testbed/groq/running-a-model-or-program/#running-a-groqflow-sample","title":"Running a groqflow sample","text":"

        Each groqflow sample directory in the ~/groqflow/proof_points tree has a README.md describing the sample and how to run it.

        "},{"location":"ai-testbed/groq/running-a-model-or-program/#optionally-activate-your-groqflow-conda-environment","title":"Optionally activate your GroqFlow conda environment","text":"
        conda activate groqflow\n
        "},{"location":"ai-testbed/groq/running-a-model-or-program/#run-a-sample-using-pbs-in-batch-mode","title":"Run a sample using PBS in batch mode","text":"

        See Job Queueing and Submission for more information about the PBS job scheduler.

        Create a script run_minilmv2.sh with the following contents. It assumes that conda was installed in the default location. The conda initialize section can also be copied from your .bashrc if the conda installer was allowed to add it.

        #!/bin/bash\n# >>> conda initialize >>>\n# !! Contents within this block are managed by 'conda init' !!\n__conda_setup=\"$(${HOME}'/miniconda3/bin/conda' 'shell.bash' 'hook' 2> /dev/null)\"\nif [ $? -eq 0 ]; then\n    eval \"$__conda_setup\"\nelse\n    if [ -f \"${HOME}/miniconda3/etc/profile.d/conda.sh\" ]; then\n        . \"${HOME}/miniconda3/etc/profile.d/conda.sh\"\n    else\n        export PATH=\"${HOME}/miniconda3/bin:$PATH\"\n    fi\nfi\nunset __conda_setup\n# <<< conda initialize <<<\nconda activate groqflow\ncd ~/groqflow/proof_points/natural_language_processing/minilm\npip install -r requirements.txt\npython minilmv2.py\n

        Then run the script as a batch job with PBS:

        qsub -l groq_accelerator=1 run_minilmv2.sh\n

        Note: the number of chips used by a model can be found in the model's compile cache directory after it is compiled, e.g.:

        $ grep num_chips_used ~/.cache/groqflow/minilmv2/minilmv2_state.yaml\nnum_chips_used: 1\n
        The groqflow proof_points models use 1, 2, or 4 chips.
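        If a sample compiles to more than one chip, request a matching number of accelerators when submitting it (a sketch: run_sample.sh stands in for your own batch script, and 4 for the num_chips_used value reported above):

        qsub -l groq_accelerator=4 run_sample.sh\n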

        If your ~/.bashrc initializes conda, an alternative to copying the conda initialization script into your execution scripts is to comment out this section in your \"~/.bashrc\":

        # If not running interactively, don't do anything\ncase $- in\n    *i*) ;;\n      *) return;;\nesac\n
        to
        ## If not running interactively, don't do anything\n#case $- in\n#    *i*) ;;\n#      *) return;;\n#esac\n
        Then the execution script becomes:
        #!/bin/bash\nconda activate groqflow\ncd ~/groqflow/proof_points/natural_language_processing/minilm\npip install -r requirements.txt\npython minilmv2.py\n
        Job status can be tracked with qstat:
        $ qstat\nJob id            Name             User              Time Use S Queue\n----------------  ---------------- ----------------  -------- - -----\n3084.groq-r01-co* run_minilmv2     user              0 R workq           \n$ \n

        By default, output will go to two files whose names have the job ID as a suffix: one containing the job's standard output and the other the job's standard error.

        $ ls -la run_minilmv2.sh.*\n-rw------- 1 user users   448 Oct 16 18:40 run_minilmv2.sh.e3082\n-rw------- 1 user users 50473 Oct 16 18:42 run_minilmv2.sh.o3082\n

        "},{"location":"ai-testbed/groq/running-a-model-or-program/#run-a-sample-using-pbs-in-interactive-mode","title":"Run a sample using PBS in interactive mode","text":"

        An alternative is to use an interactive PBS job. This may be useful when debugging new or changed code. Here is an example that starts a 24-hour interactive job.

        qsub -IV -l walltime=24:00:00 -l groq_accelerator=2\n
        Then activate your groqflow environment, and run python scripts with
        conda activate groqflow\npython scriptname.py\n

        "},{"location":"ai-testbed/groq/system-overview/","title":"System Overview","text":"

        ALCF's Groq system consists of a single GroqRackTM compute cluster that provides an extensible accelerator network of 9 GroqNodeTM servers [ groq-r01-gn-01 through groq-r01-gn-09 ] with a rotational multi-node network topology. Each GroqNode contains 8 GroqCardTM accelerators with integrated chip-to-chip connections in a dragonfly multi-chip topology.

        The GroqCardTM accelerator is a dual-width, full-height, three-quarter-length PCI-Express Gen4 x16 adapter that includes a single GroqChipTM processor with 230 MB of on-chip memory. Based on the proprietary Tensor Streaming Processor (TSP) architecture, the GroqChip processor is a low-latency, high-throughput, single-core SIMD compute engine capable of 750 TOPS (INT8) and 188 TFLOPS (FP16) @ 900 MHz that includes advanced vector and matrix mathematical acceleration units. The GroqChip processor is deterministic, providing predictable and repeatable performance.

        The GroqWare suite SDK uses an API-based programming model and enables users to develop, compile, and run models on the GroqCard accelerator in a host server system. The SDK uses an ONNX/MLIR-enabled DAG compiler and consists of the Groq Compiler, the Groq API, and utility tools such as the GroqView™ profiler and groq-runtime.


        For more information refer to the following links:

        • GroqRack spec sheet
        • GroqNode spec sheet
        • GroqCard spec sheet
        • GroqChip spec sheet (via)

        "},{"location":"ai-testbed/groq/virtual-environments/","title":"Virtual Environments","text":""},{"location":"ai-testbed/groq/virtual-environments/#install-conda","title":"Install conda","text":"

        If conda is not already installed:

        rm Miniconda3-latest-Linux-x86_64.sh*\nwget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh\nbash Miniconda3-latest-Linux-x86_64.sh\n# answer y/yes to all prompts\n# exit ssh session, then start a new ssh session\nexit\n

        "},{"location":"ai-testbed/groq/virtual-environments/#groqflow-conda-environment-setup","title":"GroqFlow conda environment setup","text":""},{"location":"ai-testbed/groq/virtual-environments/#create-and-activate-a-groqflow-conda-environment","title":"Create and activate a groqflow conda environment","text":"

        Create a groqflow conda environment and activate it

        export PYTHON_VERSION=3.10.12\nconda create -n groqflow python=$PYTHON_VERSION\nconda activate groqflow\n

        "},{"location":"ai-testbed/groq/virtual-environments/#install-groqflow-into-the-groqflow-conda-environment","title":"Install groqflow into the groqflow conda environment","text":"

        Execute the following commands to install groqflow into the activated groqflow conda environment

        # Alter this if you have cloned groqflow to some other location.\ncd ~/groqflow\npip install --upgrade pip\npip install -e .\npushd . \ncd demo_helpers\npip install -e .\npopd\npip install soundfile\n

        To use groqflow,

        conda activate groqflow\n
        Note: Always use a personal conda environment when installing packages on Groq nodes; otherwise, packages can get installed into ~/.local and cause problems when your shared home directory is used on other systems. If you encounter mysterious package dependency/version issues, check your ~/.local/lib and ~/.local/bin for mistakenly installed packages.
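        For example, a quick way to check for stray user-level installs is the following (the directories may simply not exist, which is fine):

        pip list --user\nls ~/.local/lib ~/.local/bin\n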

        Note: The conda environment should be reinstalled whenever new groqflow code is pulled from the groqflow GitHub repo; with the groqflow conda environment activated, redo just the pip install steps.

        "},{"location":"ai-testbed/sambanova/TODO/","title":"TODO","text":"
        • docs/ai-testbed/sambanova_gen2/example-multi-node-programs.md
        • docs/ai-testbed/sambanova_gen2/ GPT2 example

        Using /data/ANL/results/sn30-r1-h1/wilsonb/032223.18/GPT1.5B.out for output Using /data/ANL/results/sn30-r2-h1/wilsonb/032223.19/GPT1.5B.out for output

        Using /data/ANL/results/sn30-r2-h1/wilsonb/032223.19/BertLarge.out for output

        "},{"location":"ai-testbed/sambanova/documentation/","title":"Documentation","text":"

        The SambaNova documentation is available online: SambaNova Documentation.

        The documentation for SambaTune (a profiling and performance tuning tool for SambaNova systems) is available at SambaTune Documentation.

        "},{"location":"ai-testbed/sambanova/example-multi-node-programs/","title":"Example Multi-Node Programs","text":"

        In this section, we will learn how to extend the UNet2d and Gpt1.5B application scripts introduced in Example Programs to compile and run multiple instances of a model in a data parallel fashion across multiple tiles or across multiple nodes.

        "},{"location":"ai-testbed/sambanova/example-multi-node-programs/#unet2d","title":"UNet2d","text":""},{"location":"ai-testbed/sambanova/example-multi-node-programs/#set-up","title":"Set Up","text":"

        Create the following directory and change to it if you have not already done so.

        mkdir -p ~/apps/image/unet\ncd ~/apps/image/unet\n
        "},{"location":"ai-testbed/sambanova/example-multi-node-programs/#create-unet2dsh-and-unet_batchsh","title":"Create Unet2d.sh and unet_batch.sh","text":"

        Create the files Unet2d.sh and unet_batch.sh in the current directory by copying and pasting the contents of Unet2d.sh and unet_batch.sh into files with the same names, using your favorite editor. Then make them executable:

        chmod +x Unet2d.sh\nchmod +x unet_batch.sh\n
        "},{"location":"ai-testbed/sambanova/example-multi-node-programs/#compile-and-run","title":"Compile and run","text":"

        Run these commands for training (compile + train). The compile and run scripts take the following input arguments:

        1. Image size: The images are square. Valid sizes include 256, 512, and 1024.

        2. Batch size: The local batch size. The global batch size is the local batch size * the number of instances (e.g., a local batch size of 256 with 8 instances gives a global batch size of 2048).

        3. Num of instances: The total number of instances of Unet2d run in the data parallel framework.

        4. RunID: A unique ID for the compile or run process.

        The script uses the arguments pcompile and prun for the data parallel compile and run.

        ./Unet2d.sh pcompile <image size> <batch_size> <num of instances> <RunID>\n./Unet2d.sh prun <image size> <batch_size> <num of instances> <RunID>\n

        For an image size of 256x256 and a local batch size of 256 when running 8 instances, the commands are as follows.

        ./Unet2d.sh pcompile 256 256 8 unet2d_8inst_pcompile\n./Unet2d.sh prun 256 256 8 unet2d_8inst_prun\n

        The above commands display the path of the file that contains the output of the execution of the above scripts, usually /data/ANL/results/<hostname>/<userId>/<RunID>/Unet2d.out.
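        For example, you can follow the progress of the 8-instance run above with tail (substitute your own hostname, user ID, and RunID):

        tail -f /data/ANL/results/$(hostname)/$USER/unet2d_8inst_prun/Unet2d.out\n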

        You can inspect the compile command, which contains the --data-parallel -ws 2 arguments, to ensure that the pef file is compatible with data parallel runs. The pef generated by the compilation process for the above compile command is placed under out/Unet2d/unet_train_256_256_NP_4 inside the current working directory.

        python /opt/sambaflow/apps/image/segmentation/compile.py compile --mac-v2 --in-channels=3 --in-width=${2} --in-height=${2} --batch-size=${BS} --enable-conv-tiling --num-tiles=4 --pef-name=unet_train_${BS}_${2}_NP_${NUM_TILES}  --data-parallel -ws 2 --output-folder=${OUTDIR}\n

        Once the model is compiled, sbatch is used to launch the multiple instances. The example below shows a total of 8 tasks or instances being launched on the host from which the script is launched.

        sbatch --gres=rdu:1 --tasks-per-node ${NP} --nodes 1 --nodelist $(hostname) --cpus-per-task=${cpus} $(pwd)/unet_batch.sh ${NP} ${NUM_WORKERS} ${BS} ${2} ${5}\n

        The run command has the --data-parallel --reduce-on-rdu arguments, which make it compatible with data parallel runs.

        srun --mpi=pmi2 python /opt/sambaflow/apps/image/segmentation//hook.py run --data-cache=${CACHE_DIR}  --data-in-memory --num-workers=${NUM_WORKERS} --enable-tiling  --min-throughput 395 --in-channels=3 --in-width=${IM} --in-height=${IM} --init-features 32 --batch-size=${BS} --epochs 10 --data-dir ${DS} --log-dir log_dir_unet_${IM}_${BS}_${NP} --data-parallel --reduce-on-rdu --pef=${OUTDIR}/unet_train_${BS}_${IM}_NP_4/unet_train_${BS}_${IM}_NP_4.pef\n

        The throughput is calculated by averaging the e2e samples_per_sec over the different instances.

        inner train loop time : 36.314290046691895 for 10 epochs, number of global steps: 10, e2e samples_per_sec: 563.9653143065\ninner train loop time : 33.36756229400635 for 10 epochs, number of global steps: 10, e2e samples_per_sec: 613.7697389922524\ninner train loop time : 33.94625234603882 for 10 epochs, number of global steps: 10, e2e samples_per_sec: 603.3066563941279\ninner train loop time : 32.309499979019165 for 10 epochs, number of global steps: 10, e2e samples_per_sec: 633.8692958200872\ninner train loop time : 31.418426036834717 for 10 epochs, number of global steps: 10, e2e samples_per_sec: 651.8467849404489\ninner train loop time : 28.164129495620728 for 10 epochs, number of global steps: 10, e2e samples_per_sec: 727.1660927132315\ninner train loop time : 30.29698896408081 for 10 epochs, number of global steps: 10, e2e samples_per_sec: 675.9747651583616\ninner train loop time : 25.332663536071777 for 10 epochs, number of global steps: 10, e2e samples_per_sec: 808.442427336472\n
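        As a quick sketch, the average over the instances can be extracted from the output file with a one-liner like the following (adjust the path to your own Unet2d.out):

        grep 'e2e samples_per_sec' Unet2d.out | awk -F 'samples_per_sec: ' '{sum += $2; n++} END {print sum / n}'\n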
        "},{"location":"ai-testbed/sambanova/example-multi-node-programs/#gpt-15b","title":"Gpt 1.5B","text":""},{"location":"ai-testbed/sambanova/example-multi-node-programs/#set-up_1","title":"Set up","text":"
        mkdir ~/nlp-multiNodetest\ncd ~/nlp-multiNodetest\n
        "},{"location":"ai-testbed/sambanova/example-multi-node-programs/#create-and-run-gpt15b_compilesh-and-gpt15b_runsh","title":"Create and run Gpt1.5B_compile.sh and Gpt1.5B_run.sh","text":"

        Create the files Gpt1.5B_compile.sh and Gpt1.5B_run.sh in the current directory by copying the contents of Gpt1.5B_compile.sh and Gpt1.5B_run.sh. Alternatively, the files can be accessed at /data/ANL/scripts/Gpt1.5B_compile.sh and /data/ANL/scripts/Gpt1.5B_run.sh on any of the compute nodes and can be copied over to the working directory.

        "},{"location":"ai-testbed/sambanova/example-multi-node-programs/#compile-and-run_1","title":"Compile and Run","text":"

        These scripts contain the commands to compile and run multiple instances of the Gpt1.5B model across multiple nodes. Run Gpt1.5B_compile.sh to compile the model and generate its pef file; it in turn launches the Gpt1.5B_run.sh script, which runs multiple instances of the model across the different nodes.

        chmod +x Gpt1.5B_compile.sh\nchmod +x Gpt1.5B_run.sh\n./Gpt1.5B_compile.sh\n

        You can see the log file path displayed on the screen, as in the example below. You can use the tail command to check the progress of the run.

        vsastry@sn30-r1-h1:~/nlp-multiNodetest$ ./Gpt1.5B_compile.sh\nUsing /data/ANL/results/sn30-r1-h1/vsastry/041823.19/GPT1.5B.out for output\n

        The artifacts of the compile process are produced under the path /data/scratch/<userId>.
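        For example, to list the compile artifacts (including the generated pef files) for your user (a sketch; the exact subdirectory layout depends on the output folder set in the script):

        ls /data/scratch/$USER\n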

        Inspect the compile command in the script to see that it includes the additional arguments --data-parallel and -ws 2 to generate a pef that is compatible with data parallel runs.

        python /opt/sambaflow/apps/nlp/transformers_on_rdu/transformers_hook.py compile --module_name gpt2_pretrain --task_name clm --max_seq_length 1024 -b 16 --output_dir=${OUTDIR}/hf_output --overwrite_output_dir --do_train  --per_device_train_batch_size 16 --cache ${OUTDIR}/cache/ --tokenizer_name gpt2 --model_name gpt2 --mac-v2 --non_split_head --mac-human-decision /opt/sambaflow/apps/nlp/transformers_on_rdu/human_decisions_gm/mac_v2_overrides/gpt2_48_enc_full_recompute_training_spatialmapping_tiling16_clmerge_gm_nonpardp_lnsd.json --compiler-configs-file /opt/sambaflow/apps/nlp/transformers_on_rdu/human_decisions_gm/compiler_configs/compiler_configs_gpt2_sc_recompute_spatialmapping_tiling16_clsmerge_withcls_nonpardp_norc_e2e.json --skip_broadcast_patch --config_name /opt/sambaflow/apps/nlp/transformers_on_rdu/customer_specific/mv/configs/gpt2_config_xl_50260.json --no_index_select_patch --data-parallel -ws 2 --weight_decay 0.1  --max_grad_norm_clip 1.0 --num-tiles 4 --pef-name=gpt15 --output-folder=${OUTDIR}\n

        Once the model is compiled, sbatch is used to launch the multiple instances across the nodes. The example below shows a total of 32 tasks or instances launched over 2 nodes, with each node running a maximum of 16 tasks. Slurm allocates any 2 of the available nodes in this example.

        /usr/local/bin/sbatch --output=${HOME}/slurm-%A.out --ntasks 32 --gres=rdu:1 --ntasks-per-node 16  --nodes 2 --cpus-per-task=8  Gpt1.5B_run.sh ${1} >> ${OUTPUT_PATH} 2>&1\n

        The run command for each of these instances is in the Gpt1.5B_run.sh script. You can inspect the command in the script to see that the --data-parallel --reduce-on-rdu arguments are present to ensure that the model is run in a data parallel fashion and that the gradient accumulation takes place on the RDU.

        /usr/local/bin/srun --mpi=pmi2 python /opt/sambaflow/apps/nlp/transformers_on_rdu/transformers_hook.py run  -b 16  --module_name gpt2_pretrain --task_name clm --max_seq_length 1024  --overwrite_output_dir --do_train  --per_device_train_batch_size 16 --cache ${OUTDIR}/cache/  --tokenizer_name gpt2 --model_name gpt2 --non_split_head --skip_broadcast_patch --no_index_select_patch --output_dir=${OUTDIR}/hf_output --config_name /opt/sambaflow/apps/nlp/transformers_on_rdu/customer_specific/mv/configs/gpt2_config_xl_50260.json --max_grad_norm_clip 1.0 --skip_checkpoint --data-parallel --reduce-on-rdu --data_dir /data/ANL/ss1024 --data_dir /data/ANL/ss1024  --logging_steps 1 --max_steps 900000 --learning_rate 0.00025 --steps_this_run 800 --min_throughput 299000 --max_throughput 600000 --pef=${OUTDIR}/gpt15/gpt15.pef >> ${OUTPUT_PATH} 2>&1\n

        squeue shows that the model is running on 2 nodes, sn30-r1-h1 and sn30-r2-h2.

        JOBID PARTITION                      NAME     USER ST       TIME  NODES NODELIST(REASON)\n10191 sambanova            Gpt1.5B_run.sh  vsastry  R      23:18      2 sn30-r1-h1,sn30-r2-h2\n

        sntilestat can also be used to check the total number of tiles used for the runs.

        TILE                 %idle %exec %pload %aload %chkpt %quiesce    PID     USER COMMAND\n/XRDU_0/RDU_0/TILE_0   8.0  91.6    0.3    0.1    0.0      0.0 2750333  vsastry /opt/sambaflow/apps/nlp/transformers_on_rdu/venv/b\n/XRDU_0/RDU_0/TILE_1   8.0  91.6    0.3    0.1    0.0      0.0 2750333  vsastry /opt/sambaflow/apps/nlp/transformers_on_rdu/venv/b\n/XRDU_0/RDU_0/TILE_2   7.9  91.6    0.3    0.3    0.0      0.0 2750333  vsastry /opt/sambaflow/apps/nlp/transformers_on_rdu/venv/b\n/XRDU_0/RDU_0/TILE_3   7.7  91.8    0.3    0.3    0.0      0.0 2750333  vsastry /opt/sambaflow/apps/nlp/transformers_on_rdu/venv/b\n/XRDU_0/RDU_0/TILE_4   7.6  91.9    0.4    0.1    0.0      0.0 2750339  vsastry /opt/sambaflow/apps/nlp/transformers_on_rdu/venv/b\n/XRDU_0/RDU_0/TILE_5   7.5  91.9    0.5    0.1    0.0      0.0 2750339  vsastry /opt/sambaflow/apps/nlp/transformers_on_rdu/venv/b\n/XRDU_0/RDU_0/TILE_6   7.5  91.8    0.5    0.3    0.0      0.0 2750339  vsastry /opt/sambaflow/apps/nlp/transformers_on_rdu/venv/b\n/XRDU_0/RDU_0/TILE_7   7.3  92.0    0.6    0.0    0.0      0.0 2750339  vsastry /opt/sambaflow/apps/nlp/transformers_on_rdu/venv/b\n/XRDU_0/RDU_1/TILE_0   8.9  89.9    1.0    0.1    0.0      0.0 2750338  vsastry /opt/sambaflow/apps/nlp/transformers_on_rdu/venv/b\n/XRDU_0/RDU_1/TILE_1   9.0  89.9    0.9    0.1    0.0      0.0 2750338  vsastry /opt/sambaflow/apps/nlp/transformers_on_rdu/venv/b\n/XRDU_0/RDU_1/TILE_2   8.6  89.8    1.4    0.1    0.0      0.0 2750338  vsastry /opt/sambaflow/apps/nlp/transformers_on_rdu/venv/b\n/XRDU_0/RDU_1/TILE_3   8.5  89.9    1.4    0.1    0.0      0.0 2750338  vsastry /opt/sambaflow/apps/nlp/transformers_on_rdu/venv/b\n/XRDU_0/RDU_1/TILE_4   7.9  90.9    0.9    0.4    0.0      0.0 2750343  vsastry /opt/sambaflow/apps/nlp/transformers_on_rdu/venv/b\n/XRDU_0/RDU_1/TILE_5   7.7  90.9    0.9    0.5    0.0      0.0 2750343  vsastry /opt/sambaflow/apps/nlp/transformers_on_rdu/venv/b\n/XRDU_0/RDU_1/TILE_6   7.7  91.0    0.9    0.4    0.0      0.0 2750343  vsastry /opt/sambaflow/apps/nlp/transformers_on_rdu/venv/b\n/XRDU_0/RDU_1/TILE_7   8.0  91.0    0.6    0.4    0.0      0.0 2750343  vsastry /opt/sambaflow/apps/nlp/transformers_on_rdu/venv/b\n/XRDU_1/RDU_0/TILE_0   7.6  92.0    0.3    0.1    0.0      0.0 2750345  vsastry /opt/sambaflow/apps/nlp/transformers_on_rdu/venv/b\n/XRDU_1/RDU_0/TILE_1   7.6  92.0    0.3    0.1    0.0      0.0 2750345  vsastry /opt/sambaflow/apps/nlp/transformers_on_rdu/venv/b\n/XRDU_1/RDU_0/TILE_2   7.5  92.1    0.3    0.1    0.0      0.0 2750345  vsastry /opt/sambaflow/apps/nlp/transformers_on_rdu/venv/b\n/XRDU_1/RDU_0/TILE_3   7.5  92.1    0.3    0.1    0.0      0.0 2750345  vsastry /opt/sambaflow/apps/nlp/transformers_on_rdu/venv/b\n/XRDU_1/RDU_0/TILE_4   7.5  92.1    0.3    0.1    0.0      0.0 2750335  vsastry /opt/sambaflow/apps/nlp/transformers_on_rdu/venv/b\n/XRDU_1/RDU_0/TILE_5   7.5  92.1    0.3    0.1    0.0      0.0 2750335  vsastry /opt/sambaflow/apps/nlp/transformers_on_rdu/venv/b\n/XRDU_1/RDU_0/TILE_6   7.5  92.1    0.3    0.1    0.0      0.0 2750335  vsastry /opt/sambaflow/apps/nlp/transformers_on_rdu/venv/b\n/XRDU_1/RDU_0/TILE_7   7.5  92.1    0.3    0.1    0.0      0.0 2750335  vsastry /opt/sambaflow/apps/nlp/transformers_on_rdu/venv/b\n/XRDU_1/RDU_1/TILE_0   7.7  91.5    0.4    0.4    0.0      0.0 2750330  vsastry /opt/sambaflow/apps/nlp/transformers_on_rdu/venv/b\n/XRDU_1/RDU_1/TILE_1   7.9  91.5    0.3    0.4    0.0      0.0 2750330  vsastry /opt/sambaflow/apps/nlp/transformers_on_rdu/venv/b\n/XRDU_1/RDU_1/TILE_2   7.9 
 91.5    0.3    0.4    0.0      0.0 2750330  vsastry /opt/sambaflow/apps/nlp/transformers_on_rdu/venv/b\n/XRDU_1/RDU_1/TILE_3   7.6  91.8    0.4    0.3    0.0      0.0 2750330  vsastry /opt/sambaflow/apps/nlp/transformers_on_rdu/venv/b\n/XRDU_1/RDU_1/TILE_4   7.7  91.9    0.4    0.0    0.0      0.0 2750334  vsastry /opt/sambaflow/apps/nlp/transformers_on_rdu/venv/b\n/XRDU_1/RDU_1/TILE_5   7.7  91.9    0.4    0.0    0.0      0.0 2750334  vsastry /opt/sambaflow/apps/nlp/transformers_on_rdu/venv/b\n/XRDU_1/RDU_1/TILE_6   7.9  91.9    0.3    0.0    0.0      0.0 2750334  vsastry /opt/sambaflow/apps/nlp/transformers_on_rdu/venv/b\n/XRDU_1/RDU_1/TILE_7   7.9  91.9    0.3    0.0    0.0      0.0 2750334  vsastry /opt/sambaflow/apps/nlp/transformers_on_rdu/venv/b\n/XRDU_2/RDU_0/TILE_0   8.0  91.8    0.1    0.1    0.0      0.0 2750346  vsastry /opt/sambaflow/apps/nlp/transformers_on_rdu/venv/b\n/XRDU_2/RDU_0/TILE_1   8.0  91.8    0.1    0.1    0.0      0.0 2750346  vsastry /opt/sambaflow/apps/nlp/transformers_on_rdu/venv/b\n/XRDU_2/RDU_0/TILE_2   8.0  91.8    0.1    0.1    0.0      0.0 2750346  vsastry /opt/sambaflow/apps/nlp/transformers_on_rdu/venv/b\n/XRDU_2/RDU_0/TILE_3   7.7  91.9    0.1    0.3    0.0      0.0 2750346  vsastry /opt/sambaflow/apps/nlp/transformers_on_rdu/venv/b\n/XRDU_2/RDU_0/TILE_4   7.5  92.0    0.5    0.0    0.0      0.0 2750336  vsastry /opt/sambaflow/apps/nlp/transformers_on_rdu/venv/b\n/XRDU_2/RDU_0/TILE_5   7.6  91.9    0.5    0.0    0.0      0.0 2750336  vsastry /opt/sambaflow/apps/nlp/transformers_on_rdu/venv/b\n/XRDU_2/RDU_0/TILE_6   7.6  91.9    0.4    0.1    0.0      0.0 2750336  vsastry /opt/sambaflow/apps/nlp/transformers_on_rdu/venv/b\n/XRDU_2/RDU_0/TILE_7   7.5  91.9    0.4    0.3    0.0      0.0 2750336  vsastry /opt/sambaflow/apps/nlp/transformers_on_rdu/venv/b\n/XRDU_2/RDU_1/TILE_0   7.5  91.8    0.6    0.1    0.0      0.0 2750331  vsastry /opt/sambaflow/apps/nlp/transformers_on_rdu/venv/b\n/XRDU_2/RDU_1/TILE_1   7.5  91.8    0.6    0.1    0.0      0.0 2750331  vsastry /opt/sambaflow/apps/nlp/transformers_on_rdu/venv/b\n/XRDU_2/RDU_1/TILE_2   7.7  91.6    0.5    0.1    0.0      0.0 2750331  vsastry /opt/sambaflow/apps/nlp/transformers_on_rdu/venv/b\n/XRDU_2/RDU_1/TILE_3   7.7  91.6    0.5    0.1    0.0      0.0 2750331  vsastry /opt/sambaflow/apps/nlp/transformers_on_rdu/venv/b\n/XRDU_2/RDU_1/TILE_4   7.9  91.4    0.8    0.0    0.0      0.0 2750329  vsastry /opt/sambaflow/apps/nlp/transformers_on_rdu/venv/b\n/XRDU_2/RDU_1/TILE_5   7.9  91.4    0.8    0.0    0.0      0.0 2750329  vsastry /opt/sambaflow/apps/nlp/transformers_on_rdu/venv/b\n/XRDU_2/RDU_1/TILE_6   8.1  91.4    0.5    0.0    0.0      0.0 2750329  vsastry /opt/sambaflow/apps/nlp/transformers_on_rdu/venv/b\n/XRDU_2/RDU_1/TILE_7   8.2  91.4    0.4    0.0    0.0      0.0 2750329  vsastry /opt/sambaflow/apps/nlp/transformers_on_rdu/venv/b\n/XRDU_3/RDU_0/TILE_0   7.5  91.8    0.4    0.4    0.0      0.0 2750344  vsastry /opt/sambaflow/apps/nlp/transformers_on_rdu/venv/b\n/XRDU_3/RDU_0/TILE_1   7.5  91.8    0.4    0.4    0.0      0.0 2750344  vsastry /opt/sambaflow/apps/nlp/transformers_on_rdu/venv/b\n/XRDU_3/RDU_0/TILE_2   7.5  91.8    0.4    0.4    0.0      0.0 2750344  vsastry /opt/sambaflow/apps/nlp/transformers_on_rdu/venv/b\n/XRDU_3/RDU_0/TILE_3   7.5  91.8    0.4    0.4    0.0      0.0 2750344  vsastry /opt/sambaflow/apps/nlp/transformers_on_rdu/venv/b\n/XRDU_3/RDU_0/TILE_4   7.6  91.8    0.3    0.4    0.0      0.0 2750337  vsastry 
/opt/sambaflow/apps/nlp/transformers_on_rdu/venv/b\n/XRDU_3/RDU_0/TILE_5   7.7  91.8    0.1    0.4    0.0      0.0 2750337  vsastry /opt/sambaflow/apps/nlp/transformers_on_rdu/venv/b\n/XRDU_3/RDU_0/TILE_6   7.7  91.8    0.3    0.3    0.0      0.0 2750337  vsastry /opt/sambaflow/apps/nlp/transformers_on_rdu/venv/b\n/XRDU_3/RDU_0/TILE_7   7.7  91.9    0.3    0.1    0.0      0.0 2750337  vsastry /opt/sambaflow/apps/nlp/transformers_on_rdu/venv/b\n/XRDU_3/RDU_1/TILE_0   7.7  92.0    0.1    0.1    0.0      0.0 2750347  vsastry /opt/sambaflow/apps/nlp/transformers_on_rdu/venv/b\n/XRDU_3/RDU_1/TILE_1   7.7  92.0    0.1    0.1    0.0      0.0 2750347  vsastry /opt/sambaflow/apps/nlp/transformers_on_rdu/venv/b\n/XRDU_3/RDU_1/TILE_2   7.7  92.1    0.1    0.0    0.0      0.0 2750347  vsastry /opt/sambaflow/apps/nlp/transformers_on_rdu/venv/b\n/XRDU_3/RDU_1/TILE_3   7.7  92.1    0.1    0.0    0.0      0.0 2750347  vsastry /opt/sambaflow/apps/nlp/transformers_on_rdu/venv/b\n/XRDU_3/RDU_1/TILE_4   7.3  91.9    0.5    0.3    0.0      0.0 2750332  vsastry /opt/sambaflow/apps/nlp/transformers_on_rdu/venv/b\n/XRDU_3/RDU_1/TILE_5   7.3  91.9    0.5    0.3    0.0      0.0 2750332  vsastry /opt/sambaflow/apps/nlp/transformers_on_rdu/venv/b\n/XRDU_3/RDU_1/TILE_6   7.3  91.9    0.5    0.3    0.0      0.0 2750332  vsastry /opt/sambaflow/apps/nlp/transformers_on_rdu/venv/b\n/XRDU_3/RDU_1/TILE_7   7.3  92.0    0.5    0.1    0.0      0.0 2750332  vsastry /opt/sambaflow/apps/nlp/transformers_on_rdu/venv/b\n

        The Slurm log associated with the JOBID (10191 in the above example) is located in the home directory. You can use the tail command to check the progress of the training.

        vsastry@sn30-r1-h1:~$ tail -f ~/slurm-10191.out\nUsing /data/ANL/results/sn30-r1-h1/vsastry/041823.03/Gpt1.5B.out for output\n
        vsastry@sn30-r1-h1:~$ tail -f /data/ANL/results/sn30-r1-h1/vsastry/041823.03/Gpt1.5B.out\n

        Once the run is completed, check the log file for the performance results.

        {'e2e_train_time': 2179.2292835712433, 'training_sequences_per_second': 192467.31088004305, 'final_loss': 4.781678199768066}\n247/3247 [01:03<00:00, 50.76it/s]\n
        "},{"location":"ai-testbed/sambanova/example-programs/","title":"Example Programs","text":"

        SambaNova provides examples of some well-known simple AI applications under the path /opt/sambaflow/apps/starters on all SambaNova compute nodes. Make a copy of this directory in your home directory:

        cd ~/\nmkdir apps\ncp -r /opt/sambaflow/apps/starters apps/starters\n

        Deactivate any active conda environment. If you have conda installed and a conda environment is active, you will see something like (base) at the beginning of the command prompt. If so, you will need to deactivate it with conda deactivate. Conda is not used on the SambaNova SN30 cluster.
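        If a conda environment is active, deactivate it with:

        conda deactivate\n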

        "},{"location":"ai-testbed/sambanova/example-programs/#lenet","title":"LeNet","text":"

        Change directory

        cd ~/apps/starters/lenet\n
        "},{"location":"ai-testbed/sambanova/example-programs/#common-arguments","title":"Common Arguments","text":"

        Below are some of the common arguments used across most of the models in the example code.

        Argument Default Help -b 1 Batch size for training -n, --num-iterations 100 Number of iterations to run the pef for -e, --num-epochs 1 Number of epochs for training --log-path 'checkpoints' Log path --num-workers 0 Number of workers --measure-train-performance None Measure training performance"},{"location":"ai-testbed/sambanova/example-programs/#lenet-arguments","title":"LeNet Arguments","text":"Argument Default Help --lr 0.01 Learning rate for training --momentum 0.0 Momentum value for training --weight-decay 0.01 Weight decay for training --data-path './data' Data path --data-folder 'mnist_data' Folder containing mnist data

        Establish the Environment

        source /opt/sambaflow/apps/starters/lenet/venv/bin/activate\n

        Note: If you receive an \"HTTP error\" message on any of the following commands, run the command again. Such errors (e.g., 503) are commonly an intermittent failure to download a dataset.

        Run these commands to compile and train the LeNet model:

        srun python lenet.py compile -b=1 --pef-name=\"lenet\" --output-folder=\"pef\"\nsrun python lenet.py run --pef=\"pef/lenet/lenet.pef\"\n

        Alternatively to use Slurm sbatch, create submit-lenet-job.sh with the following contents:

        #!/bin/sh\n\npython lenet.py compile -b=1 --pef-name=\"lenet\" --output-folder=\"pef\"\npython lenet.py run --pef=\"pef/lenet/lenet.pef\"\n

        Then

        sbatch --output=pef/lenet/output.log submit-lenet-job.sh\n

        Squeue will give you the queue status.

        squeue\n# One may also...\nwatch squeue\n

        One may see the run log using:

        cat pef/lenet/output.log\n
        "},{"location":"ai-testbed/sambanova/example-programs/#mnist-feed-forward-network","title":"MNIST - Feed Forward Network","text":"

        Establish the Environment

        source /opt/sambaflow/apps/starters/ffn_mnist/venv/bin/activate\n

        Change directory

        cd ~/apps/starters/ffn_mnist/\n

        Commands to run MNIST example:

        srun python ffn_mnist.py  compile -b 1 --pef-name=\"ffn_mnist\" --mac-v2\nsrun python ffn_mnist.py  run -b 1 -p out/ffn_mnist/ffn_mnist.pef\n

        To run the same using Slurm sbatch, create and run the submit-ffn_mnist-job.sh with the following contents.

        #!/bin/sh\npython ffn_mnist.py  compile -b 1 --pef-name=\"ffn_mnist\" --mac-v2\npython ffn_mnist.py  run -b 1 -p out/ffn_mnist/ffn_mnist.pef\n
        sbatch --output=pef/ffn_mnist/output.log submit-ffn_mnist-job.sh\n
        "},{"location":"ai-testbed/sambanova/example-programs/#logistic-regression","title":"Logistic Regression","text":"

        Establish the Environment

        source /opt/sambaflow/apps/starters/logreg/venv/bin/activate\n

        Change directory

        cd ~/apps/starters/logreg\n
        "},{"location":"ai-testbed/sambanova/example-programs/#logistic-regression-arguments","title":"Logistic Regression Arguments","text":"

        This is not an exhaustive list of arguments.

        Arguments

        Argument Default Help Step --lr 0.001 Learning rate for training Compile --momentum 0.0 Momentum value for training Compile --weight-decay 1e-4 Weight decay for training Compile --num-features 784 Number features for training Compile --num-classes 10 Number classes for training Compile --weight-norm na Enable weight normalization Compile

        Run these commands:

        srun python logreg.py compile --pef-name=\"logreg\" --output-folder=\"pef\"\nsrun python logreg.py run --pef=\"pef/logreg/logreg.pef\"\n

        To use Slurm, create submit-logreg-job.sh with the following contents:

        #!/bin/sh\npython logreg.py compile --pef-name=\"logreg\" --output-folder=\"pef\"\npython logreg.py run --pef=\"pef/logreg/logreg.pef\"\n

        Then

        sbatch --output=pef/logreg/output.log submit-logreg-job.sh\n

        The output, pef/logreg/output.log, will look something like this:

        2023-03-08 21:18:25.168190: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2 FMA\nTo enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.\n2023-03-08 21:18:25.334389: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory\n2023-03-08 21:18:25.334430: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.\n2023-03-08 21:18:26.422458: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory\n2023-03-08 21:18:26.422701: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory\n2023-03-08 21:18:26.422709: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly.\n[Info][SAMBA]# Placing log files in /home/wilsonb/apps/starters/logreg/pef/logreg/logreg.samba.log\n[Info][MAC]# Placing log files in /home/wilsonb/apps/starters/logreg/pef/logreg/logreg.mac.log\n...\n\nEpoch [1/1], Step [10000/60000], Loss: 0.4642\nEpoch [1/1], Step [20000/60000], Loss: 0.4090\nEpoch [1/1], Step [30000/60000], Loss: 0.3863\nEpoch [1/1], Step [40000/60000], Loss: 0.3703\nEpoch [1/1], Step [50000/60000], Loss: 0.3633\nEpoch [1/1], Step [60000/60000], Loss: 0.3553\nTest Accuracy: 91.40  Loss: 0.3014\n2023-03-08T21:19:08 : [INFO][LIB][2688517]: sn_create_session: PEF File: pef/logreg/logreg.pef\n
        "},{"location":"ai-testbed/sambanova/example-programs/#unet2d","title":"UNet2D","text":"

        The UNet application example is provided in the path /opt/sambaflow/apps/image/segmentation/. As with any other application, we first compile and then train the model using the compile and run arguments, respectively. The scripts containing the compile and run commands for the UNet2D model can be accessed at Unet2d.sh or at /data/ANL/scripts/Unet2d.sh on any SN30 compute node.

        Change directory and copy files.

        mkdir -p ~/apps/image/unet\ncd ~/apps/image/unet\n

        Copy and paste the contents of Unet2d.sh into a file with the same name in the current directory using your favorite editor.

        chmod +x Unet2d.sh\n

        Run these commands for training (compile + train):

        ./Unet2d.sh compile <image size> <batch_size> <num of instances> <RunID>\n./Unet2d.sh run <image size> <batch_size> <num of instances> <RunID>\n

        The compile and run arguments of the script can only be run with the number of instances equal to 1, indicating that this is a simple 4-tile run without the data parallel framework. For an image size of 256x256 and a batch size of 256 when running just 1 instance, the commands are as follows.

        ./Unet2d.sh compile 256 256 1 unet2d_single_compile\n./Unet2d.sh run 256 256 1 unet2d_single_run\n

        The above commands display the path of the file that contains the output of the execution of the above scripts, usually /data/ANL/results/<hostname>/<userid>/<RunID>/Unet2d.out.

        If we inspect the compile and run commands for the UNet application provided in the script, we see that the application is compiled with --num-tiles 4, which means that the entire application fits on 4 tiles, or half of an RDU. The pef generated by the compilation process for the above command is placed under out/Unet2d/unet_train_256_256_single_4 inside the current working directory.

        python ${UNET}/compile.py compile --mac-v2 --in-channels=3 --in-width=${2} --in-height=${2} --batch-size=${BS} --enable-conv-tiling --num-tiles=4 --pef-name=unet_train_${BS}_${2}_single_${NUM_TILES} --output-folder=${OUTDIR}\n
        srun --nodelist $(hostname) python /opt/sambaflow/apps/image/segmentation//hook.py run --data-cache=${CACHE_DIR}  --data-in-memory --num-workers=${NUM_WORKERS} --enable-tiling  --min-throughput 395 --in-channels=3 --in-width=${2} --in-height=${2} --init-features 32 --batch-size=${BS} --epochs 10 --data-dir ${DS} --log-dir log_dir_unet_${2}_${BS}_single_${NUM_TILES} --pef=${OUTDIR}/unet_train_${BS}_${2}_single_${NUM_TILES}/unet_train_${BS}_${2}_single_${NUM_TILES}.pef\n

        The performance data is located at the bottom of the log file.

        inner train loop time : 374.6789753437042 for 10 epochs, number of global steps: 130, e2e samples_per_sec: 88.82270474202953\n
        "},{"location":"ai-testbed/sambanova/example-programs/#gpt-15b","title":"Gpt 1.5B","text":"

        The Gpt 1.5B application example is provided in the path /opt/sambaflow/apps/nlp/transformers_on_rdu/. The scripts containing the compile and run commands for the Gpt1.5B model can be accessed at /data/ANL/scripts/Gpt1.5B_base_single_compile.sh and /data/ANL/scripts/Gpt1.5B_base_single_run.sh on any SN30 compute node. This script is compiled and run for only 1 instance, and the model fits on 4 tiles, or half of an RDU. The scripts are provided for reference.

        Change directory and copy files.

        mkdir -p ~/apps/nlp/Gpt1.5B_single\ncd ~/apps/nlp/Gpt1.5B_single\n

        Copy and paste the contents of Gpt1.5B_base_single_compile.sh and Gpt1.5B_base_single_run.sh into files with the same names in the current directory using your favorite editor,

        or copy them from /data/ANL/scripts/Gpt1.5B_base_single_compile.sh and /data/ANL/scripts/Gpt1.5B_base_single_run.sh:

        cp /data/ANL/scripts/Gpt1.5B_base_single_compile.sh ~/apps/nlp/Gpt1.5B_single/\ncp /data/ANL/scripts/Gpt1.5B_base_single_run.sh ~/apps/nlp/Gpt1.5B_single/\n

        Run the script with the batch size as an argument (shown below with an example of 32).

        chmod +x Gpt1.5B_base_single_compile.sh \n./Gpt1.5B_base_single_compile.sh 32\n

        The Gpt1.5B_base_single_compile.sh script will internally call Gpt1.5B_base_single_run.sh to perform the training. You can inspect the compile and run commands in the scripts to see that this model trains with a batch size of 32 for 1 instance over 4 tiles. The human decision file and the compiler config file help to optimize the compute and memory resources specific to this Gpt 1.5B model run.

        python /opt/sambaflow/apps/nlp/transformers_on_rdu/transformers_hook.py compile --pef-name=GPT1.5B_base_single_32 --output-folder=/data/scratch/user/GPT1.5B_base_single_32 --module_name gpt2_pretrain --task_name clm --max_seq_length 1024 -b 32  --output_dir=/data/scratch/user/GPT1.5B_base_single_32/hf_gpt1dot5b_ss1k_gas_1_bs32  --overwrite_output_dir --do_train  --per_device_train_batch_size 32   --tokenizer_name gpt2 --model_name gpt2 --mac-v2 --non_split_head --mac-human-decision /opt/sambaflow/apps/nlp/transformers_on_rdu/human_decisions_gm/mac_v2_overrides/gpt2_48_enc_full_recompute_training_spatialmapping_tiling16_clmerge_gm_pardp2_lnsd.json --compiler-configs-file /opt/sambaflow/apps/nlp/transformers_on_rdu/human_decisions_gm/compiler_configs/compiler_configs_gpt1dot5b_perf.json --skip_broadcast_patch --config_name /opt/sambaflow/apps/nlp/transformers_on_rdu/customer_specific/mv/configs/gpt2_config_xl_50260.json --no_index_select_patch --weight_decay 0.1  --max_grad_norm_clip 1.0 --num-tiles 4 --enable-stochastic-rounding\n
        COMMAND= /usr/local/bin/srun --mpi=pmi2 python /opt/sambaflow/apps/nlp/transformers_on_rdu/transformers_hook.py run  -b 32  --data_dir /data/ANL/ss1024 --pef=/data/scratch/user/GPT1.5B_base_single_32/GPT1.5B_base_single_32/GPT1.5B_base_single_32.pef --output_dir=/data/scratch/user/GPT1.5B_base_single_32/hf_gpt1dot5b_ss1k_gas_1_bs16 --module_name gpt2_pretrain --task_name clm --max_seq_length 1024  --overwrite_output_dir --do_train  --per_device_train_batch_size 32 --tokenizer_name gpt2 --model_name gpt2 --non_split_head --skip_broadcast_patch --no_index_select_patch --config_name /opt/sambaflow/apps/nlp/transformers_on_rdu/customer_specific/mv/configs/gpt2_config_xl_50260.json --max_grad_norm_clip 1.0 --skip_checkpoint --logging_steps 1 --max_steps 75000 --learning_rate 0.00025 --steps_this_run 100\n

        The sntilestat command shows that the application runs on 4 tiles as shown below.

        /XRDU_0/RDU_0/TILE_0   2.1  96.9    0.8    0.1    0.0      0.0 796481  user python /opt/sambaflow/apps/nlp/transformers_on_rdu/\n/XRDU_0/RDU_0/TILE_1   2.1  96.9    0.8    0.1    0.0      0.0 796481  user python /opt/sambaflow/apps/nlp/transformers_on_rdu/\n/XRDU_0/RDU_0/TILE_2   2.5  96.9    0.4    0.1    0.0      0.0 796481  user python /opt/sambaflow/apps/nlp/transformers_on_rdu/\n/XRDU_0/RDU_0/TILE_3   2.5  96.9    0.4    0.1    0.0      0.0 796481  user python /opt/sambaflow/apps/nlp/transformers_on_rdu/\n/XRDU_0/RDU_0/TILE_4 100.0   0.0    0.0    0.0    0.0      0.0\n/XRDU_0/RDU_0/TILE_5 100.0   0.0    0.0    0.0    0.0      0.0\n...\n
        "},{"location":"ai-testbed/sambanova/getting-started/","title":"Getting Started","text":""},{"location":"ai-testbed/sambanova/getting-started/#on-boarding","title":"On-Boarding","text":"

        SambaNova SN30 can be accessed using your ALCF account. See Get Started to request an account and for additional information.

        "},{"location":"ai-testbed/sambanova/getting-started/#setup","title":"Setup","text":""},{"location":"ai-testbed/sambanova/getting-started/#system-view","title":"System View","text":"

        Connection to a SambaNova node is a two-step process. The first step is to ssh to the login node. This step requires an MFA passcode for authentication - an eight-digit passcode generated by an app on your mobile device, e.g., MobilePASS+. The second step is to log in to a SambaNova node from the login node.

        "},{"location":"ai-testbed/sambanova/getting-started/#log-in-to-login-node","title":"Log in to Login Node","text":"

        Log in to the SambaNova login node from your local machine using the below command. This uses the MobilePASS+ token generated every time you log in to the system. This is the same passcode used to authenticate into other ALCF systems, such as Polaris, Theta and Cooley.

        In the examples below, replace ALCFUserID with your ALCF user id.

        ssh ALCFUserID@sambanova.alcf.anl.gov\nPassword: < MobilePASS+ code >\n

        Note: Use the ssh \"-v\" option in order to debug any ssh problems.

        "},{"location":"ai-testbed/sambanova/getting-started/#log-in-to-a-sambanova-node","title":"Log in to a SambaNova Node","text":"

        Once you are on the login node, a SambaNova node can be accessed using an alias, sn30-r[1-4]-h[1-2] where 'r' stands for the rack number, and 'h' stands for host. sn30-r1-h1 is the first host of the first rack.

        The 8 nodes are aliased as : sn30-r1-h1 , sn30-r1-h2, sn30-r2-h1, sn30-r2-h2, sn30-r3-h1, sn30-r3-h2, sn30-r4-h1, sn30-r4-h2.

        sn30-r1-h1 can be accessed as below.

        ssh sn30-r1-h1\n
        "},{"location":"ai-testbed/sambanova/getting-started/#sdk-setup","title":"SDK setup","text":"

        The required software environment (the SambaFlow software stack and the associated environment variables) for an SN30 node is set up automatically at login. This is unlike the SN10, where the environment had to be set up by each user.

        "},{"location":"ai-testbed/sambanova/job-queuing-and-submission/","title":"Job Queueing and Submission","text":""},{"location":"ai-testbed/sambanova/job-queuing-and-submission/#introduction","title":"Introduction","text":"

        SambaNova uses Slurm for job submission and queueing. Below are some of the important commands for using Slurm. For more information refer to Slurm Documentation.

        Note: Run the Python scripts using 'srun' or 'sbatch', to ensure that concurrent jobs do not interfere with each other.

        Note: There is just one scheduler for all of the SambaNova nodes.

        "},{"location":"ai-testbed/sambanova/job-queuing-and-submission/#srun","title":"SRun","text":"

        The Slurm command srun can be used to run individual Python scripts in parallel with other scripts on a cluster managed by Slurm. Examples of srun usage are shown below.

        Slurm will assign a nodelist/host to run a job if a host is not specified.

        Example:

        srun python lenet.py compile -b=1 --pef-name=\"lenet\" --output-folder=\"pef\"\nsrun python lenet.py run --pef=\"pef/lenet/lenet.pef\"\n

        You may specify which node/host on which to run a job.

        Reasons to specify a node list:

        • One wants to test a specific node to verify the function of the HW and SW (daily smoke tests do this)
        • The nodes are at different software levels and one wants to use a node that has the needed software level for one's application.

        Example:

        srun --nodelist=sn30-r1-h1 python lenet.py compile -b=1 --pef-name=\"lenet\" --output-folder=\"pef\"\n
        "},{"location":"ai-testbed/sambanova/job-queuing-and-submission/#sbatch","title":"SBatch","text":"

        Alternatively, these jobs can be submitted to the Slurm workload manager through a batch script by using the sbatch command. To do this, create a bash script (submit-lenet-job.sh here as an example) with the commands that you want to execute.

        #!/bin/sh\n\npython lenet.py compile -b=1 --pef-name=\"lenet\" --output-folder=\"pef\"\npython lenet.py run --pef=\"pef/lenet/lenet.pef\"\n

        Then pass the bash script as an input to the sbatch command as shown below.

        sbatch --output=pef/lenet/output.log submit-lenet-job.sh\n

        If you need to use multiple RDUs (2 in the example shown below), alter the sbatch command as follows:

        sbatch --gres=rdu:2 <your_script.sh>\n
        "},{"location":"ai-testbed/sambanova/job-queuing-and-submission/#squeue","title":"SQueue","text":"

        The squeue command provides information about jobs located in the Slurm scheduling queue.

        squeue\n
        "},{"location":"ai-testbed/sambanova/job-queuing-and-submission/#sinfo","title":"SInfo","text":"

        SInfo is used to view partition and node information for a system running Slurm.

        Here is a suggested command:

        sinfo -O AllocNodes,GresUsed,Gres,NodeList\n

        For more information, see SInfo.

        "},{"location":"ai-testbed/sambanova/job-queuing-and-submission/#scancel","title":"SCancel","text":"

        SCancel is used to signal or cancel jobs, job arrays, or job steps.

        scancel job_id\n
        "},{"location":"ai-testbed/sambanova/miscellaneous/","title":"Miscellaneous","text":""},{"location":"ai-testbed/sambanova/miscellaneous/#sdk-version","title":"SDK Version","text":"

        To find the SDK version, run the following commands

        (venv) ALCFUserID@sn30-r1-h1:~$ python\nPython 3.7.6 (default, Feb 18 2020, 21:28:31)\n[GCC 9.3.0] on linux\nType \"help\", \"copyright\", \"credits\" or \"license\" for more information.\n>>> import sambaflow\n>>> sambaflow.__version__\n'1.11.5'\n>>>\n
        "},{"location":"ai-testbed/sambanova/miscellaneous/#omp_num_threads","title":"OMP_NUM_THREADS","text":"

        The OMP_NUM_THREADS environment variable sets the number of threads to use for parallel regions.

        The value of this environment variable must be a list of positive integer values. The values of the list set the number of threads to use for parallel regions at the corresponding nested levels.

        For the SambaNova system, it is usually set to one.

        export OMP_NUM_THREADS=16\n
        "},{"location":"ai-testbed/sambanova/miscellaneous/#where-is-the-model","title":"Where is the Model?","text":"

        Two copies of the model are maintained: one in host CPU memory and one in RDU memory. They do not interfere with each other unless you explicitly sync the model/parameters between them using:

        SambaTensor.rdu() # Moves the CPU model to the RDU\nSambaTensor.cpu() # Moves the RDU model to the CPU\n

        In order to run the model on the CPU, you can simply use the PyTorch model as if there is no RDU. In order to run the model on the RDU, you would need to use session.run().

        "},{"location":"ai-testbed/sambanova/miscellaneous/#useful-commands","title":"Useful Commands","text":""},{"location":"ai-testbed/sambanova/miscellaneous/#sn-configuration","title":"SN Configuration","text":"
        snconfig show Node static\n

        The snconfig utility shows the static configuration of the system. The configuration for the first node is as follows:

        ======================================================\n=======                NODE Info               =======\n======================================================\n=======                Static Info             =======\nTimestamp: 2023-03-16 17:00:04\nPlatform Name: DataScale SN30-8\nNode Name: NODE\n    Number of XRDUS: 4\n    XRDU Name: XRDU_0\n        Number of RDUS: 2\n        RDU name: RDU_0\n            Serial Number     : 205057B469B35895\n            Number of TILES: 8\n            TILE Name: TILE_0\n                Serial Number     : N/A\n            TILE Name: TILE_1\n                Serial Number     : N/A\n\n\n...\n\n\n                    Size              : 128.0 GB\n                    Serial Number     : 1F5BC22\n            DDR CH Name: DDRCH_6\n                Number of DIMMS: 1\n                DIMM Name: DIMM_L0\n                    Size              : 128.0 GB\n                    Serial Number     : 1F5BC99\n            DDR CH Name: DDRCH_7\n                Number of DIMMS: 1\n                DIMM Name: DIMM_M0\n                    Size              : 128.0 GB\n                    Serial Number     : 1F5BB68\n        Total XRDU_3 memory size (GB): 2048.0\n
        "},{"location":"ai-testbed/sambanova/miscellaneous/#sambanova-daemon-service","title":"SambaNova Daemon Service","text":"

        The following command checks if the SambaNova daemon service is running.

        systemctl status snd\n

        The output should look something like this:

        \u25cf snd.service - SN Devices Service\n     Loaded: loaded (/lib/systemd/system/snd.service; enabled; vendor preset: enabled)\n    Drop-In: /etc/systemd/system/snd.service.d\n             \u2514\u2500override.conf\n     Active: active (running) since Fri 2023-01-27 04:03:14 UTC; 1 months 18 days ago\n   Main PID: 5635 (snd)\n      Tasks: 9 (limit: 629145)\n     Memory: 156.8M\n     CGroup: /system.slice/snd.service\n             \u2514\u25005635 /opt/sambaflow/bin/snd\n\nWarning: some journal files were not opened due to insufficient permissions.\n
        "},{"location":"ai-testbed/sambanova/miscellaneous/#tile-status","title":"Tile status","text":"
        sntilestat\nwatch sntilestat\n

        The output shown below is when the system is completely idle.

        TILE                 %idle %exec %pload %aload %chkpt %quiesce    PID     USER COMMAND\n/XRDU_0/RDU_0/TILE_0 100.0   0.0    0.0    0.0    0.0      0.0\n/XRDU_0/RDU_0/TILE_1 100.0   0.0    0.0    0.0    0.0      0.0\n/XRDU_0/RDU_0/TILE_2 100.0   0.0    0.0    0.0    0.0      0.0\n/XRDU_0/RDU_0/TILE_3 100.0   0.0    0.0    0.0    0.0      0.0\n/XRDU_0/RDU_0/TILE_4 100.0   0.0    0.0    0.0    0.0      0.0\n/XRDU_0/RDU_0/TILE_5 100.0   0.0    0.0    0.0    0.0      0.0\n/XRDU_0/RDU_0/TILE_6 100.0   0.0    0.0    0.0    0.0      0.0\n/XRDU_0/RDU_0/TILE_7 100.0   0.0    0.0    0.0    0.0      0.0\n/XRDU_0/RDU_1/TILE_0 100.0   0.0    0.0    0.0    0.0      0.0\n/XRDU_0/RDU_1/TILE_1 100.0   0.0    0.0    0.0    0.0      0.0\n/XRDU_0/RDU_1/TILE_2 100.0   0.0    0.0    0.0    0.0      0.0\n/XRDU_0/RDU_1/TILE_3 100.0   0.0    0.0    0.0    0.0      0.0\n/XRDU_0/RDU_1/TILE_4 100.0   0.0    0.0    0.0    0.0      0.0\n/XRDU_0/RDU_1/TILE_5 100.0   0.0    0.0    0.0    0.0      0.0\n/XRDU_0/RDU_1/TILE_6 100.0   0.0    0.0    0.0    0.0      0.0\n/XRDU_0/RDU_1/TILE_7 100.0   0.0    0.0    0.0    0.0      0.0\n/XRDU_1/RDU_0/TILE_0 100.0   0.0    0.0    0.0    0.0      0.0\n/XRDU_1/RDU_0/TILE_1 100.0   0.0    0.0    0.0    0.0      0.0\n/XRDU_1/RDU_0/TILE_2 100.0   0.0    0.0    0.0    0.0      0.0\n/XRDU_1/RDU_0/TILE_3 100.0   0.0    0.0    0.0    0.0      0.0\n/XRDU_1/RDU_0/TILE_4 100.0   0.0    0.0    0.0    0.0      0.0\n/XRDU_1/RDU_0/TILE_5 100.0   0.0    0.0    0.0    0.0      0.0\n/XRDU_1/RDU_0/TILE_6 100.0   0.0    0.0    0.0    0.0      0.0\n/XRDU_1/RDU_0/TILE_7 100.0   0.0    0.0    0.0    0.0      0.0\n/XRDU_1/RDU_1/TILE_0 100.0   0.0    0.0    0.0    0.0      0.0\n/XRDU_1/RDU_1/TILE_1 100.0   0.0    0.0    0.0    0.0      0.0\n/XRDU_1/RDU_1/TILE_2 100.0   0.0    0.0    0.0    0.0      0.0\n/XRDU_1/RDU_1/TILE_3 100.0   0.0    0.0    0.0    0.0      0.0\n/XRDU_1/RDU_1/TILE_4 100.0   0.0    0.0    0.0    0.0      0.0\n/XRDU_1/RDU_1/TILE_5 100.0   0.0    0.0    0.0    0.0      0.0\n/XRDU_1/RDU_1/TILE_6 100.0   0.0    0.0    0.0    0.0      0.0\n/XRDU_1/RDU_1/TILE_7 100.0   0.0    0.0    0.0    0.0      0.0\n/XRDU_2/RDU_0/TILE_0 100.0   0.0    0.0    0.0    0.0      0.0\n/XRDU_2/RDU_0/TILE_1 100.0   0.0    0.0    0.0    0.0      0.0\n/XRDU_2/RDU_0/TILE_2 100.0   0.0    0.0    0.0    0.0      0.0\n/XRDU_2/RDU_0/TILE_3 100.0   0.0    0.0    0.0    0.0      0.0\n/XRDU_2/RDU_0/TILE_4 100.0   0.0    0.0    0.0    0.0      0.0\n/XRDU_2/RDU_0/TILE_5 100.0   0.0    0.0    0.0    0.0      0.0\n/XRDU_2/RDU_0/TILE_6 100.0   0.0    0.0    0.0    0.0      0.0\n/XRDU_2/RDU_0/TILE_7 100.0   0.0    0.0    0.0    0.0      0.0\n/XRDU_2/RDU_1/TILE_0 100.0   0.0    0.0    0.0    0.0      0.0\n/XRDU_2/RDU_1/TILE_1 100.0   0.0    0.0    0.0    0.0      0.0\n/XRDU_2/RDU_1/TILE_2 100.0   0.0    0.0    0.0    0.0      0.0\n/XRDU_2/RDU_1/TILE_3 100.0   0.0    0.0    0.0    0.0      0.0\n/XRDU_2/RDU_1/TILE_4 100.0   0.0    0.0    0.0    0.0      0.0\n/XRDU_2/RDU_1/TILE_5 100.0   0.0    0.0    0.0    0.0      0.0\n/XRDU_2/RDU_1/TILE_6 100.0   0.0    0.0    0.0    0.0      0.0\n/XRDU_2/RDU_1/TILE_7 100.0   0.0    0.0    0.0    0.0      0.0\n/XRDU_3/RDU_0/TILE_0 100.0   0.0    0.0    0.0    0.0      0.0\n/XRDU_3/RDU_0/TILE_1 100.0   0.0    0.0    0.0    0.0      0.0\n/XRDU_3/RDU_0/TILE_2 100.0   0.0    0.0    0.0    0.0      0.0\n/XRDU_3/RDU_0/TILE_3 100.0   0.0    0.0    0.0    0.0      0.0\n/XRDU_3/RDU_0/TILE_4 100.0   0.0    0.0    0.0    0.0      0.0\n/XRDU_3/RDU_0/TILE_5 100.0   0.0    0.0    0.0    0.0      
0.0\n/XRDU_3/RDU_0/TILE_6 100.0   0.0    0.0    0.0    0.0      0.0\n/XRDU_3/RDU_0/TILE_7 100.0   0.0    0.0    0.0    0.0      0.0\n/XRDU_3/RDU_1/TILE_0 100.0   0.0    0.0    0.0    0.0      0.0\n/XRDU_3/RDU_1/TILE_1 100.0   0.0    0.0    0.0    0.0      0.0\n/XRDU_3/RDU_1/TILE_2 100.0   0.0    0.0    0.0    0.0      0.0\n/XRDU_3/RDU_1/TILE_3 100.0   0.0    0.0    0.0    0.0      0.0\n/XRDU_3/RDU_1/TILE_4 100.0   0.0    0.0    0.0    0.0      0.0\n/XRDU_3/RDU_1/TILE_5 100.0   0.0    0.0    0.0    0.0      0.0\n/XRDU_3/RDU_1/TILE_6 100.0   0.0    0.0    0.0    0.0      0.0\n/XRDU_3/RDU_1/TILE_7 100.0   0.0    0.0    0.0    0.0      0.0\n
        "},{"location":"ai-testbed/sambanova/miscellaneous/#finding-hung-tiles","title":"Finding Hung Tiles","text":"
        snconfig show Node dynamic | grep perfect\n
        "},{"location":"ai-testbed/sambanova/miscellaneous/#how-busy-is-the-system","title":"How busy is the system?","text":"

        Use one of

        top\nhtop\n
        "},{"location":"ai-testbed/sambanova/running-a-model-or-program/","title":"Running a Model/Program","text":"

        Note: Please be mindful of how you are using the system. For example, consider running larger jobs in the evening or on weekends.

        Note: Please use only Slurm commands, i.e., srun and sbatch, to run your code. If you run your code directly using the 'python' command, it may cause conflicts on the system.

        Note: If you have conda installed and a conda environment is active, you will see something like (base) at the beginning of the command prompt. If so, you will need to deactivate it with conda deactivate. Conda is not used on the SambaNova SN30 cluster.
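        For example, a minimal cleanup of the shell environment before using the system tools (assuming conda is on your path):

        # Deactivate any active conda environment; repeat until no environment prefix such as (base) remains in the prompt.\nconda deactivate\n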

        "},{"location":"ai-testbed/sambanova/running-a-model-or-program/#introduction","title":"Introduction","text":"

        The SambaNova workflow includes the following main steps to run a model.

        1. Compile
        2. Run
        3. Test (optional)

        The system uses the Slurm job scheduler to schedule the jobs and manage the workload on the system. For more information on Slurm, see Job Queueing and Submission.

        Example Programs lists the different example applications with corresponding commands for each of the above steps.

        "},{"location":"ai-testbed/sambanova/running-a-model-or-program/#compile","title":"Compile","text":"

        Compiles the model and generates a .pef file. This file contains information on how to reconfigure the hardware, and map the compute and memory resources required to run an application on RDUs. The pef files are by default saved in the 'out' directory; the SambaNova documentation advises saving pef files in separate directories with the '--output-folder' option.

        It is necessary to re-compile only when the model changes, or parameters specific to the model graph change, including the batch size.
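        As a sketch of that workflow (reusing the LeNet example shown below; the batch sizes, pef names, and folder names here are hypothetical), one compile per batch size is enough, and later runs simply reuse the matching pef file:

        # Hypothetical per-batch-size compiles; each writes its own pef under the given output folder.\nsrun python lenet.py compile -b=1 --pef-name=\"lenet_b1\" --output-folder=\"pef_b1\"\nsrun python lenet.py compile -b=32 --pef-name=\"lenet_b32\" --output-folder=\"pef_b32\"\n# Runs reuse the matching pef without recompiling.\nsrun python lenet.py run --pef=\"pef_b1/lenet_b1/lenet_b1.pef\"\n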

        Compile times can be significant. For example, compiling the UNet sample takes 358 seconds with images of size 32x32 pixels and 1844 seconds with images of size 256x256.

        The entire compile process is executed on the host and no RDUs are involved in the compile step.

        Example of compiling the LeNet application:

        srun python lenet.py compile -b=1 --pef-name=\"lenet\" --output-folder=\"pef\"\n

        where

        Argument: -b | Default: 1 | Help: Batch size for training"},{"location":"ai-testbed/sambanova/running-a-model-or-program/#run","title":"Run","text":"

        As part of this step, the model is trained on the RDUs by passing in the PEF file and the training dataset. The location of the pef file generated in the compile step is passed as an argument to the run command. Below is an example of the run command that trains a LeNet model.

        srun python lenet.py run --pef=\"pef/lenet/lenet.pef\"\n


        "},{"location":"ai-testbed/sambanova/running-a-model-or-program/#test-optional","title":"Test (Optional)","text":"

        This command is used to run the model on both the host CPU and a SambaNova RDU. It compares the results from the CPU and RDU and will report if any discrepancies are found. Pass the pef file generated as part of the compile step as the input to this command.

        srun python lenet.py test --pef=\"pef/lenet/lenet.pef\"\n
        "},{"location":"ai-testbed/sambanova/sambatune/","title":"Profiling and performance tuning with SambaTune","text":"

        This section covers how to use the SambaTune profiling and performance tuning tool, and the SambaTune UI for viewing the results.

        "},{"location":"ai-testbed/sambanova/sambatune/#_1","title":"SambaTune for profiling and performance tuning","text":"

        SambaTune uses a yaml file that describes how to profile an application. There are samples in /opt/sambaflow/sambatune/configs. This section shows how to run the simplest sample, a linear net.

        First, ssh into one of the nodes in the SN30 cluster. Next, start a Slurm interactive job reserving a full node (8 RDUs) for 8 hours (480 minutes):

        $ /usr/local/bin/srun --time=480 --gres=rdu:8 --pty bash\n
        Record the hostname:
        $ hostname\nsn30-r1-h1\n

        Next, set an environment variable indicating where the profiling information should be stored:

        export DUMP_ROOT=~/Sambatune\n

        If running a large model, the profiling information can be hundreds of gigabytes or more, and the DUMP_ROOT should be set to a location with more storage than your home directory (which has a quota), for example, a location you have write access to under /srv/projects.
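        For example, a minimal sketch that points DUMP_ROOT at project storage (the project path below is a placeholder; substitute a directory you can actually write to):

        # Placeholder path; replace <your_project> with your project directory under /srv/projects.\nexport DUMP_ROOT=/srv/projects/<your_project>/$USER/sambatune_dumps\nmkdir -p \"$DUMP_ROOT\"\n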

        Optionally, examine the sample yaml file. You will see that it has 5 top-level sections: app:, model-args:, compile-args:, run-args:, env:
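        A quick way to confirm those sections, assuming the keys appear at the start of lines in the sample file:

        grep -E \"^(app|model-args|compile-args|run-args|env):\" /opt/sambaflow/sambatune/configs/linear_net.yaml\n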

        Next, run sambatune using a sample sambatune yaml configuration file. This sample command line requests profiling with the benchmark, instrument, and run modes.

        $ sambatune --modes benchmark instrument run -- /opt/sambaflow/sambatune/configs/linear_net.yaml\n

        This will take a while to run, particularly if the yaml for a larger model is used.

        Then, run sambatune_ui:

        $ export ST_PORT=8576\n$ sambatune_ui --directory $DUMP_ROOT/artifact_root/sambatune_gen --port $ST_PORT\n

        Copy the password shown (e.g. to your clipboard). The userid is always admin. The password is different for every sambatune_ui run.

        In a fresh console on your working machine where you will run the browser, set up a two-hop ssh tunnel to the target node. Replace the ALCFUserID in the ssh command line with your ALCF userid.

        $ export ST_PORT=8576\n$ ssh -L $ST_PORT:localhost:$ST_PORT ALCFUserID@sambanova.alcf.anl.gov  -t ssh -L $ST_PORT:localhost:$ST_PORT -N sn30-r1-h1\n

        Put localhost:8576 in the URL bar of a Chrome-family browser (Chrome, Brave, Vivaldi, and Opera have been tested). A login prompt for the SambaTune UI should appear. Enter admin and the password copied previously. You should now see the SambaTune UI.

        If the browser does not show a login prompt, or if any previous step complains about a port conflict, try another value for ST_PORT on both the target node and for the ssh tunnel command, e.g. 8577.
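        Before retrying, it can help to check for an existing sambatune_ui process and for listeners on the chosen port (both are standard Linux utilities; availability may vary by node):

        ps -elf | grep sambatune_ui\nss -tln | grep $ST_PORT\n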

        See SambaNova's SambaTune documentation for more information about using SambaTune and the SambaTune UI. This section is a good starting point: Workflow overview

        When finished:

        - Break the ssh tunnel with ctrl-c or equivalent.
        - Stop the sambatune_ui server on the target node with ctrl-c or equivalent.
        - Exit the interactive slurm job to release the reserved resources.

        A disconnected job can be canceled by finding its job id with squeue -a and canceling it with scancel <jobid>.
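        For example (the job id shown is a placeholder):

        squeue -a -u $USER    # find the job id of the disconnected job\nscancel 12345         # cancel it; 12345 is a placeholder\n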

        "},{"location":"ai-testbed/sambanova/system-overview/","title":"System Overview","text":""},{"location":"ai-testbed/sambanova/system-overview/#introduction","title":"Introduction","text":"

        The SambaNova DataScale SN30 system is architected around the next-generation Reconfigurable Dataflow Unit (RDU) processor for optimal dataflow processing and acceleration. The AI Testbed's SambaNova SN30 system consists of eight nodes in four full racks, each node featuring eight RDUs interconnected to enable model and data parallelism. SambaFlow, SambaNova's software stack, extracts, optimizes, and maps the dataflow graphs to the RDUs from standard machine learning frameworks like PyTorch.

        Below are some of the links to SambaNova documentation.

        SambaNova white paper: Accelerated Computing with a Reconfigurable Dataflow Architecture

        SN30 documentation: SambaNova Documentation

        "},{"location":"ai-testbed/sambanova/tunneling-and-forwarding-ports/","title":"Tunneling and Forwarding Ports","text":"

        This page covers port forwarding, specifically for TensorBoard.

        "},{"location":"ai-testbed/sambanova/tunneling-and-forwarding-ports/#tensorboard-port-forwarding","title":"TensorBoard Port Forwarding","text":"

        This section describes how to set up port forwarding for applications, such as TensorBoard, that run on the SambaNova system and bind to one or more ports. This example uses 6006 and 16006 as port numbers; using different port numbers may help avoid collisions with other users.

        "},{"location":"ai-testbed/sambanova/tunneling-and-forwarding-ports/#from-your-local-machine","title":"From Your Local Machine","text":"

        Replace ALCFUserID with your ALCF User ID.

        Run

        # Forward a port number from sambanova.alcf.anl.gov to your local machine.\nssh -v -N -f -L localhost:16006:localhost:16006 ALCFUserID@sambanova.alcf.anl.gov\n...\nPassword: < MobilePass+ code >\n\n# Connect to sambanova.alcf.anl.gov\nssh ALCFUserID@sambanova.alcf.anl.gov\n...\nPassword: < MobilePass+ code >\n
        "},{"location":"ai-testbed/sambanova/tunneling-and-forwarding-ports/#from-sambanovaalcfanlgov","title":"From sambanova.alcf.anl.gov","text":"

        Below are the commands specific to sn30-r1-h1. You may replace sn30-r1-h1 with the name of whichever node you are using.

        Run

        Note: The full name is sn30-r1-h1.ai.alcf.anl.gov and it may also be used.

        # Forward the port.\nssh -N -f -L localhost:16006:localhost:6006 ALCFUserID@sn30-r1-h1\n# Connect to the system.\nssh ALCFUserID@sn30-r1-h1\n
        "},{"location":"ai-testbed/sambanova/tunneling-and-forwarding-ports/#on-sn30-r1-h1","title":"On sn30-r1-h1","text":"

        Activate the venv appropriate to your project.

        Navigate to the appropriate directory for your model. Launch your model using srun or sbatch.

        cd /path/to/your/project\nsbatch --output=pef/my_model/output.log submit-my_model-job.sh\n
        "},{"location":"ai-testbed/sambanova/tunneling-and-forwarding-ports/#on-another-sn30-r1-h1-terminal-window","title":"On Another sn30-r1-h1 Terminal Window","text":"

        The SambaNova system has a bash shell script to set up the required software environment. It sets up the SambaFlow software stack and the associated environment variables, and activates a pre-configured virtual environment.

        Use the command appropriate for your environment.

        For example, if you are using LogReg:

        ALCFUserID@sn30-r1-h1:~$ source /opt/sambaflow/apps/starters/logreg/venv/bin/activate\n(venv) ALCFUserID@sn30-r1-h1:~$\n

        Navigate to the appropriate directory for your model.

        cd /path/to/your/project\ntensorboard --logdir /logs --port 6006\n
        "},{"location":"ai-testbed/sambanova/tunneling-and-forwarding-ports/#browser-on-local-machine","title":"Browser on Local Machine","text":"

        Then, navigate in your browser to, in this example, http://localhost:16006 on your local machine.

        "},{"location":"ai-testbed/sambanova/tunneling-and-forwarding-ports/#notes","title":"Notes","text":"

        Explanation of ssh command:

        -N : no remote commands\n\n-f : put ssh in the background\n\n-L <machine1>:<portA>:<machine2>:<portB> :\n\nThe full command line will forward <machine2>:<portB> (remote scope) to <machine1>:<portA> (local scope)\n

        Adapted from: How can I run Tensorboard on a remote server?

        "},{"location":"ai-testbed/sambanova/virtual-environment/","title":"Virtual Environments","text":""},{"location":"ai-testbed/sambanova/virtual-environment/#using-a-venv","title":"Using a Venv","text":"

        To create a virtual environment, one can use the --system-site-packages flag:

        python -m venv --system-site-packages my_env\nsource my_env/bin/activate\n
        "},{"location":"ai-testbed/sambanova/virtual-environment/#installing-packages","title":"Installing Packages","text":"

        Install packages in the normal manner such as:

        python3 -m pip install <package>\n

        For more details see Use pip for installing.

        To install a different version of a package that is already installed in one's environment, one can use:

        pip install --ignore-installed  ... # or -I\n
        "},{"location":"ai-testbed/sambanova/virtual-environment/#pre-built-sample-venv","title":"Pre-Built Sample Venv","text":"

        Each of the samples or application examples provided by SambaNova has its own pre-built virtual environment which can be readily used. They are present in the /opt/sambaflow/apps/ directory tree within each of the applications.
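        A sketch for locating and activating one of these pre-built environments (the find pattern is an assumption about the directory layout; the logreg path is the one used earlier in these docs):

        # List activate scripts under the SambaFlow apps tree, then source the one you need.\nfind /opt/sambaflow/apps -path \"*/venv/bin/activate\" 2>/dev/null\nsource /opt/sambaflow/apps/starters/logreg/venv/bin/activate\n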

        Note: Conda is not supported on the SambaNova system.

        "},{"location":"ai-testbed/sambanova/unused/cosmictagger-conversion/","title":"CosmicTagger Conversion","text":"

        The intent of this page is to show conceptually how to convert a model to run on the SambaNova system. It is not necessary to convert CosmicTagger because it has already been converted and is located at CosmicTagger on the SambaNova branch. The original is located at CosmicTagger.

        "},{"location":"ai-testbed/sambanova/unused/cosmictagger-conversion/#run-model-on-cpu","title":"Run Model on CPU","text":"

        The first step to converting a model is to verify that it runs on the CPU. This step has been verified for CosmicTagger.

        "},{"location":"ai-testbed/sambanova/unused/cosmictagger-conversion/#configpy","title":"Config.py","text":"

        CosmicTagger can run on multiple machines. As such, it is necessary to specify the architecture that one is using. For example, CPU or GPU. The architecture is stored in the ComputeMode class.

        Edit src/config/config.py. Add RDU to the ComputeMode class.

        class ComputeMode(Enum):\n    CPU   = 0\n    #...\n    RDU   = 6\n
        "},{"location":"ai-testbed/sambanova/unused/cosmictagger-conversion/#trainerpy","title":"Trainer.py","text":"

        Edit src/utils/torch/trainer.py.

        "},{"location":"ai-testbed/sambanova/unused/cosmictagger-conversion/#import-sambanova-packages","title":"Import SambaNova Packages","text":"

        Insert the imports at the top of the file.

        SambaFlow is a complete software stack designed to take input from standard machine learning frameworks such as PyTorch and TensorFlow. SambaFlow automatically extracts, optimizes, and maps dataflow graphs onto RDUs.

        try:\n    from sambaflow import samba\n\n    import sambaflow.samba.utils as utils\n    from sambaflow.samba.utils.argparser import parse_app_args\n    from sambaflow.samba.utils.common import common_app_driver\nexcept:\n    pass\n
        "},{"location":"ai-testbed/sambanova/unused/cosmictagger-conversion/#wrap-model","title":"Wrap Model","text":"

        Wrap the model using poptorch.trainingModel() so that it may be run on IPUs for training.

        Wrap the model using poptorch.inferenceModel() when not training.

        Find the following code around line 90 in the init_network method.

                # Foregoing any fusions as to not disturb the existing ingestion pipeline\n        if self.is_training() and self.args.mode.quantization_aware:\n            self._raw_net.qconfig = torch.quantization.get_default_qat_qconfig('fbgemm')\n            self._net = torch.quantization.prepare_qat(self._raw_net)\n        else:\n            self._net = self._raw_net\n

        After the above code, add:

                if self.args.run.compute_mode == ComputeMode.IPU:\n            if self.is_training():\n                opts = poptorch.Options()\n                self._net = poptorch.trainingModel(self._net, opts, optimizer=torch.optim.SGD(self._net.parameters(), lr=1e-3))\n            else:\n                self._net = poptorch.inferenceModel(self._net)\n

        See poptorch.trainingModel() and poptorch.inferenceModel() for more information.

        There is also a Build the Model tutorial.

        "},{"location":"ai-testbed/sambanova/unused/cosmictagger-conversion/#update-optimizer","title":"Update Optimizer","text":"

        Update init_optimizer() to use the poptorch class instead of the torch class as needed.

        Change:

                if self.args.mode.optimizer.name == OptimizerKind.rmsprop:\n            self._opt = torch.optim.RMSprop(self._net.parameters(), 1.0, eps=1e-4)\n        else:\n            self._opt = torch.optim.Adam(self._net.parameters(), 1.0)\n

        to:

                if self.args.mode.optimizer.name == OptimizerKind.rmsprop:\n            if self.args.run.compute_mode == ComputeMode.IPU:\n                self._opt = poptorch.optim.RMSprop(self._net.parameters(), 1.0, eps=1e-4)\n            else:\n                self._opt = torch.optim.RMSprop(self._net.parameters(), 1.0, eps=1e-4)\n        else:\n            if self.args.run.compute_mode == ComputeMode.IPU:\n                self._opt = poptorch.optim.Adam(self._net.parameters(), 1.0)\n            else:\n                self._opt = torch.optim.Adam(self._net.parameters(), 1.0)\n
        "},{"location":"ai-testbed/sambanova/unused/cosmictagger-conversion/#update-the-forward-pass","title":"Update the Forward Pass","text":"

        Putting the loss calculation in forward_pass() allows the loss computation to be performed on the IPUs. This will be faster because the data will not need to be transferred round-trip to the CPU.

        Change forward_pass():

        "},{"location":"ai-testbed/sambanova/unused/cosmictagger-conversion/#original","title":"Original","text":"
                    if net is None:\n                logits_image = self._net(minibatch_data['image'])\n            else:\n                logits_image = net(minibatch_data['image'])\n
        "},{"location":"ai-testbed/sambanova/unused/cosmictagger-conversion/#updated","title":"Updated","text":"

        The following code changes are to account for the loss function, i.e., self.loss_calculator, and the image labels, i.e., labels_image, to be passed to the model's forward_pass method. Additionally, the calculated loss is returned from the forward_pass method.

                    if net is None:\n                if self.args.run.compute_mode == ComputeMode.IPU:\n                    logits_image, labels_image, loss = self._net(minibatch_data['image'], self.loss_calculator, labels_image)\n                    return logits_image, labels_image, loss\n                else:\n                    logits_image = self._net(minibatch_data['image'])\n            else:\n                if self.args.run.compute_mode == ComputeMode.IPU and self.args.mode.name != ModeKind.inference:\n                    logits_image, labels_image, loss = net(minibatch_data['image'], self.loss_calculator, labels_image)\n                    return logits_image, labels_image, loss\n                else:\n                    logits_image = net(minibatch_data['image'])\n
        "},{"location":"ai-testbed/sambanova/unused/cosmictagger-conversion/#update-the-training-step","title":"Update the Training Step","text":"

        Receive the extra loss variable from the forward_pass method.

        Update the train_step method.

        "},{"location":"ai-testbed/sambanova/unused/cosmictagger-conversion/#original-training-step","title":"Original Training Step","text":"
                            with self.timing_context(\"forward\"):\n                        if self.args.run.precision == Precision.mixed and self.args.run.compute_mode == ComputeMode.GPU:\n                            with torch.cuda.amp.autocast():\n                                logits_image, labels_image = self.forward_pass(minibatch_data)\n                        else:\n                            logits_image, labels_image = self.forward_pass(minibatch_data)\n\n                    verbose = False\n\n                    # Compute the loss based on the logits\n                    with self.timing_context(\"loss\"):\n                        loss = self.loss_calculator(labels_image, logits_image)\n
        "},{"location":"ai-testbed/sambanova/unused/cosmictagger-conversion/#updated-training-step","title":"Updated Training Step","text":"

        The forward_pass() method was changed to return the extra variable loss in the previous section. It is now received conditionally when using IPUs.

        In the with self.timing_context(\"loss\"): section, only calculate loss if not using IPUs.

                            with self.timing_context(\"forward\"):\n                        if self.args.run.precision == Precision.mixed and self.args.run.compute_mode == ComputeMode.GPU:\n                            with torch.cuda.amp.autocast():\n                                logits_image, labels_image = self.forward_pass(minibatch_data)\n                        else:\n                            if self.args.run.compute_mode == ComputeMode.IPU:\n                                logits_image, labels_image, loss = self.forward_pass(minibatch_data)\n                            else:\n                                logits_image, labels_image = self.forward_pass(minibatch_data)\n\n                    verbose = False\n\n\n                    # Compute the loss based on the logits\n                    with self.timing_context(\"loss\"):\n                        if self.args.run.compute_mode == ComputeMode.IPU:\n                            loss = loss\n                        else:\n                            loss = self.loss_calculator(labels_image, logits_image)\n
        "},{"location":"ai-testbed/sambanova/unused/cosmictagger-conversion/#update-validation-step","title":"Update Validation Step","text":"

        Update the val_step method.

        "},{"location":"ai-testbed/sambanova/unused/cosmictagger-conversion/#original-validation-step-code","title":"Original Validation Step Code","text":"

        Find this code.

                    if self.args.run.precision == Precision.mixed and self.args.run.compute_mode == ComputeMode.GPU:\n                with torch.cuda.amp.autocast():\n                    logits_image, labels_image = self.forward_pass(minibatch_data, net=val_net)\n            else:\n                logits_image, labels_image = self.forward_pass(minibatch_data, net=val_net)\n\n            # Compute the loss based on the logits\n            loss = self.loss_calculator(labels_image, logits_image)\n
        "},{"location":"ai-testbed/sambanova/unused/cosmictagger-conversion/#updated-validation-step-code","title":"Updated Validation Step Code","text":"

        Change the code to the following.

                    if self.args.run.precision == Precision.mixed and self.args.run.compute_mode == ComputeMode.GPU:\n                with torch.cuda.amp.autocast():\n                    logits_image, labels_image = self.forward_pass(minibatch_data, net=val_net)\n\n                    # Compute the loss based on the logits\n                    loss = self.loss_calculator(labels_image, logits_image)\n            else:\n                if self.args.run.compute_mode == ComputeMode.IPU:\n                    logits_image, labels_image, loss = self.forward_pass(minibatch_data, net=val_net)\n                else:\n                    logits_image, labels_image = self.forward_pass(minibatch_data, net=val_net)\n\n                    # Compute the loss based on the logits\n                    loss = self.loss_calculator(labels_image, logits_image)\n
        "},{"location":"ai-testbed/sambanova/unused/cosmictagger-conversion/#uresnet2d-model","title":"UResNet2D Model","text":""},{"location":"ai-testbed/sambanova/unused/cosmictagger-conversion/#update-model","title":"Update Model","text":"

        The Graphcore system is more computationally efficient if the loss function is on the IPU. This is accomplished by using the loss function within the model's forward method.

        Edit src/networks/torch/uresnet2D.py.

        "},{"location":"ai-testbed/sambanova/unused/cosmictagger-conversion/#update-the-forward-declaration","title":"Update the Forward Declaration","text":"

        Find the forward method.

        def forward(self, input_tensor):\n

        Update the argument list to include the loss function, i.e., loss_calculator and the image labels, i.e., labels_image.

        def forward(self, input_tensor, loss_calculator=None, labels_image=None):\n
        "},{"location":"ai-testbed/sambanova/unused/cosmictagger-conversion/#add-loss-calculation","title":"Add Loss Calculation","text":"

        Add the loss calculation just before the forward method returns.

                if loss_calculator is not None:\n\n            labels_image = labels_image.long()\n            labels_image = torch.chunk(labels_image, chunks=3, dim=1)\n            shape =  labels_image[0].shape\n            labels_image = [ _label.view([shape[0], shape[-2], shape[-1]]) for _label in labels_image ]\n\n            loss = loss_calculator(labels_image, x)\n            import poptorch\n            loss = poptorch.identity_loss(loss , reduction=\"mean\")\n            return x, labels_image, loss\n\n        # This return already exists.\n        return x\n

        The poptorch.identity_loss method takes a single PyTorch tensor and will backpropagate a gradient of ones through it. You may find an example here.

        "},{"location":"ai-testbed/sambanova/unused/cosmictagger-conversion/#binexecpy","title":"bin/exec.py","text":"

        The following is included for completeness. One is unlikely to find this in other code.

        Open bin/exec.py in your favorite editor. Change:

        @hydra.main(version_base=None, config_path=\"../src/config\", config_name=\"config\")\n

        to

        @hydra.main(config_path=\"../src/config\", config_name=\"config\")\n
        "},{"location":"ai-testbed/sambanova/unused/performance-tools/","title":"Performance Tools","text":""},{"location":"ai-testbed/sambanova/unused/performance-tools/#tile-status","title":"Tile Status","text":"
        sntilestat\nwatch sntilestat\n
        "},{"location":"ai-testbed/sambanova/unused/performance-tools/#measure-tflops","title":"Measure TFLOPs","text":"

        This is an example of measuring TFLOPS for a Conv2D forward pass.

        elif args.command == 'run':\n    samba.session.run(inputs, section_types=['fwd'])\n    #samba.session.run(inputs, section_types=['bckwd'])\n    n_iters = 100\n    forward_pass_time = []\n    print(\"run starts\")\n    start_time_forward = time.time()\n    for loop in range(n_iters):\n        samba.session.run(inputs, section_types=['fwd'])\n        #samba.session.run(inputs, section_types=['bckwd'])\n        #samba.session.run(inputs, section_types=['fwd', 'bckwd'])\n    end_time_forward = time.time()\n    forward_pass_time.append(end_time_forward - start_time_forward)\n    print(\"run ends\")\n\n    w_0 = (args.w + 2*args.pad_w - args.s)/args.wstride + 1\n    h_0 = (args.h + 2*args.pad_h - args.r)/args.hstride + 1\n    tflops = 2 * (w_0*h_0) * args.s * args.r * args.c * args.k * args.n\n    tflops_forw = tflops/(sum(forward_pass_time)/n_iters/5)/(10**12) #tflops\n    print(tflops)\n    print(sum(forward_pass_time))\n    print(\"tflops: %f\"%tflops_forw)\n    print(\"SN,Training,%s,Conv2d_fwd,%d,100,1,%d,%d,%d,%d,%d,%d,%d,0.0,%f,None,%f,%f,%f\" % (\"dtype\", args.n, args.w, args.h, args.c, args.k, args.s, args.pad_w, args.wstride, (sum(forward_pass_time)/n_iters)/args.n, args.n/(sum(forward_pass_time)/n_iters), tflops_forw, (sum(forward_pass_time)/n_iters)/args.n))\n
        "},{"location":"ai-testbed/sambanova/unused/running-GPT2-multi-node/","title":"Running GPT-2 on Multiple Nodes","text":"

        This GPT-2 example is for 1.5B parameters on two (2) nodes. Each node has eight (8) RDUs for a total of sixteen (16) RDUs.

        "},{"location":"ai-testbed/sambanova/unused/running-GPT2-multi-node/#create-a-directory","title":"Create a Directory","text":"
        cd <path to desired directory>\nmkdir GPT1.5B\ncd GPT1.5B\n
        "},{"location":"ai-testbed/sambanova/unused/running-GPT2-multi-node/#establish-script","title":"Establish Script","text":"

        Using your favorite editor, create the file 'Gpt1.5B.sh'.

        Copy the contents of Gpt1.5B.sh.

        Make the script executable:

        chmod +x Gpt1.5B.sh\n
        "},{"location":"ai-testbed/sambanova/unused/running-GPT2-multi-node/#multiple-nodes","title":"Multiple Nodes","text":"

        Gpt1.5B.sh contains the sbatch command:

        /usr/local/bin/sbatch --output=${HOME}/slurm-%A.out --ntasks 32 --gres=rdu:1 --ntasks-per-node 16  --nodes 2 --cpus-per-task=8  /data/ANL/scripts/Gpt1.5B_run.sh ${1} >> ${OUTPUT_PATH} 2>&1\n

        The sbatch arguments are:

        --nodes 2: The number of nodes to use.

        --ntasks 32: The total number of tasks for the job.

        --ntasks-per-node 16: The number of tasks per node.

        --gres=rdu:1: Indicates the model fits on a single RDU.

        --cpus-per-task=8: The number of CPUs per task.

        "},{"location":"ai-testbed/sambanova/unused/running-GPT2-multi-node/#run","title":"Run","text":"

        The script accepts an optional first parameter to specify the log directory.

        Run the script:

        ./Gpt1.5B.sh <optional log directory>\n
        "},{"location":"ai-testbed/sambanova/unused/running-GPT2-multi-node/#output","title":"Output","text":"

        The output can be found at /data/ANL/results/$(hostname)/${USER}/${LOGDIR}/${MODEL_NAME}.out. The actual path will be displayed on the screen.

        "},{"location":"ai-testbed/sambanova/unused/running-GPT2/","title":"Running GPT2","text":"

        The Pile and OWT data are located in:

        /data/ANL/pile\n/data/ANL/openwebtext_ss2048\n
        "},{"location":"ai-testbed/sambanova/unused/running-bert-large-on-sn30/","title":"Running BERT-Large on SambaNova DataScale SN30-8","text":""},{"location":"ai-testbed/sambanova/unused/running-bert-large-on-sn30/#set-up","title":"Set Up","text":"

        Establish a test directory from which to work.

        mkdir $HOME/app-test\ncd $HOME/app-test\n

        Copy BertLarge.sh into your current directory.

        cp /data/ANL/scripts/BertLarge.sh .\n
        "},{"location":"ai-testbed/sambanova/unused/running-bert-large-on-sn30/#running-bert-large-options","title":"Running Bert Large Options","text":"

        Let's cover several options for executing the script.

        1. Basic
        sbatch --output=${HOME}/app-test/slurm-%A.out --cpus-per-task=128 --gres=rdu:16 BertLarge.sh\n
        2. Specify a Log File

        This is helpful if doing multiple runs and one wishes to specify a run ID. This bash script argument is optional. Place it at the very end of the command.

        Example:

        sbatch --output=${HOME}/app-test/slurm-%A.out --cpus-per-task=128 --gres=rdu:16 BertLarge.sh my_runID\n
        3. Specify Nodelist

        One may optionally specify a nodelist for sbatch. An example is to use hostname.

        sbatch --nodelist $(hostname) --output=${HOME}/app-test/slurm-%A.out --cpus-per-task=128 --gres=rdu:16 BertLarge.sh\n
        "},{"location":"ai-testbed/sambanova/unused/running-bert-large-on-sn30/#running-bert-large","title":"Running Bert Large","text":"

        Let's specify the log file and the nodelist.

        Run

        sbatch --nodelist $(hostname) --output=${HOME}/app-test/slurm-%A.out --cpus-per-task=128 --gres=rdu:16 BertLarge.sh\n
        "},{"location":"ai-testbed/sambanova/unused/running-bert-large-on-sn30/#output","title":"Output","text":"

        Display the slurm output. For example:

        cat slurm-9637.out\n

        The output will look something like:

        Using /data/ANL/results/sn30-r3-h1/userid/040423.19/BertLarge.out for output\n

        You may display that file. You may want to use less to do so because it is quite long.

        less /data/ANL/results/sn30-r3-h1/userid/040423.19/BertLarge.out\n

        The organization of the file is:

        1. System Status
        2. Compile (very long)
        3. Run
        4. System Status
        5. Run Duration
        "},{"location":"ai-testbed/sambanova/unused/sambatune-user-guide/","title":"SambaTune","text":""},{"location":"ai-testbed/sambanova/unused/sambatune-user-guide/#notes","title":"Notes","text":"

        /home/rweisner/sambatune_ui_dir contains the 1.15.3 version, which is the latest released version. It should work on your experimental system. You will need browser access to wherever you install it.

        cd /home/rweisner/tmp/uno_test\n
        "},{"location":"ai-testbed/sambanova/unused/sambatune-user-guide/#about-sambatune","title":"About SambaTune","text":"

        SambaTune is a tool for profiling, debugging, and tuning the performance of applications running on SN hardware.

        The tool automates the collection of hardware performance counters, metrics aggregation, report generation, and visualization. It also automates benchmarking of the application to compute average throughput over a sufficient number of runs. The tool is designed to aid the user with performance bottleneck analysis and tuning.

        SambaTune is currently used by SN engineers involved in performance tuning efforts. SambaTune is also planned for release to external customers to aid with performance bottleneck analysis and resolution.

        "},{"location":"ai-testbed/sambanova/unused/sambatune-user-guide/#run-sambatune","title":"Run SambaTune","text":"
        ssh ALCFUserID@sambanova.alcf.anl.gov\n# Enter MobilePass+ pass code\nssh sm-01\n

        First, enter the virtual environment on sm-01 or sm-02:

        source /opt/sambaflow/venv/bin/activate\n

        Update path:

        export PATH=/opt/sambaflow/bin:$PATH\n
        "},{"location":"ai-testbed/sambanova/unused/sambatune-user-guide/#usage","title":"Usage","text":"
        usage: sambatune [-h] [--artifact-root ARTIFACT_ROOT] [--disable-override]\n                 [--compile-only | -m MODES [MODES ...]] [--version]\n                 config\n\npositional arguments:\n  config                YAML file with model, compile, run configuration.\n\noptional arguments:\n  -h, --help            show this help message and exit\n  --artifact-root ARTIFACT_ROOT\n                        Custom location to save compile/run artifacts;\n                        defaults to '$DUMP_ROOT/artifact_root' (default: None)\n  --disable-override    Reuse the placement from the baseline compilation\n                        (default: False)\n  --compile-only        Run compilation of PEFs for selected modes only\n                        (default: False)\n  -m MODES [MODES ...], --modes MODES [MODES ...]\n                        Select modes to execute from ['benchmark',\n                        'instrument', 'run'] (default: ['benchmark'])\n  --version             version of sambatune and sambaflow.\n
        "},{"location":"ai-testbed/sambanova/unused/sambatune-user-guide/#command-overview","title":"Command Overview","text":"

        By default, it will run with the benchmarking mode enabled. Use the --modes flag to run modes individually or in any combination. Benchmark-Only:

        sambatune example_net.yaml --artifact-root $(pwd)/artifact_root --modes benchmark\n

        Instrument-Only:

        sambatune example_net.yaml --artifact-root $(pwd)/artifact_root --modes instrument\n

        All modes:

        sambatune example_net.yaml --artifact-root $(pwd)/artifact_root --modes benchmark instrument run\n
        "},{"location":"ai-testbed/sambanova/unused/sambatune-user-guide/#command-example","title":"Command Example","text":"
        # From Bill\npython /opt/sambaflow/apps/private/anl/uno_full.py compile --weight-sharing -b 16 -mb 4 --num-spatial-batches 500 --mapping spatial --mac-human-decision /opt/sambaflow/apps/private/anl/samba_uno/human_decisions_spatial.json --pef-name=uno_16_4_500_ws --output-folder=/home/arnoldw//models_dir/1520847 --mac-v1\n\npython /opt/sambaflow/apps/private/anl/uno_full.py run --train-samba-spatial --weight-sharing -b 16 -mb 4 --num-spatial-batches 500 --mapping spatial --pef=/home/arnoldw//models_dir/1520847/uno_16_4_500_ws/uno_16_4_500_ws.pef --in_dir /var/tmp/raw/ --mac-v1\n
        # From Bill --> Bruce\npython /opt/sambaflow/apps/private/anl/uno_full.py compile --weight-sharing -b 16 -mb 4 --num-spatial-batches 500 --mapping spatial --mac-human-decision /opt/sambaflow/apps/private/anl/samba_uno/human_decisions_spatial.json --pef-name=uno_16_4_500_ws --output-folder='.' --mac-v1\n\nexport OMP_NUM_THREADS=1\npython /opt/sambaflow/apps/private/anl/uno_full.py run --train-samba-spatial --weight-sharing -b 16 -mb 4 --num-spatial-batches 500 --mapping spatial --pef=./uno_16_4_500_ws/uno_16_4_500_ws.pef --in_dir /var/tmp/raw/ --mac-v1\n
        #TODOBRW  This works.  9/19/22\nsm-01/home/wilsonb/tmp/uno_test/uno_ccle.yaml\napp: /opt/sambaflow/apps/private/anl/uno_full.py\n\nmodel-args: --weight-sharing -b 16 -mb 4 --num-spatial-batches 500 --mapping spatial\n\ncompile-args: compile --plot --mac-human-decision /opt/sambaflow/apps/private/anl/samba_uno/human_decisions_spatial.json --mac-v1\n\nrun-args: --multiprocess-pickle --use-pickle-train  --measure-spatial --train-samba-spatial --mac-v1 --train_source CCLE --lr 0.001 --data-dir /software/sambanova/dataset/CCLE_16_500 --converted-pickle\n\nenv:\n     OMP_NUM_THREADS: 16,\n     SF_RNT_NUMA_BIND: 2\n

        Run the following example:

        sambatune uno_ccle.yaml --artifact-root $(pwd)/artifact_root --modes benchmark instrument run\n
        #TODOBRW\nsm-01/home/wilsonb/DL/Sambanova/apps_1.12/private/anl/uno_brw_CCLE_1_12.yaml\nexport OMP_NUM_THREADS=16\napp: /home/wilsonb/DL/Sambanova/apps_1.12/private/anl/uno_full.py\n\nmodel-args: --weight-sharing -b 16 -mb 4 --num-spatial-batches 500 --mapping spatial\n\ncompile-args: compile --plot --mac-human-decision /opt/sambaflow/apps/private/anl/samba_uno/human_decisions_spatial.json --mac-v1\n\nrun-args: --measure-spatial --train-samba-spatial --mac-v1 --train_source CCLE --lr 0.001 --data-dir /software/sambanova/dataset/CCLE_16_500\n\nenv:\n     OMP_NUM_THREADS: 16,\n     SF_RNT_NUMA_BIND: 2\n

        Run the following example:

        sambatune uno_brw_CCLE_1_12.yaml --artifact-root $(pwd)/artifact_root --modes benchmark instrument run\n\nexport UNO=.\nexport NS=50\nexport OMP_NUM_THREADS=1\n\nsrun python /opt/sambaflow/apps/private/anl/uno_full.py compile --mac-human-decision /opt/sambaflow/apps/private/anl/samba_uno/human_decisions_spatial.json --mac-v1\n\nxsrun pyinstrument /opt/sambaflow/apps/private/anl/uno_full.py run --train-samba-spatial --weight-sharing -b 16 -mb 4 --num-spatial-batches ${NS} --mapping spatial --pef=./uno_16_4_${NS}_ws/uno_16_4_${NS}_ws.pef --in_dir /var/tmp/raw/ --mac-v1 --train_source CCLE --data-dir /software/sambanova/dataset/CCLE_16_${NS} --epochs 1 > my.log 2>&1\n\nsrun python /opt/sambaflow/apps/private/anl/uno_full.py run --multiprocess-pickle  --measure-spatial --train-samba-spatial --weight-sharing -b 16 -mb 4 --num-spatial-batches ${NS} --mapping spatial --pef=./out/uno_full_16_47_${NS}/uno_full_16_47_${NS}.pef --in_dir /var/tmp/raw/ --mac-v1 --train_source CCLE --lr 0.001 --data-dir /software/sambanova/dataset/CCLE_16_${NS} > pyinstrument_1.13.log 2>&1\n\ncat my.log # Has pyinstrument run name.\npyinstrument --load-prev 2022-09-21T19-21-05 -r html\n\n\n1.13\n\nsource /opt/sambaflow/venv/bin/activate\ncd ~/tmp/uno_test/\nexport UNO=.\nexport NS=500\nexport OMP_NUM_THREADS=1\nexport PATH=/opt/sambaflow/bin:$PATH\nsntilestat\n\n\n\n./uno_pickl.sh compile 500\n./uno_pickl.sh run 500\n

        uno_pickl.sh

        #! /bin/bash -x\n#set -e\nsource /opt/sambaflow/venv/bin/activate\nSECONDS=0\nNS=${2}\nUNO=/opt/sambaflow/apps/private/anl/\nDS=\"ALL\"\nDS=\"CCLE\"\n\nBS=$((NS*16))\nexport OMP_NUM_THREADS=16\n\necho \"Model: UNO_SPA_TRN\"\necho \"Date: \" $(date +%m/%d/%y)\necho \"Time: \" $(date +%H:%M)\nif [ \"${1}\" == \"convert\" ] ; then\npython3 ${UNO}/uno/uno_data_loaders_converted.py   --in_dir /var/tmp/raw/ --out_dir /software/sambanova/dataset/${DS}_16_${NS}  --batch-size ${BS} --train_sources ${DS} --file-write-frequency 10\n\n\nelif [ \"${1}\" == \"compile\" ] ; then\n  echo \"COMPILE\"\n  python ${UNO}/uno_full.py compile --weight-sharing -b 16 -mb 4 --num-spatial-batches ${NS} --mapping spatial --mac-human-decision ${UNO}/samba_uno/human_decisions_spatial.json --pef-name=\"uno_16_4_${NS}\" --mac-v1\n\n\nelif [ \"${1}\" == \"run\" ] ; then\n  echo \"RUN ${DS}\"\n  SF_RNT_NUMA_BIND=2\n  #python ${UNO}/uno_full.py run --acc-test --train-samba-spatial --weight-sharing -b 16 -mb 4 --num-spatial-batches ${NS} --mapping spatial --pef=\"out/uno_16_4_${NS}/uno_16_4_${NS}.pef\" --in_dir /var/tmp/raw/ --mac-v1 --train_source CCLE\n  python ${UNO}/uno_full.py run --mac-v1 --multiprocess-pickle --use-pickle-train --train-samba-spatial -b 16 -mb 4 --num-spatial-batches ${NS} --lr 0.001 --mapping spatial --data-dir /software/sambanova/dataset/${DS}_16_${NS} --converted-pickle --train_sources ${DS} --pef=\"out/uno_16_4_${NS}/uno_16_4_${NS}.pef\" --epochs 1\n  #python ${UNO}/uno_full.py run --mac-v1 --multiprocess-pickle --use-pickle-train --train-samba-spatial -b 16 -mb 4 --num-spatial-batches ${NS} --lr 0.001 --mapping spatial --data-dir /software/sambanova/dataset/${DS}_16_${NS} --converted-pickle --train_sources ${DS} --pef=\"out/uno_16_4_${NS}/uno_16_4_${NS}.pef\"\n\nelif [ \"${1}\" == \"pyinstrument\" ] ; then\n  echo \"RUN ${DS}\"\n  SF_RNT_NUMA_BIND=2\n  #python ${UNO}/uno_full.py run --acc-test --train-samba-spatial --weight-sharing -b 16 -mb 4 --num-spatial-batches ${NS} --mapping spatial --pef=\"out/uno_16_4_${NS}/uno_16_4_${NS}.pef\" --in_dir /var/tmp/raw/ --mac-v1 --train_source CCLE\n  pyinstrument ${UNO}/uno_full.py run --mac-v1 --multiprocess-pickle --use-pickle-train --train-samba-spatial -b 16 -mb 4 --num-spatial-batches ${NS} --lr 0.001 --mapping spatial --data-dir /software/sambanova/dataset/${DS}_16_${NS} --converted-pickle --train_sources ${DS} --pef=\"out/uno_16_4_${NS}/uno_16_4_${NS}.pef\" --epochs 1\n  #python ${UNO}/uno_full.py run --mac-v1 --multiprocess-pickle --use-pickle-train --train-samba-spatial -b 16 -mb 4 --num-spatial-batches ${NS} --lr 0.001 --mapping spatial --data-dir /software/sambanova/dataset/${DS}_16_${NS} --converted-pickle --train_sources ${DS} --pef=\"out/uno_16_4_${NS}/uno_16_4_${NS}.pef\"\n\nelif [ \"${1}\" == \"no_pickle\" ] ; then\n  echo \"no_pickle ${DS}\"\n  SF_RNT_NUMA_BIND=2\n  python ${UNO}/uno_full.py run --train-samba-spatial --weight-sharing -b 16 -mb 4 --num-spatial-batches ${NS} --mapping spatial --pef=\"out/uno_16_4_${NS}/uno_16_4_${NS}.pef\" --in_dir /var/tmp/raw/ --mac-v1 --train_source CCLE\n\nelif [ \"${1}\" == \"mp\" ] ; then\necho \"Duration: \" $SECONDS\n\nelif [ \"${1}\" == \"mp\" ] ; then\necho \"Duration: \" $SECONDS\necho \"PERF\"\npython uno_full.py measure-performance --measure-spatial --weight-sharing -b 16 -mb 4 --num-spatial-batches ${NS} --mapping spatial --pef=\"out/uno_16_4_${NS}/uno_16_4_${NS}.pef\" --num-iterations 20 --mac-v1\nfi\n\necho \"Duration: \" $SECONDS\n
        ./uno_pickl.sh compile 500\n./uno_pickl.sh run 500\n./uno_pickl.sh pyinstrument 500\npyinstrument --load-prev 2022-09-22T18-31-24 -r html\nstdout is a terminal, so saved profile output to /tmp/tmpeo5ehksn.html\ncp /tmp/tmpeo5ehksn.html .\n

        On dev terminal

        scp wilsonb@sambanova.alcf.anl.gov:tmp/uno_test/tmpeo5ehksn.html .\n

        View in local browser.

        "},{"location":"ai-testbed/sambanova/unused/sambatune-user-guide/#running","title":"Running","text":"

        Create a directory for your work.

        mkdir ~/sambatune\ncd ~/sambatune\n

        Create small_vae.yaml with the following content using your favorite editor.

        app: /opt/sambaflow/apps/private/anl/moleculevae.py\n\nmodel-args: -b 128 --in-width 512 --in-height 512\n\ncompile-args: compile --plot --enable-conv-tiling --compiler-configs-file /opt/sambaflow/apps/private/anl/moleculevae/compiler_configs_conv.json --mac-v2 --mac-human-decision /opt/sambaflow/apps/private/anl/moleculevae/symmetric_human_decisions_tiled_v2.json\n\nrun-args: --input-path /var/tmp/dataset/moleculevae/ras1_prot-pops.h5 --out-path ${HOME}/moleculevae_out --model-id 0 --epochs 10\n\nenv:\n     OMP_NUM_THREADS: 16\n     SF_RNT_FSM_POLL_BUSY_WAIT: 1\n     SF_RNT_DMA_POLL_BUSY_WAIT: 1\n     CONVFUNC_DEBUG_RUN: 0\n

        Run the following example:

        sambatune small_vae.yaml --artifact-root $(pwd)/artifact_root --modes benchmark instrument run\n

        Create linear_net.yaml with the following content using your favorite editor.

        app: /opt/sambaflow/apps/micros/linear_net.py\n\nmodel-args: >\n  -b 1024\n  -mb 64\n  --in-features 8192\n  --out-features 4096\n  --repeat 128\n  --inference\n\ncompile-args: >\n  --n-chips 2\n  --plot\n\nenv:\n  SF_RNT_FSM_POLL_BUSY_WAIT: 1\n  SF_RNT_DMA_POLL_BUSY_WAIT: 1\n  CONVFUNC_DEBUG_RUN\": 0\n

        NOTE: The following takes 45 minutes to run.

        Run the following example:

        sambatune linear_net.yaml --artifact-root $(pwd)/artifact_root --modes benchmark instrument run\n

        where linear_net.yaml is a user-specified configuration file you created above.

        "},{"location":"ai-testbed/sambanova/unused/sambatune-user-guide/#sambatune-ui","title":"SambaTune UI","text":""},{"location":"ai-testbed/sambanova/unused/sambatune-user-guide/#port-availability","title":"Port Availability","text":"

        It is recommended that you check whether the port you want to use is available. You may check with:

        ps -elf | grep desired_port\n

        Example:

        ps -elf | grep 8576\n

        Alternatively, you may check for all ports in use by sambatune_ui:

        ps -elf | grep sambatune_ui\n

        If you need to free a port that you are finished with, you may use the kill command.
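        For example (the PID is whatever ps reports for your finished instance):

        ps -elf | grep sambatune_ui   # note the PID of your sambatune_ui process\nkill <PID>                    # replace <PID> with that process id\n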

        "},{"location":"ai-testbed/sambanova/unused/sambatune-user-guide/#start-sambatune-ui","title":"Start SambaTune UI","text":"

        If you followed the above directions, your artifact_root will be at ~/sambatune/artifact_root.

        Start the UI. It will print the username and password to use.

        NOTE: It is recommended to use a port other than 8576 in case someone else is using it. Select another port close to 8576.

        sambatune_ui --directory ~/sambatune/artifact_root/sambatune_gen/ --port 8576\n

        You will see something like:

        with the,\n    username: \"admin\", password: \"05c63938-2941-11ed-93a3-f7ef9c6e5d46\"\n[2022-08-31 15:24:36 +0000] [1344959] [Info] Starting gunicorn 20.1.0\n[2022-08-31 15:24:36 +0000] [1344959] [Info] Listening at: http://0.0.0.0:8576 (1344959)\n[2022-08-31 15:24:36 +0000] [1344959] [Info] Using worker: sync\n[2022-08-31 15:24:36 +0000] [1345092] [Info] Booting worker with pid: 1345092\n[2022-08-31 15:24:36 +0000] [1345093] [Info] Booting worker with pid: 1345093\n

        NOTE: Write down the username and password.

        NOTE: The password only works with this one instance of sambatune_ui. If you stop this instance of sambatune_ui and start another instance, it will have a new password.

        NOTE: You will need to stop sambatune_ui (with ctrl-c or the kill command) when you have finished. Not doing so will tie up the port. You can ps -elf | grep the_port_you_used to find the running processes. If you are not comfortable doing this, please ask for help."},{"location":"ai-testbed/sambanova/unused/sambatune-user-guide/#use-port-forwarding","title":"Use Port-Forwarding","text":"

        This describes the steps to set up port forwarding for applications, such as the SambaTune UI, that run on the SambaNova system and bind to one or more ports. This example uses 8576 and 18576 as port numbers; using different port numbers may help avoid collisions with other users.

        "},{"location":"ai-testbed/sambanova/unused/sambatune-user-guide/#from-your-local-machine","title":"From your local machine","text":"

        This command sets up a port forward from the SambaNova login node to your local machine.

        Run

        ssh -N -f -L localhost:18576:localhost:18576 ALCFUserID@sambanova.alcf.anl.gov\n...\nPassword: < MobilePass+ code >\n\nssh ALCFUserID@sambanova.alcf.anl.gov\n

        replacing ALCFUserID with your ALCF User ID.

        "},{"location":"ai-testbed/sambanova/unused/sambatune-user-guide/#from-sambanovaalcfanlgov","title":"From sambanova.alcf.anl.gov","text":"

        This command sets up a port forward from a SambaNova node to the sambanova login machine.

        Below are the commands specific to sm-01. You may replace sm-01 with sm-02 when using that system.

        Run

        NOTE: The full name is sm-01.ai.alcf.anl.gov and it may also be used.

        ssh -N -f -L localhost:18576:localhost:8576 ALCFUserID@sm-01\n
        "},{"location":"ai-testbed/sambanova/unused/sambatune-user-guide/#browser-on-local-machine","title":"Browser on Local Machine","text":"

        Then, navigate in your browser to, in this example, http://localhost:18576 on your local machine.

        Use the username and password from sm-01 to log in.

        "},{"location":"ai-testbed/sambanova/unused/sambatune-user-guide/#ssh-notes","title":"SSH Notes","text":"

        Explanation of ssh command:

        -N : no remote commands\n\n-f : put ssh in the background\n\n-L <machine1>:<portA>:<machine2>:<portB> :\n\nThe full command line will forward <machine1>:<portA> (local scope) to <machine2>:<portB> (remote scope)\n

        Adapted from: How can I run Tensorboard on a remote server?

        "},{"location":"aurora/getting-started-on-aurora/","title":"Getting Started on Aurora","text":""},{"location":"aurora/getting-started-on-aurora/#overview","title":"Overview","text":"

        *** ACCESS IS CURRENTLY ENABLED FOR ESP and ECP TEAM ONLY ***

        The prerequisites for Sunspot apply to Aurora as well. See this page for more information.

        NOTE: Sharing of any results from Aurora publicly no longer requires a review or approval from Intel. However, anyone publishing these results should include the following in their materials:

        \"This work was done on a pre-production supercomputer with early versions of the Aurora software development kit.\"

        In addition, users should acknowledge the ALCF. Refer to the acknowledgement policy page for details. Please note that certain information on Aurora hardware and software is considered NDA and cannot be shared publicly.

        Aurora is in the very early stages of system deployment; do not expect a production environment!

        Expect to experience:

        • Hardware instabilities - possible frequent downtime
        • Software instabilities - non-optimized compilers, libraries and tools; frequent software updates
        • Non-final configurations (storage, OS versions, etc...)
        • Short notice for downtimes (scheduled downtimes will be announced with 4 hours' notice, but sometimes downtimes may occur with just an email notice). Notices go to the aurora-notify@alcf.anl.gov email list. All users with access are added to the list initially.
        "},{"location":"aurora/getting-started-on-aurora/#getting-help","title":"Getting Help","text":"

        Email ALCF support at support@alcf.anl.gov for bugs, technical questions, software requests, reservations, priority boosts, etc...

        • ALCF's user support team will triage and forward the tickets to the appropriate technical SME as needed.
        • Expect turnaround times to be slower than on a production system as the technical team will be focused on stabilizing and debugging the system.

        For faster assistance, consider contacting your project's POC at ALCF (project catalyst or liaison)

        • They are an excellent source of assistance during this early period and will be aware of common bugs and known issues.

        ECP and ESP users will be added to a CNDA Slack workspace, where CNDA discussions may occur. An invite to the Slack workspace will be sent when a user is added to the Aurora resource.

        "},{"location":"aurora/getting-started-on-aurora/#known-issues","title":"Known Issues","text":"

        A known issues page can be found in the JLSE Wiki space used for NDA content. Note that this page requires a JLSE Aurora early hw/sw resource account for access.

        • Interim Filesystem: The early access filesystem is not highly performant. Intermittent hangs or pauses should be expected - waiting for IO to complete is recommended and IO completions should pass without failure. Jobs requiring significant filesystem performance must be avoided at this time.
        • A large number of Machine Check Events from the PVCs can cause nodes to panic and reboot.
        • HBM mode is not automatically validated. Jobs requiring flat memory mode should test by looking at numactl -H for 4 NUMA memory nodes instead of 16 on the nodes.
        "},{"location":"aurora/getting-started-on-aurora/#allocation-usage","title":"Allocation usage","text":"

        The allocation accounting system sbank is not yet installed on Aurora.

        To obtain the usage information for all your projects on Aurora, issue the sbank command on another ALCF resource where sbank is installed, such as Polaris.

        $ sbank-list-allocations -r aurora\n

        For more information, see this page.

        "},{"location":"aurora/getting-started-on-aurora/#transition-to-aurora-from-sunspot","title":"Transition to Aurora from Sunspot","text":"

        Some guidance is provided here to aid users in the process of moving their work from the Sunspot Test & Development System.

        "},{"location":"aurora/getting-started-on-aurora/#logging-into-aurora","title":"Logging Into Aurora","text":"

        Logging into Aurora is a two-stage process. You must first login through the bastion node via:

        ssh <username>@bastion.alcf.anl.gov\n
        Then, type in the one-time password from your CRYPTOCard/MobilePASS+ token.

        This bastion node is a pass-through erected for security purposes and is not meant to host files. Once on the bastion, SSH to login.aurora.alcf.anl.gov, which round-robins to the Aurora login nodes.

        ssh <username>@login.aurora.alcf.anl.gov\n
        "},{"location":"aurora/getting-started-on-aurora/#proxies-for-outbound-connections-git-ssh-etc","title":"Proxies for outbound connections: Git, ssh, etc...","text":"

        The Aurora login nodes don't currently have outbound network connectivity enabled by default. Setting the following environment variables will provide access to the proxy host. This is necessary, for example, to clone remote git repos.

        # proxy settings\nexport HTTP_PROXY=\"http://proxy.alcf.anl.gov:3128\"\nexport HTTPS_PROXY=\"http://proxy.alcf.anl.gov:3128\"\nexport http_proxy=\"http://proxy.alcf.anl.gov:3128\"\nexport https_proxy=\"http://proxy.alcf.anl.gov:3128\"\n
        "},{"location":"aurora/getting-started-on-aurora/#ssh-to-other-machines","title":"SSH to other machines","text":"

        To ssh to another machine from an Aurora login node, it can be helpful to add a proxyjump through Bastion in your .ssh/config file. The first password prompt would be for bastion, followed by a prompt for the remote machine.

        $ cat .ssh/config\nHost my.awesome.machine.edu\n    ProxyJump bastion.alcf.anl.gov\n\n$ ssh me@my.awesome.machine.edu\n

        Additional guidance on scp and transferring files to Aurora is available here.

        "},{"location":"aurora/getting-started-on-aurora/#hardware-overview","title":"Hardware Overview","text":"

        An overview of the Aurora system including details on the compute node architecture is available on the Machine Overview page.

        "},{"location":"aurora/getting-started-on-aurora/#file-systems-and-daos","title":"File Systems and DAOS","text":""},{"location":"aurora/getting-started-on-aurora/#home-and-project-directories","title":"Home and Project Directories","text":"

        Home directories on Aurora are /home/username, available on login and compute nodes. This is provided from /lus/gecko/home. The default quota is 50 GB. Note that bastions have a different /home and the default quota is 500 MB.

        Lustre project directories are under /lus/gecko/projects. ALCF staff should use the /lus/gecko/projects/Aurora_deployment project directory. ESP and ECP project members should use their corresponding project directories. The project name is similar to the name on Theta/Polaris with an _CNDA suffix (e.g.: projectA_aesp_CNDA, CSC250ADABC_CNDA). The default quota is 1 TB. The project PI should email support@alcf.anl.gov if their project requires additional storage.

        Gecko is a small Lustre system for early Aurora use on login and compute nodes. Eventually, production file systems Eagle and Grand will be mounted on Aurora login nodes.

        "},{"location":"aurora/getting-started-on-aurora/#daos","title":"DAOS","text":"

        The primary storage system on Aurora is not a file system, but rather an object store called the Distributed Asynchronous Object Store. This is a key-array based system embedded directly in the Slingshot fabric, which provides much faster I/O than conventional block-based parallel file systems such as Lustre (even those using non-spinning disk and/or burst buffers). Project PIs will have requested a storage pool on DAOS via INCITE/ALCC/DD allocation proposals.

        Preproduction ESP and ECP Aurora project PIs should email support@alcf.anl.gov to request DAOS storage with the following information

        • Project name (e.g. FOO_aesp_CNDA)
        • Storage capacity (For ESP projects, if this is different than in the ESP proposal, please give brief justification)

        See DAOS Overview for more on using DAOS for I/O.

        "},{"location":"aurora/getting-started-on-aurora/#compiling-applications","title":"Compiling Applications","text":"

        Users are encouraged to read through the Compiling and Linking Overview page and corresponding pages depending on the target compiler and programming model.

        Autotools and cmake are available by loading the following module files.

        $ module use /soft/modulefiles\n$ module load spack-pe-gcc autoconf cmake\n
        "},{"location":"aurora/getting-started-on-aurora/#python-on-aurora","title":"Python on Aurora","text":"

        Frameworks on Aurora can be loaded into a user's environment by loading the frameworks module as follows. The conda environment loaded with this module makes available TensorFlow, Horovod, and PyTorch with Intel extensions and optimizations.

        module use /soft/modulefiles\nmodule load frameworks\n
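        For a quick, illustrative sanity check after loading the module, verify that the frameworks import from the provided conda environment:

        python -c 'import torch, tensorflow; print(torch.__version__, tensorflow.__version__)'\n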
        "},{"location":"aurora/getting-started-on-aurora/#additional-software","title":"Additional Software","text":"

        A variety of additional tools and software libraries are provided in the Spack PE. For example, a user can load tmux through the spack-pe-gcc module:

        module use /soft/modulefiles\nmodule load spack-pe-gcc\nmodule load tmux\n
        "},{"location":"aurora/getting-started-on-aurora/#submitting-and-running-jobs","title":"Submitting and Running Jobs","text":"

        Aurora uses the PBSPro job scheduler system. For Aurora-specific job documentation, refer to Running Jobs on Aurora

        "},{"location":"aurora/getting-started-on-aurora/#getting-assistance","title":"Getting Assistance","text":"

        Please direct all questions, requests, and feedback to support@alcf.anl.gov.

        "},{"location":"aurora/known-issues/","title":"Known Issues","text":"

        This is a collection of known issues that have been encountered during Aurora's early user phase. Documentation will be updated as issues are resolved. Users are encouraged to email support@alcf.anl.gov to report issues.

        "},{"location":"aurora/known-issues/#running-applications","title":"Running Applications","text":"
        1. Cassini Event Queue overflow detected. errors may occur for certain MPI communications and can happen for a variety of reasons: software and hardware, job placement, job routing, and the state of the machine. Simply speaking, it means one of the network interfaces is receiving messages faster than it can process them.
        libfabric:16642:1701636928::cxi:core:cxip_cq_eq_progress():531<warn> x4204c1s3b0n0: Cassini Event Queue overflow detected.\n

        As a workaround, the following environment variables can be set to try alleviating the problem.

        export FI_CXI_DEFAULT_CQ_SIZE=131072\nexport FI_CXI_OVFLOW_BUF_SIZE=8388608\nexport FI_CXI_CQ_FILL_PERCENT=20\n

        The value of FI_CXI_DEFAULT_CQ_SIZE can be set to something larger if issues persist. This is directly impacted by the number of unexpected messages sent and so may need to be increased as the scale of the job increases.
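        For example, a larger queue size could be tried (an illustrative value only; tune it based on your job scale):

        export FI_CXI_DEFAULT_CQ_SIZE=262144\n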

        "},{"location":"aurora/known-issues/#submitting-jobs","title":"Submitting Jobs","text":"

        Jobs may fail to successfully start at times (particularly at higher node counts). If no error message is apparent, then one thing to check is the comment field in the full job information for the job using the command qstat -xf [JOBID].

        1. In the event that you find your job placed on hold, you may find the message comment = job held, too many failed attempts to run. This does not indicate a problem with your script, but indicates PBS made several attempts to find a set of nodes to run your job and was not able to do so. Users are encouraged to delete the held job and try resubmitting.

        2. In the event of a node going down during a job, users may encounter messages such as ping failed on x4616c0s4b0n0: Application 047a3c9f-fb41-4595-a2ad-4a4d0ec1b6c1 not found. The node will likely have started a reboot and won't be included in jobs again until checks pass.
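        To inspect the comment field mentioned above, something like the following can be used (replace [JOBID] with your job ID):

        qstat -xf [JOBID] | grep -i comment\n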

        "},{"location":"aurora/running-jobs-aurora/","title":"Running Jobs on Aurora","text":""},{"location":"aurora/running-jobs-aurora/#queues","title":"Queues","text":"

        There is a single routing queue in place called EarlyAppAccess which currently has a node count of 2,844, but we recommend a max job size of 2048 or 2560. This will be replaced by new queues during an upcoming PM.

        For example, a one-node interactive job can be requested for 30 minutes with the following command, where [your_ProjectName] is replaced with an appropriate project name.

        qsub -l select=1 -l walltime=30:00 -A [your_ProjectName] -q EarlyAppAccess -I\n

        Recommended PBSPro options follow.

        #!/bin/sh\n#PBS -A [your_ProjectName]\n#PBS -N\n#PBS -l walltime=[requested_walltime_value]\n#PBS -k doe\n#PBS -l place=scatter\n#PBS -q EarlyAppAccess\n
        "},{"location":"aurora/running-jobs-aurora/#working-around-node-failures","title":"Working Around Node Failures","text":"

        As Aurora is still a pre-production supercomputer, node failures are a fact of life. If you would like to increase the chances that a large job does not terminate due to a node failure, you may choose to interactively route your MPI job around nodes that fail during your run. To do this, you must run interactively and manually adjust your run on the fly to remove nodes that have been marked as failed.

        We recommend against using -W tolerate_node_failures=all in your qsub command, though we acknowledge it can be helpful. If you do use it, you MUST MANUALLY VERIFY your job and remove faulted nodes from your mpiexec command YOURSELF!

        1. Start your interactive job
        2. When the job transitions to Running state, run pbsnodes -l | grep <jobid>
        3. Manually REMOVE all nodes identified in that output from inclusion in your mpiexec

          $ cat $PBS_NODEFILE > local.hostfile\n# edit local.hostfile to remove problem nodes\n$ mpiexec --hostfile local.hostfile [other mpiexec arguments]\n
        4. Continue to execute

        5. If other nodes go down during your job, it will not be killed, and you can further exclude those nodes from your mpiexec as needed

        It is important to note that all nodes marked as faulty by PBS will not be used in subsequent jobs. This mechanism only provides you with a means to execute additional mpiexec commands under the same interactive job after manually removing nodes identified as faulty. Once your PBS job has exited, those faulty nodes will remain offline until further intervention by Aurora staff.
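        The manual hostfile editing in step 3 can also be scripted. The sketch below assumes, as in step 2, that pbsnodes -l | grep <jobid> prints one problem node name per line in its first column; verify the output for your own job before relying on this:

        cat $PBS_NODEFILE > local.hostfile\n# Collect nodes PBS has flagged for this job (sketch; confirm the output format yourself)\npbsnodes -l | grep <jobid> | awk '{print $1}' > bad_nodes.txt\n# Keep only the healthy nodes\ngrep -v -F -f bad_nodes.txt local.hostfile > good.hostfile\nmpiexec --hostfile good.hostfile [other mpiexec arguments]\n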

        "},{"location":"aurora/running-jobs-aurora/#running-mpiopenmp-applications","title":"Running MPI+OpenMP Applications","text":"

        Once a submitted job is running, calculations can be launched on the compute nodes using mpiexec to start an MPI application. Documentation is accessible via man mpiexec and some helpful options follow.

        • -n total number of MPI ranks
        • -ppn number of MPI ranks per node
        • --cpu-bind CPU binding for application
        • --depth number of cpus per rank (useful with --cpu-bind)
        • --env set environment variables (--env OMP_NUM_THREADS=2)
        • --hostfile indicate file with hostnames (the default is --hostfile $PBS_NODEFILE)

        A sample submission script with directives is below for a 4-node job with 28 MPI ranks on each node and 4 OpenMP threads per rank (1 per CPU core).

        #!/bin/bash -l\n#PBS -N AFFINITY\n#PBS -l select=4\n#PBS -l place=scatter\n#PBS -l walltime=0:10:00\n#PBS -q workq\n#PBS -A MYPROJECT\n\nNNODES=`wc -l < $PBS_NODEFILE`\nNRANKS=28 # Number of MPI ranks to spawn per node\nNDEPTH=4 # Number of hardware threads per rank (i.e. spacing between MPI ranks)\nNTHREADS=4 # Number of software threads per rank to launch (i.e. OMP_NUM_THREADS)\n\nNTOTRANKS=$(( NNODES * NRANKS ))\n\necho \"NUM_OF_NODES= ${NNODES} TOTAL_NUM_RANKS= ${NTOTRANKS} RANKS_PER_NODE= ${NRANKS} THREADS_PER_RANK= ${NTHREADS}\"\n\ncd /home/knight/affinity\nmpiexec -n ${NTOTRANKS} -ppn ${NRANKS} --depth=${NDEPTH} --cpu-bind depth -env OMP_NUM_THREADS=${NTHREADS} --env OMP_PLACES=cores ./hello_affinity\n
        "},{"location":"aurora/running-jobs-aurora/#running-gpu-enabled-applications","title":"Running GPU-enabled Applications","text":"

        GPU-enabled applications will similarly run on the compute nodes using the above example script.

        • The environment variable MPICH_GPU_SUPPORT_ENABLED=1 needs to be set if your application requires MPI-GPU support, whereby the MPI library sends and receives data directly from GPU buffers.
        • If running on a specific GPU or a subset of GPUs and/or tiles is desired, then the ZE_AFFINITY_MASK environment variable can be used. For example, to allow an application to access only the first two GPUs on a node, set ZE_AFFINITY_MASK=0,1 (see the sketch below).
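        Putting these together in a job script might look like the following minimal sketch (./my_gpu_app is a placeholder executable; the other mpiexec options follow the example script above):

        export MPICH_GPU_SUPPORT_ENABLED=1\n# Optional: restrict the application to the first two GPUs on each node\nexport ZE_AFFINITY_MASK=0,1\nmpiexec -n ${NTOTRANKS} -ppn ${NRANKS} --depth=${NDEPTH} --cpu-bind depth ./my_gpu_app\n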

        "},{"location":"aurora/running-jobs-aurora/#mpi-rank-and-thread-binding-to-cores-and-gpus","title":"MPI rank and thread binding to cores and GPUs","text":"

        Each node on Aurora has 2 sockets, each with 1 CPU and 3 PVC GPUs, for a total of 2 CPUs and 6 GPUs per node. Each CPU has 52 physical cores, with 2 logical processors (provided by Intel hyperthreading) per physical core, for a total of 104 physical cores and 208 logical processors on the CPUs per Aurora node. Each GPU has two tiles, for a total of 6 GPUs and 12 GPU tiles per Aurora node. When a parallel job is run, the job must have some way of mapping MPI ranks or threads to each of the 208 logical processors and 6 GPUs or 12 GPU tiles. Mapping is typically done by an affinity mask, which assigns hardware resources to each MPI rank or thread.

        A visual representation of an Aurora node is shown below. Each socket is represented by a large blue bubble. Inside, each CPU is represented by a red bubble. Inside each CPU, the white boxes represent the physical cores, and the two grey squares in each core represent the two logical processors. Each GPU is represented by a large white box, with two grey boxes inside to represent the two tiles.

        Simplified representation of Aurora node

        For the two CPUs, the numbers inside the boxes identify the specific logical processors in the core. That is, logical processor 0 and 104 are the 2 logical processors on the first physical core. Logical processors 1 and 105 are the 2 logical processors that share the second physical core. Since there are 208 logical processors, the numbers run from 0 to 207. For i from 0 to 51, logical processors i and i+104 share a physical core.

        For the six GPUs, the GPU number (0 to 5) identifies the GPU, and the tile number identifies the tile within that GPU; each GPU has two tiles, referenced as $gpu.0 and $gpu.1.

        "},{"location":"aurora/running-jobs-aurora/#binding-mpi-ranks-and-threads-to-cores","title":"Binding MPI ranks and threads to cores","text":"

        Using the --cpu-bind argument to mpiexec, MPI ranks and threads can be assigned to run on specific logical processors on the CPUs. For more information about the flags to mpiexec, see Running MPI+OpenMP Applications. Four examples of using mpiexec are given below to show how the --cpu-bind=depth, --cpu-bind=list, and --depth arguments affect where MPI ranks and OpenMP threads are mapped.

        "},{"location":"aurora/running-jobs-aurora/#example-1-2-nodes-4-ranksnode-1-threadrank","title":"Example 1: 2 nodes, 4 ranks/node, 1 thread/rank","text":"
        mpiexec -n 8 -ppn 4 --depth 1 --cpu-bind=depth <app> <app_args>\n
        • The \"-n 8\" argument says to use 8 MPI ranks in total and \"-ppn 4\" places 4 ranks per node.
        • The \"--depth 1\" argument says to use 1 logical processor for each MPI rank.
        • The \"--cpu-bind depth\" argument says to spread out the ranks in a round robin manner across the logical processors, first putting one rank on the first logical processor of one physical core, and then looping back to put a second one on the second logical processor. This is done such that there's N logical processors for each MPI rank, where N is the value from the --depth argument (so it's 1 in this case).

        This is the same as

        mpiexec -n 8 -ppn 4 --cpu-bind=list:0:1:2:3 <app> <app_args>\n
        • The \"--cpu-bind list\" argument explicitly lists which logical processor to bind to per node. Each MPI rank is bound to the logical processors that are listed between \":\". So here, rank 0 to logical processor 0, rank 1 to logical processor 1, etc.
        "},{"location":"aurora/running-jobs-aurora/#resulting-mapping","title":"Resulting mapping","text":"

        MPI ranks 0,1,2,3,4,5,6,7 map to logical processors 0,1,2,3 on each of the two nodes. Assuming the job was allocated on node 0 and node 1:

        • MPI rank 0 \u2192 node 0, logical processor 0

        • MPI rank 1 \u2192 node 0, logical processor 1

        • MPI rank 2 \u2192 node 0, logical processor 2

        • MPI rank 3 \u2192 node 0, logical processor 3

        • MPI rank 4 \u2192 node 1, logical processor 0

        • MPI rank 5 \u2192 node 1, logical processor 1

        • MPI rank 6 \u2192 node 1, logical processor 2

        • MPI rank 7 \u2192 node 1, logical processor 3

        The figure below shows the mapping, where the different colors are different MPI ranks.

        Example 1 Mapping"},{"location":"aurora/running-jobs-aurora/#example-2-2-nodes-2-ranksnode-2-threadrank","title":"Example 2: 2 nodes, 2 ranks/node, 2 thread/rank","text":"
        OMP_PLACES=threads OMP_NUM_THREADS=2 mpiexec -n 4 -ppn 2 --depth 2 --cpu-bind=depth <app> <app_args>\n
        • The \"-n 4\" argument says to use 4 MPI ranks in total and \"-ppn 2\" places 2 ranks per node.
        • The \"--depth 2\" argument says to use 2 logical processor for each MPI rank.
        • The \"--cpu-bind depth\" argument says to spread out the ranks in a round robin manner across the logical processors, first putting one rank on the first logical processor of one physical core, and then looping back to put a second one on the second logical processor. This is done such that there's N logical processors for each MPI rank, where N is the value from the --depth argument (so it's 2 in this case).
        • OMP_NUM_THREADS=2 launches two threads per MPI rank
        • OMP_PLACES=threads says to bind the OpenMP threads to logical processors

        This is the same as

        OMP_PLACES=threads OMP_NUM_THREADS=2 mpiexec -n 4 -ppn 2 --cpu-bind=list:0,1:2,3 <app> <app_args>\n
        • The \"--cpu-bind list\" argument explicitly lists which logical processor to bind to. Each MPI rank is bound to the logical processors that are listed between \":\". Between \":\", the logical processors to bind to are listed in a comma-separated manner. So here, rank 0 is bound to logical processors 0 and 1, rank 2 to logical processors 2 and 3. OMP_PLACES=threads then binds the specific threads to the logical processors in the list.
        "},{"location":"aurora/running-jobs-aurora/#resulting-mapping_1","title":"Resulting mapping","text":"

        Assuming the job was allocated on node 0 and node 1:

        • MPI rank 0, OpenMP thread 0 \u2192 node 0, logical processor 0

        • MPI rank 0, OpenMP thread 1 \u2192 node 0, logical processor 1

        • MPI rank 1, OpenMP thread 0 \u2192 node 0, logical processor 2

        • MPI rank 1, OpenMP thread 1 \u2192 node 0, logical processor 3

        • MPI rank 2, OpenMP thread 0 \u2192 node 1, logical processor 0

        • MPI rank 2, OpenMP thread 1 \u2192 node 1, logical processor 1

        • MPI rank 3, OpenMP thread 0 \u2192 node 1, logical processor 2

        • MPI rank 3, OpenMP thread 1 \u2192 node 1, logical processor 3

        The figure below shows the mapping, where the different colors are different MPI ranks.

        Example 2 Mapping"},{"location":"aurora/running-jobs-aurora/#example-3-2-nodes-2-ranksnode-1-threadrank-compact-fashion","title":"Example 3: 2 nodes, 2 ranks/node, 1 thread/rank, compact fashion","text":"
        mpiexec -n 4 -ppn 2 --cpu-bind=list:0:104 <app> <app_args>\n
        • The \"--cpu-bind list\" argument explicitly lists which logical processor to bind to per node. Each MPI rank is bound to the logical processors that are listed between \":\". So here, rank 0 to logical processor 0, rank 1 to logical processor 104, which share the same physical core.
        "},{"location":"aurora/running-jobs-aurora/#resulting-mapping_2","title":"Resulting mapping","text":"

        Assuming the job was allocated on node 0 and node 1:

        • MPI rank 0 \u2192 node 0, logical processor 0

        • MPI rank 1 \u2192 node 0, logical processor 104

        • MPI rank 2 \u2192 node 1, logical processor 0

        • MPI rank 3 \u2192 node 1, logical processor 104

        The figure below shows the mapping, where the different colors are different MPI ranks.

        Example 3 Mapping"},{"location":"aurora/running-jobs-aurora/#example-4-1-node-12-ranksnode","title":"Example 4: 1 node, 12 ranks/node","text":"

        This setup is a common case for applications: 12 ranks/node, where each rank will offload to one of the 12 GPU tiles. Note that explicit list binding is needed here to avoid binding an MPI rank to a logical processor on a different socket than the GPU it might be targeting (as would happen if --cpu-bind=depth were used).

        mpiexec -n 12 -ppn 12 --cpu-bind=list:0-7:8-15:16-23:24-31:32-39:40-47:52-59:60-67:68-75:76-83:84-91:92-99 <app> <app_args>\n
        • The \"--cpu-bind list\" argument explicitly lists which logical processor to bind to per node. Each MPI rank is bound to the logical processors that are listed between \":\". So here, rank 0 to logical processors 0-7, rank 1 to logical processors 8-15, etc.
        "},{"location":"aurora/running-jobs-aurora/#resulting-mapping_3","title":"Resulting mapping","text":"

        Assuming the job was allocated on node 0 and node 1, the mapping looks like:

        • MPI rank 0 \u2192 node 0, socket 0, logical processors 0-7

        • MPI rank 1 \u2192 node 0, socket 0, logical processor 8-15

        • MPI rank 2 \u2192 node 0, socket 0, logical processor 16-23

        • MPI rank 3 \u2192 node 0, socket 0, logical processor 24-31

        • MPI rank 4 \u2192 node 0, socket 0, logical processor 32-39

        • MPI rank 5 \u2192 node 0, socket 0, logical processor 40-47

        • MPI rank 6 \u2192 node 0, socket 1, logical processor 52-59

        • MPI rank 7 \u2192 node 0, socket 1, logical processor 60-67

        • MPI rank 8 \u2192 node 0, socket 1, logical processor 68-75

        • MPI rank 9 \u2192 node 0, socket 1, logical processor 76-83

        • MPI rank 10 \u2192 node 0, socket 1, logical processor 84-91

        • MPI rank 11 \u2192 node 0, socket 1, logical processor 92-99

        The important point here is that with explicit binding, we were able to ensure socket 0 and socket 1 each had 6 ranks. Note how MPI rank 5 ends at logical processor 47, but MPI rank 6 begins with logical processor 52, so this involves leaving several cores empty. However, it allows the ranks to be spread evenly across the two sockets.

        If instead we used \"--depth\" as so:

        mpiexec -n 12 -ppn 12 --depth 8 --cpu-bind=depth <app> <app_args>\n
        then the mapping is:

        • MPI rank 0 \u2192 node 0, socket 0, logical processors 0-7

        • MPI rank 1 \u2192 node 0, socket 0, logical processor 8-15

        • MPI rank 2 \u2192 node 0, socket 0, logical processor 16-23

        • MPI rank 3 \u2192 node 0, socket 0, logical processor 24-31

        • MPI rank 4 \u2192 node 0, socket 0, logical processor 32-39

        • MPI rank 5 \u2192 node 0, socket 0, logical processor 40-47

        • MPI rank 6 \u2192 node 0, socket 0 and socket 1, logical processor 48-55

        • MPI rank 7 \u2192 node 0, socket 1, logical processor 56-63

        • MPI rank 8 \u2192 node 0, socket 1, logical processor 64-71

        • MPI rank 9 \u2192 node 0, socket 1, logical processor 72-79

        • MPI rank 10 \u2192 node 0, socket 1, logical processor 80-87

        • MPI rank 11 \u2192 node 0, socket 1, logical processor 88-95

        Note that the logical processors MPI rank 6 is bound to cross both socket 0 and socket 1, which may lead to worse performance than using --cpu-bind=list to explicitly spread out the ranks and avoid splitting a rank across two sockets.

        "},{"location":"aurora/running-jobs-aurora/#binding-mpi-ranks-to-gpus","title":"Binding MPI ranks to GPUs","text":"

        Support in MPICH on Aurora to bind MPI ranks to GPUs is currently work-in-progress. For applications that need this support, this instead can be handled by use of a small helper script that will appropriately set ZE_AFFINITY_MASK for each MPI rank. Users are encouraged to use the /soft/tools/mpi_wrapper_utils/gpu_tile_compact.sh script for instances where each MPI rank is to be bound to a single GPU tile with a round-robin assignment.

        This script can be placed just before the executable in an mpiexec command like so.

        mpiexec -n ${NTOTRANKS} --ppn ${NRANKS_PER_NODE} --depth=${NDEPTH} --cpu-bind depth /soft/tools/mpi_wrapper_utils/gpu_tile_compact.sh ./hello_affinity\n

        A simple version of this script is below to illustrate how ZE_AFFINITY_MASK is uniquely set for each MPI rank.

        #!/bin/bash -l\nnum_gpu=6\nnum_tile=2\ngpu_id=$(( (PALS_LOCAL_RANKID / num_tile ) % num_gpu ))\ntile_id=$((PALS_LOCAL_RANKID % num_tile))\nunset EnableWalkerPartition\nexport ZE_ENABLE_PCI_ID_DEVICE_ORDER=1\nexport ZE_AFFINITY_MASK=$gpu_id.$tile_id\n#echo \u201cRANK= ${PMI_RANK} LOCAL_RANK= ${PMI_LOCAL_RANK} gpu= ${gpu}\u201d\nexec \"$@\"\n

        Users with different MPI-GPU affinity needs, such as assigning multiple GPUs/tiles per MPI rank, are encouraged to modify a local copy of /soft/tools/mpi_wrapper_utils/gpu_tile_compact.sh to suit their needs.

        One example below shows a common mapping of MPI ranks to cores and GPUs.

        "},{"location":"aurora/running-jobs-aurora/#example-1-1-node-12-ranksnode-1-threadrank-1-rankgpu","title":"Example 1: 1 node, 12 ranks/node, 1 thread/rank, 1 rank/GPU","text":"
        mpiexec -n 12 -ppn 12 --cpu-bind=list:0-7:8-15:16-23:24-31:32-39:40-47:52-59:60-67:68-75:76-83:84-91:92-99 /soft/tools/mpi_wrapper_utils/gpu_tile_compact.sh <app> <app_args>\n
        • The \"-n 12\" argument says to use 12 MPI ranks in total and \"-ppn 12\" places 12 ranks per node.
        • The \"--cpu-bind list\" argument gives the mapping of MPI ranks to cores, as described in Binding MPI ranks and threads to cores.
        • The /soft/tools/mpi_wrapper_utils/gpu_tile_compact.sh wrapper sets ZE_AFFINITY_MASK for each of the 12 ranks such that rank 0 maps to GPU 0, Tile 0, rank 1 maps to GPU 0, Tile 1, rank 2 maps to GPU 1, Tile 0, etc. in a round-robin compact fashion.
        "},{"location":"aurora/running-jobs-aurora/#resulting-mapping_4","title":"Resulting mapping","text":"

        This is one of the most common cases, with 1 MPI rank targeting each GPU tile. Assuming the job was allocated on node 0 and node 1, the mapping looks like:

        • MPI rank 0 \u2192 node 0, socket 0, logical processors 0-7, GPU 0, Tile 0

        • MPI rank 1 \u2192 node 0, socket 0, logical processor 8-15, GPU 0, Tile 1

        • MPI rank 2 \u2192 node 0, socket 0, logical processor 16-23, GPU 1, Tile 0

        • MPI rank 3 \u2192 node 0, socket 0, logical processor 24-31, GPU 1, Tile 1

        • MPI rank 4 \u2192 node 0, socket 0, logical processor 32-39, GPU 2, Tile 0

        • MPI rank 5 \u2192 node 0, socket 0, logical processor 40-47, GPU 2, Tile 1

        • MPI rank 6 \u2192 node 0, socket 1, logical processor 52-59, GPU 3, Tile 0

        • MPI rank 7 \u2192 node 0, socket 1, logical processor 60-67, GPU 3, Tile 1

        • MPI rank 8 \u2192 node 0, socket 1, logical processor 68-75, GPU 4, Tile 0

        • MPI rank 9 \u2192 node 0, socket 1, logical processor 76-83, GPU 4, Tile 1

        • MPI rank 10 \u2192 node 0, socket 1, logical processor 84-91, GPU 5, Tile 0

        • MPI rank 11 \u2192 node 0, socket 1, logical processor 92-99, GPU 5, Tile 1

        "},{"location":"aurora/running-jobs-aurora/#interactive-jobs-on-compute-nodes","title":"Interactive Jobs on Compute Nodes","text":"

        Here is how to submit an interactive job to, for example, edit/build/test an application on Aurora compute nodes:

        qsub -I -l select=1,walltime=1:00:00,place=scatter -A MYPROJECT -q workq\n

        This command requests 1 node for a period of 1 hour in the workq queue. After waiting in the queue for a node to become available, a shell prompt on a compute node will appear. You may then start building applications and testing gpu affinity scripts on the compute node.

        NOTE: If you want to ssh or scp to one of your assigned compute nodes you will need to make sure your $HOME directory and your $HOME/.ssh directory permissions are both set to 700.
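        For example, the permissions can be set with the following commands:

        chmod 700 $HOME\nchmod 700 $HOME/.ssh\n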

        "},{"location":"aurora/running-jobs-aurora/#running-multiple-mpi-applications-on-a-node","title":"Running Multiple MPI Applications on a node","text":"

        Multiple applications can be run simultaneously on a node by launching several mpiexec commands and backgrounding them. For performance, it will likely be necessary to ensure that each application runs on a distinct set of CPU resources and/or targets specific GPUs and tiles. One can provide a list of CPUs using the --cpu-bind option, which, when combined with ZE_AFFINITY_MASK, allows a user to specify exactly which CPU and GPU resources each application uses. In the simple example below, twelve instances of the application are simultaneously running on a single node. In the first instance, the application is spawning MPI ranks 0-3 on CPU cores 0-3 and using GPU 0 tile 0.

        export OMP_NUM_THREADS=1\nexport ZE_ENABLE_PCI_ID_DEVICE_ORDER=1\n\nexport ZE_AFFINITY_MASK=0.0\nmpiexec --np 4 --ppn 4 --cpu-bind list:0:1:2:3 ./hello_affinity &\n\nexport ZE_AFFINITY_MASK=0.1\nmpiexec -n 4 --ppn 4 --cpu-bind list:4:5:6:7 ./hello_affinity &\n\nexport ZE_AFFINITY_MASK=1.0\nmpiexec -n 4 --ppn 4 --cpu-bind list:8:9:10:11 ./hello_affinity &\n\n\nexport ZE_AFFINITY_MASK=5.1\nmpiexec -n 4 --ppn 4 --cpu-bind list:40:41:42:43 ./hello_affinity &    \n\nwait\n

        Users will likely find it beneficial to launch processes across CPU cores in both sockets of a node.

        "},{"location":"aurora/running-jobs-aurora/#compute-node-access-to-the-internet","title":"Compute Node Access to the Internet","text":"

        Currently, the only access to the internet is via a proxy. Here are the proxy environment variables for Aurora:

        export http_proxy=\"http://proxy.alcf.anl.gov:3128\"\nexport https_proxy=\"http://proxy.alcf.anl.gov:3128\"\nexport ftp_proxy=\"http://proxy.alcf.anl.gov:3128\"\n

        In the future, though we don't have a timeline on this because it depends on future features in Slingshot and internal software development, we intend to have public IP addresses be a schedulable resource. For instance, if only your head node needed public access, your select statement might look something like: -l select=1:pubnet=True+63.

        "},{"location":"aurora/running-jobs-aurora/#controlling-where-your-job-runs","title":"Controlling Where Your Job Runs","text":"

        If you wish to have your job run on specific nodes, form your select statement like this: -l select=1:vnode=<node name1>+1:vnode=<node name2>... Obviously, that gets tedious for large jobs.

        If you want to control the location of a few nodes, for example 2 out of 64, but the rest don't matter, you can do something like this: -l select=1:vnode=<node name1>+1:vnode=<node name2>+62:system=foo.
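        For illustration, a full qsub invocation pinning two specific nodes might look like the following sketch (the node names, project name, and script name are placeholders, and system=foo stands in for the actual system name as in the text above):

        qsub -l select=1:vnode=<node name1>+1:vnode=<node name2>+62:system=foo -l walltime=1:00:00 -A [your_ProjectName] -q EarlyAppAccess ./job_script.sh\n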

        "},{"location":"aurora/running-jobs-aurora/#network-rack-and-dragonfly-group-mappings","title":"Network: Rack and Dragonfly Group Mappings","text":"

        Content coming soon.

        "},{"location":"aurora/sunspot-to-aurora/","title":"Transitioning from Sunspot to Aurora","text":"

        Sunspot is two racks (128 nodes) of the same hardware as Aurora that teams were initially given access to for testing. Virtually all hardware and software aspects of Sunspot are identical to Aurora, so the transition to Aurora should largely be straightforward in terms of compiling applications and submitting batch jobs.

        The primary documentation source for Sunspot for pre-production users (ECP and ESP) is Getting Started on Sunspot.

        "},{"location":"aurora/sunspot-to-aurora/#transferring-files-to-aurora","title":"Transferring files to Aurora","text":"

        One key immediate difference that users will need to account for is that Sunspot (gila) and Aurora (gecko) mount their own independent filesystems and DAOS systems. This means that Sunspot users will need to manually transfer files and/or DAOS containers to Aurora's filesystem if they desire to retain anything they've created on Sunspot.

        With the bastion pass-through nodes currently used to access both Sunspot and Aurora, users will find it helpful to modify their .ssh/config files appropriately to facilitate transfers to Aurora. These changes are similar to what Sunspot users may have already implemented and a simple example follows.

        knight@aurora-uan-0009:~> cat .ssh/config \n\nHost bastion.alcf.anl.gov\n     User knight\n\nHost *.sunspot.alcf.anl.gov sunspot.alcf.anl.gov\n     ProxyJump bastion.alcf.anl.gov\n     DynamicForward 3142\n     user knight\n\nHost polaris.alcf.anl.gov\n     ProxyJump bastion.alcf.anl.gov\n     DynamicForward 3142\n     user knight\n

        From an Aurora login-node, this readily enables one to transfer files from Sunspot's gila filesystem or one of the production filesystems at ALCF (home, grand, and eagle). With the use of ProxyJump here, entering the MobilePass+ or Cryptocard passcode twice will be needed (once for bastion and once for the other resource).

        This simple example transfers a file from Sunspot.

        knight@aurora-uan-0009:~> scp knight@sunspot.alcf.anl.gov:/lus/gila/projects/Aurora_deployment/knight/test.txt ./\n---------------------------------------------------------------------------\n                            Notice to Users\n...\n[Password:\n---------------------------------------------------------------------------\n                            Notice to Users\n... \n[Password:\nknight@aurora-uan-0009:~> cat test.txt \nfrom_sunspot gila\n

        This simple example transfers a file from the grand filesystem via Polaris.

        knight@aurora-uan-0009:~> scp knight@polaris.alcf.anl.gov:/grand/catalyst/proj-shared/knight/test.txt ./\n---------------------------------------------------------------------------\n                            Notice to Users\n...\n[Password:\n---------------------------------------------------------------------------\n                            Notice to Users\n... \n[Password:\nknight@aurora-uan-0009:~> cat test.txt \nfrom_polaris grand\n
        "},{"location":"aurora/sunspot-to-aurora/#default-software-environment","title":"Default software environment","text":"

        Users should be mindful of differences in the default environment on Aurora compared to Sunspot. This is especially important if applications require specific versions of software. As an example, the default oneapi module on Sunspot is oneapi/eng-compiler/2023.05.15.006 compared to oneapi/eng-compiler/2022.12.30.003 on Aurora.

        knight@aurora-uan-0009:~> module list\n\nCurrently Loaded Modules:\n  1) gcc/11.2.0                    3) intel_compute_runtime/release/agama-devel-551   5) libfabric/1.15.2.0   7) cray-libpals/1.2.12\n  2) mpich/51.2/icc-all-pmix-gpu   4) oneapi/eng-compiler/2022.12.30.003              6) cray-pals/1.2.12\n

        Users on Aurora can access additional versions of software on the login node and in job scripts by appending /soft/modulefiles to their MODULEPATH as in the following example.

        knight@aurora-uan-0009:~> module use /soft/modulefiles\nknight@aurora-uan-0009:~> module avail oneapi\n\n------------------------------------------------------------------------- /soft/modulefiles -------------------------------------------------------------------------\n   oneapi/eng-compiler/2023.05.15.003    oneapi/eng-compiler/2023.05.15.007        oneapi/release/2023.05.15.001        spack-pe-oneapi/0.4-rc1 (D)\n   oneapi/eng-compiler/2023.05.15.006    oneapi/eng-compiler/2023.10.15.002 (D)    oneapi/release/2023.10.15.001 (D)    spack-pe-oneapi/0.5-rc1\n\n---------------------------------------------------------- /opt/aurora/23.073.0/oneapi/release/modulefiles ----------------------------------------------------------\n   oneapi/release/2022.12.30.001\n\n----------------------------------------------------- /opt/aurora/23.073.0/CNDA/oneapi/eng-compiler/modulefiles -----------------------------------------------------\n   oneapi/eng-compiler/2022.12.30.003 (L)\n

        Users can then load the desired modules to best match the environment used in work on Sunspot.

        knight@aurora-uan-0009:~> module swap oneapi/eng-compiler/2022.12.30.003 oneapi/eng-compiler/2023.05.15.006\n\nThe following have been reloaded with a version change:\n  1) intel_compute_runtime/release/agama-devel-551 => intel_compute_runtime/release/agama-devel-647\n  2) oneapi/eng-compiler/2022.12.30.003 => oneapi/eng-compiler/2023.05.15.006\n
        "},{"location":"aurora/applications-and-libraries/applications/gromacs/","title":"GROMACS","text":"

        Placeholder

        "},{"location":"aurora/applications-and-libraries/applications/lammps/","title":"LAMPPS","text":""},{"location":"aurora/applications-and-libraries/applications/namd/","title":"NAMD","text":""},{"location":"aurora/applications-and-libraries/applications/nwchemex/","title":"NWChemEx","text":"

        Placeholder

        "},{"location":"aurora/applications-and-libraries/applications/qmcpack/","title":"QMCPack","text":"

        Placeholder

        "},{"location":"aurora/applications-and-libraries/applications/quantum-espressp/","title":"Quantum ESPRESSO","text":"

        Placeholder

        "},{"location":"aurora/applications-and-libraries/libraries/cabana-aurora/","title":"Cabana","text":""},{"location":"aurora/applications-and-libraries/libraries/cabana-aurora/#cabana_1","title":"Cabana","text":"

        Cabana is built atop Kokkos. It provides class templates useful for implementing particle codes.

        "},{"location":"aurora/applications-and-libraries/libraries/cabana-aurora/#cabana-documentation","title":"Cabana Documentation","text":"
        • Cabana Wiki
        • Cabana github
        "},{"location":"aurora/applications-and-libraries/libraries/cabana-aurora/#cabana-on-aurora","title":"Cabana on Aurora","text":"

        Built against the prebuilt Kokkos on Aurora, the prebuilt Cabana includes 3 backends: Serial and OpenMP for CPU execution and SYCL for GPU execution. To use it, run

        module use /soft/modulefiles\nmodule load cabana\n

        Currently, Cabana is a headers-only installation; there are no libraries per se.

        "},{"location":"aurora/applications-and-libraries/libraries/math-libraries/","title":"Math Libraries","text":""},{"location":"aurora/applications-and-libraries/libraries/mkl/","title":"MKL","text":""},{"location":"aurora/applications-and-libraries/libraries/mpi/","title":"Aurora MPICH","text":"

        Placeholder

        "},{"location":"aurora/applications-and-libraries/libraries/onedal/","title":"oneDAL","text":""},{"location":"aurora/applications-and-libraries/libraries/spack-pe/","title":"Spack PE","text":"

        The Spack PE is a software stack which provides various build tools, utilities, and libraries. The Spack PE consists of two parts: spack-pe-gcc and spack-pe-oneapi. spack-pe-gcc contains commonly used software packages compiled with GCC. spack-pe-oneapi is based on the E4S Project and provides performant HPC libraries built with the OneAPI PE. spack-pe-oneapi is dependent on both spack-pe-gcc and the OneAPI PE.

        "},{"location":"aurora/applications-and-libraries/libraries/spack-pe/#using-software-from-the-spack-pe","title":"Using software from the Spack PE","text":"

        Currently the Spack PE is installed in /soft. Packages in the Spack PE can be accessed via modulefile:

        $ module use /soft/modulefiles\n$ module load spack-pe-gcc\n$ module load cmake\n$ which cmake\n/soft/packaging/spack/gcc/0.5-rc1/install/linux-sles15-x86_64/gcc-11.4.0/cmake-3.26.4-jiwocshcdaghfyjb6jzyhj7zyorgkfkh/bin/cmake\n

        The above example loads cmake into the current environment. The spack-pe-gcc module adds paths to the user's MODULEPATH; individual packages are subsequently loaded through the newly available modules. The full list of available packages can be viewed by running module avail. Packages are loaded in the same way from spack-pe-oneapi, with the caveat that spack-pe-oneapi is tied to specific versions of oneapi.

        "},{"location":"aurora/applications-and-libraries/libraries/spack-pe/#inspecting-packages","title":"Inspecting packages","text":"

        When a module within the Spack PE is loaded, several environment variables are updated to integrate the package into the user's environment. Additionally, the PACKAGE_ROOT variable is set to contain the path to the installation prefix of the package. For example, continuing from the cmake example above:

        $ echo $CMAKE_ROOT\n/soft/packaging/spack/gcc/0.5-rc1/install/linux-sles15-x86_64/gcc-11.4.0/cmake-3.26.4-jiwocshcdaghfyjb6jzyhj7zyorgkfkh\n$ ls $CMAKE_ROOT\nbin  doc  share\n

        This variable can be used to inspect software installations and find header or library paths. Additionally, Spack packages have a .spack directory in the installation prefix which contains build logs and information on configure and build options.

        "},{"location":"aurora/applications-and-libraries/libraries/spack-pe/#building-software-with-spack","title":"Building software with Spack","text":"

        Spack is a powerful package manager designed for HPC. The Spack PE is installed and managed with Spack; users can also install Spack in their own home or project directory to manage their software builds. Spack has a steep learning curve, but it may benefit workflows involving frequent builds with complex dependencies.

        For users who wish to use Spack to install their own software, we provide configuration files corresponding to the Spack PE deployments. These configuration files can be found in /soft/packaging/spack/settings, organized into directories by Aurora PE and Spack PE versions. Not all of these settings will be useful for all builds and it is not recommended to adopt these wholesale as global settings. The recommended method is to include these settings ad hoc in a spack environment to control what information spack uses for its builds. However, we highly recommend using or adapting the aurora_packages_{spack-pe-version}.yaml and compilers_{spack-pe-version}.yaml configurations, since these configurations are essential for Spack to recognize the OneAPI compiler and associated Aurora PE components such as MPICH.
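        As a minimal sketch of getting started with a personal Spack instance and locating these configuration files (the clone destination is arbitrary, and the exact settings filenames should be checked on the system):

        git clone https://github.com/spack/spack.git $HOME/spack\nsource $HOME/spack/share/spack/setup-env.sh\nls /soft/packaging/spack/settings\n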

        Support requests and feedback for ALCF-specific issues should be directed to support@alcf.anl.gov. For general spack questions, users are encouraged to consult the following resources:

        • Spack development website
        • Spack documentation
        • Spack tutorial
        • Spack Slack channel
        "},{"location":"aurora/build-tools/cmake-aurora/","title":"CMake","text":""},{"location":"aurora/build-tools/cmake-aurora/#cmake_1","title":"CMake","text":"

        CMake is a build configuration system that uses higher-level description files to automatically generate Makefiles.

        "},{"location":"aurora/build-tools/cmake-aurora/#cmake-documentation","title":"CMake Documentation","text":"
        • CMake website
        "},{"location":"aurora/build-tools/cmake-aurora/#cmake-on-aurora","title":"CMake on Aurora","text":"

        To use CMake on Aurora, run

        module use /soft/modulefiles\nmodule load spack-pe-gcc cmake\n
        "},{"location":"aurora/compiling-and-linking/aurora-example-program-makefile/","title":"Aurora Example Program Makefile","text":"

        Several simple examples of building CPU and GPU-enabled codes on Aurora are available in the ALCF GettingStarted repo for supported programming models. If building your application on the login node is problematic for some reason (e.g. absence of a GPU), then users are encouraged to build and test applications directly on one of the Aurora compute nodes via an interactive job. The discussion below makes use of the oneAPI compilers in the default environment as illustrative examples.

        "},{"location":"aurora/compiling-and-linking/aurora-example-program-makefile/#cpu-mpiopenmp-example","title":"CPU MPI+OpenMP Example","text":"

        One of the first useful tasks with any new machine, scheduler, and job launcher is to ensure one is binding MPI ranks and OpenMP threads to the host CPUs as intended. A simple HelloWorld MPI+OpenMP example is available here to get started with.

        The Aurora compute nodes are dual-socket with 52 physical cores in each socket for a total of 104 cores. As hyperthreading is enabled, each core will show up as two CPUs for a total of 208. In many of the examples below, only a single process is spawned on each physical core.

        The application can be straightforwardly compiled using the MPICH compiler wrappers in the default environment.

        mpicxx -g -fopenmp -O3 main.cpp\n

        The executable hello_affinity can then be launched in a job script (or directly in shell of an interactive job) using mpiexec as discussed here.

        #!/bin/sh\n#PBS -l select=1\n#PBS -l place=scatter\n#PBS -l walltime=0:15:00\n#PBS -q workq\n#PBS -A Catalyst\n#PBS -l filesystems=home\n\n#cd ${PBS_O_WORKDIR}\n\n# MPI example w/ MPI ranks and OpenMP threads spread evenly across cores (one process per physical core)\nNNODES=`wc -l < $PBS_NODEFILE`\nNRANKS_PER_NODE=26\nNDEPTH=4\nNTHREADS=4\n\nNTOTRANKS=$(( NNODES * NRANKS_PER_NODE ))\necho \"NUM_OF_NODES= ${NNODES} TOTAL_NUM_RANKS= ${NTOTRANKS} RANKS_PER_NODE= ${NRANKS_PER_NODE} THREADS_PER_RANK= ${NTHREADS}\"\n\nmpiexec -n ${NTOTRANKS} --ppn ${NRANKS_PER_NODE} --depth=${NDEPTH} --cpu-bind depth --env OMP_NUM_THREADS=${NTHREADS} --env OMP_PLACES=cores ./hello_affinity\n
        "},{"location":"aurora/compiling-and-linking/aurora-example-program-makefile/#gpu-openmp-example","title":"GPU OpenMP Example","text":"

        A simple OpenMP offload example is available here. Compilation proceeds similar to the above CPU-only example except for the use of compiler flags to enable GPU offload.

        mpicxx -fiopenmp -fopenmp-targets=spir64 main.cpp\n

        Running the example with 12 MPI ranks and no other settings will generate output like the following.

        $ make\n\n$ mpiexec -n 12 --ppn 12 --depth=1 --cpu-bind depth ./hello_affinity\nNUM_OF_NODES= 1 TOTAL_NUM_RANKS= 12 RANKS_PER_NODE= 12 THREADS_PER_RANK= 1\n\n  Using OPENMP v5.0\n  num_devices=     6\n  Default device=  0\n  Host=            6\n  num_teams=       896\n  num_threads=     1\nTo affinity and beyond!! nname= x1922c1s1b0n0  rnk= 0  list_cores= (0)  num_devices= 6  gpu_id= 0\n\nTo affinity and beyond!! nname= x1922c1s1b0n0  rnk= 1  list_cores= (1)  num_devices= 6  gpu_id= 0\n\nTo affinity and beyond!! nname= x1922c1s1b0n0  rnk= 2  list_cores= (2)  num_devices= 6  gpu_id= 0\n\nTo affinity and beyond!! nname= x1922c1s1b0n0  rnk= 3  list_cores= (3)  num_devices= 6  gpu_id= 0\n\nTo affinity and beyond!! nname= x1922c1s1b0n0  rnk= 4  list_cores= (4)  num_devices= 6  gpu_id= 0\n\nTo affinity and beyond!! nname= x1922c1s1b0n0  rnk= 5  list_cores= (5)  num_devices= 6  gpu_id= 0\n\nTo affinity and beyond!! nname= x1922c1s1b0n0  rnk= 6  list_cores= (6)  num_devices= 6  gpu_id= 0\n\nTo affinity and beyond!! nname= x1922c1s1b0n0  rnk= 7  list_cores= (7)  num_devices= 6  gpu_id= 0\n\nTo affinity and beyond!! nname= x1922c1s1b0n0  rnk= 8  list_cores= (8)  num_devices= 6  gpu_id= 0\n\nTo affinity and beyond!! nname= x1922c1s1b0n0  rnk= 9  list_cores= (9)  num_devices= 6  gpu_id= 0\n\nTo affinity and beyond!! nname= x1922c1s1b0n0  rnk= 10  list_cores= (10)  num_devices= 6  gpu_id= 0\n\nTo affinity and beyond!! nname= x1922c1s1b0n0  rnk= 11  list_cores= (11)  num_devices= 6  gpu_id= 0\n

        This simple application does not handle binding of MPI ranks to GPUs, so each of the 12 MPI ranks detects all six GPUs on the node and by default all will select the first GPU listed. The binding of MPI ranks to GPUs is expected to be handled by mpiexec in the near future, but for the time being a simple helper script is available for those that need it. There is a centrally installed gpu_tile_compact.sh script available for use, but the examples include the following script for convenience in case one would like to explore different CPU-GPU bindings (e.g. bind the first N MPI ranks to the first GPU).

        $ cat set_affinity_gpu_sunspot.sh \n#!/usr/bin/env bash\n\nnum_gpu=6\nnum_tile=2\n\ngpu_id=$(( (PALS_LOCAL_RANKID / num_tile ) % num_gpu ))\ntile_id=$((PALS_LOCAL_RANKID % num_tile))\n\nunset EnableWalkerPartition\nexport ZE_ENABLE_PCI_ID_DEVICE_ORDER=1\nexport ZE_AFFINITY_MASK=$gpu_id.$tile_id\n\necho \u201cRANK= ${PALS_RANKID} LOCAL_RANK= ${PALS_LOCAL_RANKID} gpu= ${gpu_id}  tile= ${tile_id}\u201d\n\n#https://stackoverflow.com/a/28099707/7674852\n\"$@\"\n
        The ZE_AFFINITY_MASK environment variable sets the devices that will be available to the CPU process and can be a comma-separated list of GPUs and/or GPU tiles. Each Aurora GPU consists of two tiles that can be separately bound to CPU processes. This simple script will set ZE_AFFINITY_MASK for each MPI rank such that GPU tiles on a node are round-robin assigned.
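        For reference, a few illustrative ZE_AFFINITY_MASK values, set manually rather than via the script, are shown below:

        export ZE_AFFINITY_MASK=0        # expose all of GPU 0 (both tiles)\nexport ZE_AFFINITY_MASK=0.0,0.1  # expose the two tiles of GPU 0 explicitly\nexport ZE_AFFINITY_MASK=2.1      # expose only tile 1 of GPU 2\n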

        $ mpiexec -n 12 --ppn 12 --depth=1 --cpu-bind depth ./set_affinity_gpu_sunspot.sh ./hello_affinity\nNUM_OF_NODES= 1 TOTAL_NUM_RANKS= 12 RANKS_PER_NODE= 12 THREADS_PER_RANK= 1\n\n  Using OPENMP v5.0\n  num_devices=     1\n  Default device=  0\n  Host=            1\n  num_teams=       448\n  num_threads=     1\nTo affinity and beyond!! nname= x1922c1s1b0n0  rnk= 0  list_cores= (0)  num_devices= 1  gpu_id= 0\n\nTo affinity and beyond!! nname= x1922c1s1b0n0  rnk= 1  list_cores= (1)  num_devices= 1  gpu_id= 0\n\nTo affinity and beyond!! nname= x1922c1s1b0n0  rnk= 2  list_cores= (2)  num_devices= 1  gpu_id= 0\n\nTo affinity and beyond!! nname= x1922c1s1b0n0  rnk= 3  list_cores= (3)  num_devices= 1  gpu_id= 0\n\nTo affinity and beyond!! nname= x1922c1s1b0n0  rnk= 4  list_cores= (4)  num_devices= 1  gpu_id= 0\n\nTo affinity and beyond!! nname= x1922c1s1b0n0  rnk= 5  list_cores= (5)  num_devices= 1  gpu_id= 0\n\nTo affinity and beyond!! nname= x1922c1s1b0n0  rnk= 6  list_cores= (6)  num_devices= 1  gpu_id= 0\n\nTo affinity and beyond!! nname= x1922c1s1b0n0  rnk= 7  list_cores= (7)  num_devices= 1  gpu_id= 0\n\nTo affinity and beyond!! nname= x1922c1s1b0n0  rnk= 8  list_cores= (8)  num_devices= 1  gpu_id= 0\n\nTo affinity and beyond!! nname= x1922c1s1b0n0  rnk= 9  list_cores= (9)  num_devices= 1  gpu_id= 0\n\nTo affinity and beyond!! nname= x1922c1s1b0n0  rnk= 10  list_cores= (10)  num_devices= 1  gpu_id= 0\n\nTo affinity and beyond!! nname= x1922c1s1b0n0  rnk= 11  list_cores= (11)  num_devices= 1  gpu_id= 0\n
        "},{"location":"aurora/compiling-and-linking/aurora-example-program-makefile/#gpu-sycl-example","title":"GPU SYCL Example","text":"

        A simple SYCL offload example is available here. Compilation proceeds similar to the above examples except for the compiler flags enabling GPU offload.

        mpicxx -std=c++17 -fsycl -fsycl-targets=spir64 main.cpp\n

        Note, this particular example makes use of the Level-Zero API and requires linking with -lze_loader, which is not something required of a typical SYCL application. Running the SYCL example using the affinity script binding MPI ranks to individual GPU tiles results in output like the following.

        $ make\n\n$ mpiexec -n 12 --ppn 12 --depth=1 --cpu-bind depth ./set_affinity_gpu_sunspot.sh ./hello_affinity\n\nNUM_OF_NODES= 1 TOTAL_NUM_RANKS= 12 RANKS_PER_NODE= 12 THREADS_PER_RANK= 1\nCOMMAND= mpiexec -n 12 --ppn 12 --depth=1 --cpu-bind depth ./set_affinity_gpu_sunspot.sh ./hello_affinity\n\n\u201cRANK= 0 LOCAL_RANK= 0 gpu= 0 tile= 0\u201d\n\u201cRANK= 1 LOCAL_RANK= 1 gpu= 0 tile= 1\u201d\n\u201cRANK= 2 LOCAL_RANK= 2 gpu= 1 tile= 0\u201d\n\u201cRANK= 3 LOCAL_RANK= 3 gpu= 1 tile= 1\u201d\n\u201cRANK= 4 LOCAL_RANK= 4 gpu= 2 tile= 0\u201d\n\u201cRANK= 5 LOCAL_RANK= 5 gpu= 2 tile= 1\u201d\n\u201cRANK= 6 LOCAL_RANK= 6 gpu= 3 tile= 0\u201d\n\u201cRANK= 7 LOCAL_RANK= 7 gpu= 3 tile= 1\u201d\n\u201cRANK= 8 LOCAL_RANK= 8 gpu= 4 tile= 0\u201d\n\u201cRANK= 9 LOCAL_RANK= 9 gpu= 4 tile= 1\u201d\n\u201cRANK= 10 LOCAL_RANK= 10 gpu= 5 tile= 0\u201d\n\u201cRANK= 11 LOCAL_RANK= 11 gpu= 5 tile= 1\u201d\n\nTo affinity and beyond!! nname= x1922c2s6b0n0  rnk= 0  list_cores= (0)  num_devices= 1  gpu_uuid=  01000000-0000-0000-dbb1-2f985946b0dd\nTo affinity and beyond!! nname= x1922c2s6b0n0  rnk= 1  list_cores= (1)  num_devices= 1  gpu_uuid=  02000000-0000-0000-dbb1-2f985946b0dd\n\nTo affinity and beyond!! nname= x1922c2s6b0n0  rnk= 2  list_cores= (2)  num_devices= 1  gpu_uuid=  01000000-0000-0000-9d4c-a3a038130bd2\nTo affinity and beyond!! nname= x1922c2s6b0n0  rnk= 3  list_cores= (3)  num_devices= 1  gpu_uuid=  02000000-0000-0000-9d4c-a3a038130bd2\n\nTo affinity and beyond!! nname= x1922c2s6b0n0  rnk= 4  list_cores= (4)  num_devices= 1  gpu_uuid=  01000000-0000-0000-f684-455a4554b231\nTo affinity and beyond!! nname= x1922c2s6b0n0  rnk= 5  list_cores= (5)  num_devices= 1  gpu_uuid=  02000000-0000-0000-f684-455a4554b231\n\nTo affinity and beyond!! nname= x1922c2s6b0n0  rnk= 6  list_cores= (6)  num_devices= 1  gpu_uuid=  01000000-0000-0000-d04a-9a289a53274e\nTo affinity and beyond!! nname= x1922c2s6b0n0  rnk= 7  list_cores= (7)  num_devices= 1  gpu_uuid=  02000000-0000-0000-d04a-9a289a53274e\n\nTo affinity and beyond!! nname= x1922c2s6b0n0  rnk= 8  list_cores= (8)  num_devices= 1  gpu_uuid=  01000000-0000-0000-a178-e2f3a2a0df2b\nTo affinity and beyond!! nname= x1922c2s6b0n0  rnk= 9  list_cores= (9)  num_devices= 1  gpu_uuid=  02000000-0000-0000-a178-e2f3a2a0df2b\n\nTo affinity and beyond!! nname= x1922c2s6b0n0  rnk= 10  list_cores= (10)  num_devices= 1  gpu_uuid=  01000000-0000-0000-1b72-105049dfed26\nTo affinity and beyond!! nname= x1922c2s6b0n0  rnk= 11  list_cores= (11)  num_devices= 1  gpu_uuid=  02000000-0000-0000-1b72-105049dfed26\n

        Upon carefully comparing the uuids from each rank, one can see the first field distinguishing the 1st or 2nd tile on a GPU and the last two fields distinguishing the 6 GPUs on a compute node. If the affinity script were not used for binding MPI ranks to GPUs, then each MPI rank would report uuids for all GPUs, as in the following.

        To affinity and beyond!! nname= x1922c2s6b0n0  rnk= 0  list_cores= (0)  num_devices= 6  gpu_uuid=  00000000-0000-0000-dbb1-2f985946b0dd 00000000-0000-0000-9d4c-a3a038130bd2 00000000-0000-0000-f684-455a4554b231 00000000-0000-0000-d04a-9a289a53274e 00000000-0000-0000-a178-e2f3a2a0df2b 00000000-0000-0000-1b72-105049dfed26\n
        "},{"location":"aurora/compiling-and-linking/aurora-example-program-makefile/#gpu-opencl-example","title":"GPU OpenCL Example","text":"

        A simple OpenCL example is available here. The include and lib directories for the OpenCL headers and libraries are in the default environment. One simply needs to link the application against -lOpenCL.

        mpicxx main.cpp -lOpenCL\n

        This simple example can be run on a single tile of an Aurora GPU as follows.

        $ export ZE_AFFINITY_MASK=0.0\n$ ./vecadd\nRunning on GPU!\nUsing double-precision\n\n    CL_DEVICE_NAME: Intel(R) Data Center GPU Max 1550\n    CL_DEVICE_VERSION: OpenCL 3.0 NEO \n    CL_DEVICE_OPENCL_C_VERSION: OpenCL C 1.2 \n    CL_DEVICE_MAX_COMPUTE_UNITS: 896\n    CL_DEVICE_MAX_CLOCK_FREQUENCY: 1600\n    CL_DEVICE_MAX_WORK_GROUP_SIZE: 1024\n\nResult is CORRECT!! :)\n
        "},{"location":"aurora/compiling-and-linking/aurora-programming-models/","title":"Aurora Programming Models","text":"

        The software environment on Aurora supports several parallel programming models targeting the CPUs and GPUs.

        "},{"location":"aurora/compiling-and-linking/aurora-programming-models/#cpu-parallel-programming-models","title":"CPU Parallel Programming Models","text":"

        The Aurora MPICH compiler wrappers mpicc, mpicxx, and mpifort are recommended for building MPI applications with the oneAPI compilers. A summary of available CPU parallel programming models and relevant compiler flags is shown below. Users are encouraged to review the corresponding man pages and documentation.

        Programming Model | oneAPI
        OpenMP | -fopenmp

        Higher-level programming models such as Kokkos and Raja may also be used for CPU programming on Aurora.

        "},{"location":"aurora/compiling-and-linking/aurora-programming-models/#gpu-programming-models","title":"GPU Programming Models","text":"

        A summary of available GPU programming models and relevant compiler flags is shown below for compilers that generate offloadable code. Two modes of compilation are currently available with the oneAPI compilers: Just-in-Time (JIT) and Ahead-of-Time (AoT). With AoT compilation, flags for specifying the backend are only needed when linking the application. Users are encouraged to review the corresponding man pages and documentation.

        Programming Model | oneAPI (JIT) | oneAPI (AoT)
        OpenCL | -- | N/A
        OpenMP | -fiopenmp -fopenmp-targets=spir64 | -fiopenmp -fopenmp-targets=spir64_gen -Xopenmp-target-backend=spir64_gen \"-device 12.60.7\"
        SYCL | --intel -fsycl -fsycl-targets=spir64 | --intel -fsycl -fsycl-targets=spir64_gen -Xsycl-target-backend \"-device 12.60.7\"

        For some build systems (e.g. cmake), it may be necessary to use the backslash character in the oneAPI compiler flags to escape the double quotes when specifying the device in AoT builds.

        -Xopenmp-target-backend=spir64_gen \\\"-device 12.60.7\\\"\n

        OpenCL is supported, but does not require specific compiler flags per se, as the offloaded kernels are JIT-compiled. One does need to link against the OpenCL library with -lOpenCL. Abstraction programming models, such as Kokkos, can be built on top of these programming models.
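        As an illustration of the flags in the table above, a minimal sketch of compiling a GPU OpenMP offload code in JIT and AoT modes might look like the following (the source file name is a placeholder).

        mpicxx -fiopenmp -fopenmp-targets=spir64 main.cpp -o app_jit\nmpicxx -fiopenmp -fopenmp-targets=spir64_gen -Xopenmp-target-backend=spir64_gen \"-device 12.60.7\" main.cpp -o app_aot\n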

        "},{"location":"aurora/compiling-and-linking/cce-compilers-aurora/","title":"CCE Compilers on Polaris","text":""},{"location":"aurora/compiling-and-linking/compiling-and-linking-overview/","title":"Compiling and Linking Overview","text":""},{"location":"aurora/compiling-and-linking/compiling-and-linking-overview/#compiling-on-aurora-login-and-compute-nodes","title":"Compiling on Aurora Login and Compute Nodes","text":"

        If your build system does not require GPUs for the build process, compilation of GPU-accelerated codes is generally expected to work well on the Aurora login nodes. If your build system does require GPUs, then the build must currently be done on the compute nodes via either an interactive or batch job submission. Doing this interactively in a single-node job may be the preferred route, as it also provides an opportunity to quickly test the executable.

        "},{"location":"aurora/compiling-and-linking/compiling-and-linking-overview/#filesystem","title":"Filesystem","text":"

        Currently, there is a single temporary filesystem, gecko, mounted on the Aurora login and compute nodes available to users, where both home and project spaces reside. It is important to realize that this filesystem is not backed up, and users should take care to retain copies of important files elsewhere (e.g. on local resources or ALCF's grand and eagle filesystems).

        "},{"location":"aurora/compiling-and-linking/compiling-and-linking-overview/#oneapi-programming-environment","title":"OneAPI Programming Environment","text":"

        The oneAPI programming environment is currently the single environment for building and running software to maximally use the available hardware resources. The oneAPI environment is loaded by default for users and is principally defined by the following set of modules and related variants.

        • oneapi
        • intel_compute_runtime
        • mpich

        Additional modules loading GNU CPU compilers and parallel application launch support (e.g. libfabric and cray-pals) are also provided in the default environment. The oneAPI environment provides C, C++, and Fortran compilers and associated MPICH MPI wrappers for building applications targeting CPUs and GPUs based on the OpenMP, SYCL, and OpenCL programming models.

        • mpicc - C Compiler
        • mpicxx - C++ Compiler (a.k.a mpic++)
        • mpifort - Fortran Compiler (a.k.a. mpif77 & mpif90)
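        As a minimal sketch (file names are placeholders), a simple MPI application would be built with these wrappers as follows.

        mpicc -o hello_c hello.c\nmpicxx -o hello_cxx hello.cpp\nmpifort -o hello_f hello.f90\n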
        "},{"location":"aurora/compiling-and-linking/compiling-and-linking-overview/#compilers-provided-by-cray-programming-environments","title":"Compilers provided by Cray Programming Environments","text":"

        The PrgEnv-gnu and PrgEnv-cray Cray programming environments are currently available as modules for users that need them. The compilers provided by these environments do not currently support Intel GPUs and should only be used, if at all, for CPU-only code.

        "},{"location":"aurora/compiling-and-linking/compiling-and-linking-overview/#mixed-cc-fortran-applications","title":"Mixed C/C++ & Fortran Applications","text":"

        For applications consisting of a mix of programming languages that use MPI, it is important to use the same Fortran compiler for building the application as was used to build MPI, to avoid mpi.mod (and similar) incompatibilities.
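        A minimal sketch of such a mixed build (file names are placeholders) compiles each language with its own wrapper and links with the Fortran wrapper, so that mpi.mod and the MPI Fortran interfaces come from the same compiler; note that linking C++ objects from a Fortran driver may additionally require the C++ runtime library.

        mpicxx -c solver_interface.cpp\nmpifort -c driver.f90\nmpifort -o mixed_app driver.o solver_interface.o -lstdc++\n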

        "},{"location":"aurora/compiling-and-linking/compiling-and-linking-overview/#additional-software-and-build-tools","title":"Additional software and build tools","text":"

        To reduce load on the filesystem, additional software is available on Aurora as modules that are not in the default environment. Once a shell has opened on the login or compute nodes (i.e. inside of a job script), users can access this additional software in /soft/modulefiles.

        $ module use /soft/modulefiles\n

        This will make available builds of the frameworks, Kokkos, additional MPICH builds, and also some spack modules, which provide access to additional tools. The module use /soft/modulefiles command only needs to be executed once per session, but is repeated below for clarity.

        $ module use /soft/modulefiles\n$ module load spack-pe-gcc\n

        Looking at the available modules (module avail) in this expanded list, one can find additional build tools, such as cmake.

        $ module use /soft/modulefiles\n$ module load spack-pe-gcc cmake\n$ cmake --version\ncmake version 3.26.4\n
        "},{"location":"aurora/compiling-and-linking/continuous-integration-aurora/","title":"Continuous Integration Aurora","text":""},{"location":"aurora/compiling-and-linking/gnu-compilers-aurora/","title":"GNU Compilers Polaris","text":""},{"location":"aurora/compiling-and-linking/llvm-compilers-aurora/","title":"LLVM Compilers","text":"

        LLVM Compilers Aurora

        "},{"location":"aurora/compiling-and-linking/oneapi-compilers-aurora/","title":"oneAPI Compilers","text":"

        Intel oneAPI Compilers Aurora

        "},{"location":"aurora/data-management/daos/daos-overview/","title":"DAOS Overview","text":"

        Placeholder

        "},{"location":"aurora/data-management/lustre/gecko/","title":"Gecko Filesystem","text":""},{"location":"aurora/data-management/lustre/gecko/#data-transfer","title":"Data Transfer","text":"

        Currently, scp and SFTP are the only ways to transfer data to/from Aurora.

        "},{"location":"aurora/data-management/lustre/gecko/#transferring-files-from-non-alcf-systems","title":"Transferring files from non-ALCF systems","text":"

        As an expedient for initiating ssh sessions to Aurora login nodes via the bastion indirect nodes, and to enable scp from remote (non-ALCF) hosts to Aurora login nodes, follow these steps:

        1. Create SSH keys on the laptop/desktop/remote machine. See \"Creating SSH Keys\" section on this page.
        2. Add the lines listed below to your ~/.ssh/config file on the remote host. That is, you should do this on your laptop/desktop, from which you are initiating ssh login sessions to Aurora via bastion, and on other non-ALCF host systems from which you want to copy files to Aurora login nodes using scp.
        $ cat ~/.ssh/config\n\nHost *.aurora.alcf.anl.gov aurora.alcf.anl.gov\n    ProxyCommand ssh <your_ALCF_username>@bastion.alcf.anl.gov -q -W %h:%p\n
        3. Copy the public key (*.pub) from the ~/.ssh folder on the remote machine to the ~/.ssh/authorized_keys file on Aurora (login.aurora.alcf.anl.gov).

        When you use an SSH proxy, it takes the authentication mechanism from the local host and applies it to the farthest-remote host, while prompting you for the \u201cmiddle host\u201d separately. So, when you run the ssh <your_ALCF_username>@login.aurora.alcf.anl.gov command on your laptop/desktop, you'll be prompted for two ALCF authentication codes - first the Mobilepass+ or Cryptocard passcode for the bastion, and then the SSH passphrase for Aurora. Likewise, when you run scp from a remote host to copy files to Aurora login nodes, you'll be prompted for two ALCF authentication codes - first the Mobilepass+ or Cryptocard passcode and then the SSH passphrase."},{"location":"aurora/data-management/lustre/gecko/#transferring-files-from-other-alcf-systems","title":"Transferring files from other ALCF systems","text":"

        With the bastion pass-through nodes currently used to access both Sunspot and Aurora, users will find it helpful to modify their .ssh/config files on Aurora appropriately to facilitate transfers to Aurora from other ALCF systems. These changes are similar to what Sunspot users may have already implemented. From an Aurora login node, this readily enables one to transfer files from Sunspot's gila filesystem or one of the production filesystems at ALCF (home, grand, and eagle) mounted on an ALCF system's login node. With the use of ProxyJump below, you will need to enter the MobilePass+ or Cryptocard passcode twice (once for bastion and once for the other resource). A simple example shows the .ssh/config entries for Polaris and the scp command for transferring from Polaris:

        knight@aurora-uan-0009:~> cat .ssh/config\nHost bastion.alcf.anl.gov\n    User knight\n\nHost polaris.alcf.anl.gov\n    ProxyJump bastion.alcf.anl.gov\n    DynamicForward 3142\n    user knight\n
        knight@aurora-uan-0009:~> scp knight@polaris.alcf.anl.gov:/grand/catalyst/proj-shared/knight/test.txt ./\n---------------------------------------------------------------------------\n                            Notice to Users\n...\n[Password:\n---------------------------------------------------------------------------\n                            Notice to Users\n... \n[Password:\nknight@aurora-uan-0009:~> cat test.txt \nfrom_polaris grand\n
        "},{"location":"aurora/data-science/julia/","title":"Julia on Aurora","text":""},{"location":"aurora/data-science/python/","title":"Python on Aurora","text":""},{"location":"aurora/data-science/python/#framework-modules","title":"Framework Modules","text":"

        Frameworks on Aurora can be loaded into a user's environment by loading the frameworks module as follows. The conda environment loaded with this module makes available TensorFlow, Horovod, and PyTorch with Intel extensions and optimizations. The following commands can be used both from an interactive session on a terminal and in a batch job script.

        Note that the framework modules may load a different oneAPI version than the default module. At the moment, the frameworks are updated on an approximately quarterly cadence.

        module use /soft/modulefiles\nmodule load frameworks\n
        These pre-built conda environments come with GPU-supported builds of PyTorch and TensorFlow. Both of these frameworks have Horovod support for multi-node calculations. Many other commonly used Python modules are available through these modules.

        For more information on PyTorch and TensorFlow, please see their respective pages:

        • PyTorch
        • TensorFlow

        From a login node, we can run the following commands to list the available modules:

        module use /soft/modulefiles/\nmodule avail\n
        This shows a list of available modules, including the frameworks module. There are many frameworks modules available. The latest frameworks release can be loaded with:

        $ module load frameworks/2023.12.15.001\n\nThe following have been reloaded with a version change:\n  1) gcc/11.2.0 => gcc/12.2.0     2) intel_compute_runtime/release/agama-devel-551 => intel_compute_runtime/release/stable-736.25\n\n$ which python3\n/soft/datascience/aurora_nre_models_frameworks-2024.0/bin/python3\n\n$ which python\n/soft/datascience/aurora_nre_models_frameworks-2024.0/bin/python\n
        At the time of writing, this module contains Python 3.9.18. Future modules will contain updated versions of Python, PyTorch, TensorFlow, etc.

        While the shared Anaconda environment encapsulated in the module contains many of the most commonly used Python libraries for our users, you may still encounter a scenario in which you need to extend the functionality of the environment (i.e. install additional packages).

        You can use a virtual environment to extend/modify an existing frameworks module.

        "},{"location":"aurora/data-science/python/#virtual-environments-via-venv","title":"Virtual environments via venv","text":"

        Creating your own (empty) virtual Python environment in a directory that is writable to you is straightforward:

        python3 -m venv /path/to/new/virtual/environment\n

        This creates a fairly lightweight folder (<20 MB) with its own Python interpreter where you can install whatever packages you'd like. First, you must activate the virtual environment to make this Python interpreter the default interpreter in your shell session. By default, this environment will not have access to the framework packages but instead will be empty.

        You activate the new environment whenever you want to start using it by running the activate script in that folder:

        source /path/to/new/virtual/environment/bin/activate\n

        In many cases, you do not want an empty virtual environment, but instead want to start from the conda base environment's installed packages, only adding and/or changing a few modules.

        To extend the base Anaconda environment with venv (e.g. my_env in the current directory) and inherit the base environment packages, one can use the --system-site-packages flag:

        module use /soft/modulefiles/\nmodule load frameworks/2023.12.15.001\npython3 -m venv --system-site-packages my_env\nsource my_env/bin/activate\n\n# Install additional packages here\n
        You can always retroactively change the --system-site-packages flag state for this virtual environment by editing my_env/pyvenv.cfg and changing the value of the line include-system-site-packages = false.

        To install a different version of a package that is already installed in the base environment, you can use:

        pip install --ignore-installed ... # or -I\n
        The shared base environment is not writable, so it is impossible to remove or uninstall packages from it. The packages installed with the above pip command should shadow those installed in the base environment.

        "},{"location":"aurora/data-science/python/#using-pip-install-user-not-recommended","title":"Using pip install --user (not recommended)","text":"

        With the conda environment set up, one can install common Python modules using pip install --user <module-name>, which will install packages in $PYTHONUSERBASE/lib/pythonX.Y/site-packages. The $PYTHONUSERBASE environment variable is automatically set when you load the base conda module, and is equal to /home/$USER/.local/aurora/frameworks/2023.12.15.001.

        Note that Python modules installed this way that contain command-line binaries will not have those binaries automatically added to the shell's $PATH. To manually add the path:

        export PATH=$PYTHONUSERBASE/bin:$PATH\n
        Be sure to remove this location from $PATH if you deactivate the base Anaconda environment or unload the module.

        Cloning the Anaconda environment or using venv is both more flexible and more transparent than --user installs.

        "},{"location":"aurora/data-science/applications/gpt-neox/","title":"gpt-neox","text":"

        Instructions for gpt-neox on Aurora

        "},{"location":"aurora/data-science/containers/containers/","title":"Containers on Aurora","text":""},{"location":"aurora/data-science/frameworks/deepspeed/","title":"DeepSpeed on Aurora","text":""},{"location":"aurora/data-science/frameworks/jax/","title":"Jax on Aurora","text":""},{"location":"aurora/data-science/frameworks/libtorch/","title":"LibTorch C++ Library","text":"

        LibTorch is a C++ library for Torch, with much of the API that is available in PyTorch. Users can find more information in the PyTorch documentation. This is useful for integrating the Torch ML framework into traditional HPC simulation codes, thereby enabling training and inferencing of ML models. On Aurora, the Intel Extension for PyTorch (IPEX) library is needed to access the Max 1550 GPUs, which have the device name kXPU in LibTorch. During compilation, Intel optimizations will be activated automatically once the IPEX dynamic library is linked.

        "},{"location":"aurora/data-science/frameworks/libtorch/#environment-setup","title":"Environment Setup","text":"

        To use LibTorch on Aurora, load the ML frameworks module

        module use /soft/modulefiles\nmodule load frameworks/2023.10.15.001\n
        which will also load a consistent oneAPI SDK and cmake.

        "},{"location":"aurora/data-science/frameworks/libtorch/#torch-and-ipex-libraries","title":"Torch and IPEX libraries","text":"

        With the ML frameworks module loaded as shown above, run

        python -c 'import torch; print(torch.__path__[0])'\npython -c 'import torch;print(torch.utils.cmake_prefix_path)'\n
        to find the path to the Torch libraries, include files, and CMake files.

        For the path to the IPEX dynamic library, run

        python -c 'import torch; print(torch.__path__[0].replace(\"torch\",\"intel_extension_for_pytorch\"))'\n

        "},{"location":"aurora/data-science/frameworks/libtorch/#model-inferencing-using-the-torch-api","title":"Model Inferencing Using the Torch API","text":"

        This example shows how to perform inference on the ResNet50 model using only the LibTorch API. First, get a jit-traced version of the model by running resnet50_trace.py, shown below.

        import torch\nimport torchvision\nimport intel_extension_for_pytorch as ipex\nfrom time import perf_counter\n\ndevice = 'xpu'\n\nmodel = torchvision.models.resnet50()\nmodel.to(device)\nmodel.eval()\n\ndummy_input = torch.rand(1, 3, 224, 224).to(device)\n\nmodel_jit = torch.jit.trace(model, dummy_input)\ntic = perf_counter()\npredictions = model_jit(dummy_input)\ntoc = perf_counter()\nprint(f\"Inference time: {toc-tic}\")\n\ntorch.jit.save(model_jit, f\"resnet50_jit.pt\")\n

        Then, use the source code in inference-example.cpp

        #include <torch/torch.h>\n#include <torch/script.h>\n#include <iostream>\n\nint main(int argc, const char* argv[]) {\n    torch::jit::script::Module model;\n    try {\n        model = torch::jit::load(argv[1]);\n        std::cout << \"Loaded the model\\n\";\n    }\n    catch (const c10::Error& e) {\n        std::cerr << \"error loading the model\\n\";\n        return -1;\n    }\n    // Upload model to GPU\n    model.to(torch::Device(torch::kXPU));\n    std::cout << \"Model offloaded to GPU\\n\\n\";\n\n    auto options = torch::TensorOptions()\n                      .dtype(torch::kFloat32)\n                      .device(torch::kXPU);\n    torch::Tensor input_tensor = torch::rand({1,3,224,224}, options);\n    assert(input_tensor.dtype() == torch::kFloat32);\n    assert(input_tensor.device().type() == torch::kXPU);\n    std::cout << \"Created the input tensor on GPU\\n\";\n\n    torch::Tensor output = model.forward({input_tensor}).toTensor();\n    std::cout << \"Performed inference\\n\\n\";\n\n    std::cout << \"Predicted tensor is : \\n\";\n    std::cout << output.slice(/*dim=*/1, /*start=*/0, /*end=*/10) << '\\n';\n\n    return 0;\n}\n

        and the CMakeLists.txt file

        cmake_minimum_required(VERSION 3.0 FATAL_ERROR)\nproject(inference-example)\n\nfind_package(Torch REQUIRED)\nset(CMAKE_CXX_FLAGS \"${CMAKE_CXX_FLAGS} ${TORCH_CXX_FLAGS} -Wl,--no-as-needed\")\n\nadd_executable(inference-example inference-example.cpp)\ntarget_link_libraries(inference-example \"${TORCH_LIBRARIES}\" \"${INTEL_EXTENSION_FOR_PYTORCH_PATH}/lib/libintel-ext-pt-gpu.so\")\n\nset_property(TARGET inference-example PROPERTY CXX_STANDARD 14)\n

        to build the inference example.

        Finally, create a build directory with mkdir build; cd build and execute the doConfig.sh script below

        #!/bin/bash\n\ncmake \\\n    -DCMAKE_PREFIX_PATH=`python -c 'import torch;print(torch.utils.cmake_prefix_path)'` \\\n    -DINTEL_EXTENSION_FOR_PYTORCH_PATH=`python -c 'import torch; print(torch.__path__[0].replace(\"torch\",\"intel_extension_for_pytorch\"))'` \\\n    ..\n\nmake\n./inference-example ../resnet50_jit.pt\n

        "},{"location":"aurora/data-science/frameworks/pytorch/","title":"PyTorch on Aurora","text":"

        PyTorch is a popular, open-source deep learning framework developed and released by Facebook. The PyTorch home page has more information about PyTorch. For troubleshooting on Aurora, please contact support@alcf.anl.gov.

        "},{"location":"aurora/data-science/frameworks/pytorch/#installation-on-aurora","title":"Installation on Aurora","text":"

        PyTorch is already installed on Aurora with GPU support and available through the frameworks module. To use it from a compute node, please load the following modules:

        module use /soft/modulefiles/\nmodule load frameworks/2023.12.15.001\n
        Then you can import PyTorch as usual; the following is output from the frameworks/2023.12.15.001 module:

        >>> import torch\n>>> torch.__version__\n'2.0.1a0+cxx11.abi'\n
        A simple but useful check could be to use PyTorch to get device information on a compute node. You can do this the following way:

        import torch\nimport intel_extension_for_pytorch as ipex\n\nprint(f\"GPU availability: {torch.xpu.is_available()}\")\nprint(f'Number of tiles = {torch.xpu.device_count()}')\ncurrent_tile = torch.xpu.current_device()\nprint(f'Current tile = {current_tile}')\nprint(f'Current device ID = {torch.xpu.device(current_tile)}')\nprint(f'Device name = {torch.xpu.get_device_name(current_tile)}')\n

        # output of the above code block\n\nGPU availability: True\nNumber of tiles = 12\nCurrent tile = 0\nCurrent device ID = <intel_extension_for_pytorch.xpu.device object at 0x1540a9f25790>\nDevice name = Intel(R) Data Center GPU Max 1550\n
        Note that, along with importing the torch module, you need to import the intel_extension_for_pytorch module. The default mode in ipex for counting the available devices on a compute node treats each tile as a device; hence, the code block above is expected to output 12. If you want to get the number of \"cards\" as an output, you may declare the following environment variable:

        export IPEX_TILE_AS_DEVICE=0\n
        With this environment variable, we expect the output to be 6 -- the number of GPUs available on an Aurora compute node. All the API calls involving torch.cuda should be replaced with torch.xpu, as shown in the above example.
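        For example, a minimal sketch of this CUDA-to-XPU substitution (the tensor here is a placeholder):

        import torch\nimport intel_extension_for_pytorch as ipex\n\n# CUDA-style code would use: x = torch.randn(8, 16).to('cuda')\nx = torch.randn(8, 16).to('xpu')   # allocate the tensor on an XPU device/tile\nprint(torch.xpu.get_device_name(torch.xpu.current_device()))\n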

        Important: It is highly recommended to import intel_extension_for_pytorch right after import torch, prior to importing other packages (from Intel's getting started doc).

        The Intel extension for PyTorch has been made publicly available as an open-source project on GitHub.

        Please consult the following resources for additional details and useful tutorials.

        • PyTorch's webpage for Intel extension
        • Intel's Github repository
        • Intel's Documentation
        "},{"location":"aurora/data-science/frameworks/pytorch/#pytorch-best-practices-on-aurora","title":"PyTorch Best Practices on Aurora","text":""},{"location":"aurora/data-science/frameworks/pytorch/#single-device-performance","title":"Single Device Performance","text":"

        To expose one particular device out of the 6 available on a compute node, this environmental variable should be set

        export ZE_AFFINITY_MASK=0.0,0.1\n\n# The values taken by this variable follows the syntax `Device.Sub-device`\n
        In the example given above, an application is targeting the Device:0 and Sub-devices: 0, 1, i.e. the two tiles of the GPU:0. This is particularly important in setting a performance benchmarking baseline.

        More information and details are available through the Level Zero Specification Documentation - Affinity Mask

        "},{"location":"aurora/data-science/frameworks/pytorch/#single-node-performance","title":"Single Node Performance","text":"

        When running PyTorch applications, we have found the following practices to be generally, if not universally, useful and encourage you to try some of these techniques to boost performance of your own applications.

        1. Use Reduced Precision. Reduced Precision is available on the Intel Max 1550 and is supported with PyTorch operations. In general, the way to do this is via the PyTorch Automatic Mixed Precision package (AMP), as described in the mixed precision documentation; see the sketch after this list. In PyTorch, users generally need to manage casting and loss scaling manually, though context managers and function decorators provide easy tools to do this.

        2. PyTorch has a JIT module as well as backends to support op fusion, similar to TensorFlow's tf.function tools. Please see TorchScript for more information.

        3. torch.compile will be available through the next framework release.
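        As referenced in item 1 above, a minimal sketch of reduced precision on an XPU device, assuming the torch.xpu.amp.autocast context manager provided by IPEX (the model and data below are placeholders):

        import torch\nimport torch.nn as nn\nimport intel_extension_for_pytorch as ipex\n\ndevice = 'xpu'\nmodel = nn.Linear(16, 4).to(device)\ndata = torch.randn(8, 16).to(device)\ntarget = torch.randn(8, 4).to(device)\n\n# bf16 autocast region; eligible ops are cast automatically inside the context\nwith torch.xpu.amp.autocast(enabled=True, dtype=torch.bfloat16):\n    loss = nn.functional.mse_loss(model(data), target)\n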

        "},{"location":"aurora/data-science/frameworks/pytorch/#multi-gpu-multi-node-scale-up","title":"Multi-GPU / Multi-Node Scale Up","text":"

        PyTorch is compatible with scaling up to multiple GPUs per node, and across multiple nodes. Good performance with PyTorch has been seen with both DDP and Horovod. For details, please see the Horovod documentation or the Distributed Data Parallel documentation. Some of the Aurora-specific details might be helpful to you:

        "},{"location":"aurora/data-science/frameworks/pytorch/#environmental-variables","title":"Environmental Variables","text":"

        The following environment variables should be set in the batch submission script (PBSPro script) when attempting to run on more than 16 nodes.

        # This is a fix for running over 16 nodes:\nexport FI_CXI_DEFAULT_CQ_SIZE=131072\nexport FI_CXI_OVFLOW_BUF_SIZE=8388608\nexport FI_CXI_CQ_FILL_PERCENT=20\n\nexport FI_LOG_LEVEL=warn\n#export FI_LOG_PROV=tcp\nexport FI_LOG_PROV=cxi\n\nexport MPIR_CVAR_ENABLE_GPU=0\n# This is to disable certain GPU optimizations like the use of XeLinks between\n# GPUs, collectives with GPU-placed data etc., in order to reduce `MPI_Init`\n# overheads. Benefits are application dependent.\nexport CCL_KVS_GET_TIMEOUT=600\n

        In order to run an application with the TF32 precision type, one must set the following environment variable:

        export IPEX_FP32_MATH_MODE=TF32\n
        This allows calculations to use TF32 as opposed to the default FP32, and is handled through the intel_extension_for_pytorch module.

        "},{"location":"aurora/data-science/frameworks/pytorch/#cpu-affinity","title":"CPU Affinity","text":"

        The CPU affinity should be set manually through mpiexec. You can do this the following way:

        export CPU_BIND=\"verbose,list:2-4:10-12:18-20:26-28:34-36:42-44:54-56:62-64:70-72:78-80:86-88:94-96\"\nmpiexec ... --cpu-bind=${CPU_BIND}\n

        These bindings should be used along with the following oneCCL and Horovod environment variable settings:

        HOROVOD_THREAD_AFFINITY=\"4,12,20,28,36,44,56,64,72,80,88,96\"\nCCL_WORKER_AFFINITY=\"5,13,21,29,37,45,57,65,73,81,89,97\"\n

        When running 12 ranks per node with these settings, the frameworks use 3 cores, with Horovod tightly coupled with the frameworks using one of the 3 cores, and oneCCL using a separate core for better performance; e.g., with rank 0 the frameworks would use cores 2,3,4, Horovod would use core 4, and oneCCL would use core 5.

        Each workload may perform better with different settings. The criteria for choosing the cpu bindings are:

        • Binding for GPU and NIC affinity \u2013 To bind the ranks to cores on the proper socket or NUMA nodes.
        • Binding for cache access \u2013 This is the part that will change per application and some experimentation is needed.

        Important: This setup is a work in progress and is based on observed performance. The recommended settings are likely to change with new framework releases.

        "},{"location":"aurora/data-science/frameworks/pytorch/#distributed-training","title":"Distributed Training","text":"

        Distributed training with PyTorch on Aurora is facilitated through both DDP and Horovod. DDP training is accelerated using the oneAPI Collective Communications Library Bindings for PyTorch (oneCCL Bindings for PyTorch). The extension supports the FP32 and BF16 data types. More detailed information and examples are available at the Intel oneCCL repo, formerly known as torch-ccl.

        The key steps in performing distributed training using oneccl_bindings_for_pytorch are the following:

        import os\nimport torch\nimport intel_extension_for_pytorch as ipex\nimport torch.distributed as dist\nimport torch.nn as nn\nfrom torch.nn.parallel import DistributedDataParallel as DDP\nimport oneccl_bindings_for_pytorch as torch_ccl\n\n...\n\n# perform the necessary transforms\n# set up the training data set\n# set up the data loader\n# set the master address, ports, world size and ranks through os.environ module\n\n...\n# Initialize the process group for distributed training with oneCCL backend\n\ndist.init_process_group(backend='ccl', ... # arguments)\n\nmodel = YOUR_MODEL().to(device)         # device = 'cpu' or 'xpu:{os.environ['MPI_LOCALRANKID']}'\ncriterion = torch.nn. ... .to(device)   # Choose a loss function \noptimizer = torch.optim. ...            # Choose an optimizer\n# model.train()                         # Optional, model dependent\n\n# Off-load the model to ipex for additional optimization \nmodel, optimizer = ipex.optimize(model, optimizer=optimizer)\n\n# Initialize DDP with your model for distributed processing\nif dist.get_world_size() > 1:\n     model = DDP(model, device_ids=[device] if (device != 'cpu') else None)\n\nfor ...\n    # perform the training loop\n

        A detailed example of the full procedure with a toy model is given here:

        • Intel's oneCCL demo
        "},{"location":"aurora/data-science/frameworks/pytorch/#a-simple-job-script","title":"A Simple Job Script","text":"

        Below we give an example job script:

        #!/bin/bash -l\n#PBS -l select=512                              # selecting 512 Nodes\n#PBS -l place=scatter\n#PBS -l walltime=1:59:00\n#PBS -q EarlyAppAccess                          # a specific queue\n#PBS -A Aurora_deployment                       # project allocation\n#PBS -l filesystems=home                        # specific filesystem, can be a list separated by :\n#PBS -k doe\n#PBS -e /home/$USER/path/to/errordir            \n#PBS -o /home/$USER/path/to/outdir              # path to `stdout` or `.OU` files\n#PBS -j oe                                      # output and error placed in the `stdout` file\n#PBS -N a.name.for.the.job\n\n#####################################################################\n# This block configures the total number of ranks, discovering\n# it from PBS variables.\n# 12 Ranks per node, if doing rank/tile\n#####################################################################\n\nNNODES=`wc -l < $PBS_NODEFILE`\nNRANKS_PER_NODE=12\nlet NRANKS=${NNODES}*${NRANKS_PER_NODE}\n\n# This is a fix for running over 16 nodes:\nexport FI_CXI_DEFAULT_CQ_SIZE=131072\nexport FI_CXI_OVFLOW_BUF_SIZE=8388608\nexport FI_CXI_CQ_FILL_PERCENT=20\n# These are workaround for a known Cassini overflow issue\n\nexport FI_LOG_LEVEL=warn\n#export FI_LOG_PROV=tcp\nexport FI_LOG_PROV=cxi\n# These allow for logging from a specific provider (libfabric)\n\nexport MPIR_CVAR_ENABLE_GPU=0 \nexport CCL_KVS_GET_TIMEOUT=600\n\n#####################################################################\n# APPLICATION Variables that make a performance difference\n#####################################################################\n\n# Channels last is faster for pytorch, requires code changes!\n# More info here:\n# https://intel.github.io/intel-extension-for-pytorch/xpu/latest/tutorials/features.html#channels-last\n# https://pytorch.org/tutorials/recipes/recipes/intel_extension_for_pytorch.html\nDATA_FORMAT=\"channels_last\"\n\n#####################################################################\n# FRAMEWORK Variables that make a performance difference \n#####################################################################\n\n# Toggle tf32 on (or don't):\nexport IPEX_FP32_MATH_MODE=TF32\n\n#####################################################################\n# End of perf-adjustment section\n#####################################################################\n\n#####################################################################\n# Environment set up, using the latest frameworks drop\n#####################################################################\n\nmodule use /soft/modulefiles\nmodule load frameworks/2023.12.15.001\n\nexport NUMEXPR_MAX_THREADS=1\n# This is to resolve an issue due to a package called \"numexpr\". \n# It sets the variable \n# 'numexpr.nthreads' to available number of threads by default, in this case \n# to 208. However, the 'NUMEXPR_MAX_THREADS' is also set to 64 as a package \n# default. The solution is to either set the 'NUMEXPR_NUM_THREADS' to less than \n# or equal to '64' or to increase the 'NUMEXPR_MAX_THREADS' to the available \n# number of threads. 
Both of these variables can be set manually.\n\n#####################################################################\n# End of environment setup section\n#####################################################################\n\n#####################################################################\n# JOB LAUNCH\n######################################################################\n\nexport CCL_LOG_LEVEL=\"WARN\"\nexport CPU_BIND=\"verbose,list:2-4:10-12:18-20:26-28:34-36:42-44:54-56:62-64:70-72:78-80:86-88:94-96\"\nHOROVOD_THREAD_AFFINITY=\"4,12,20,28,36,44,56,64,72,80,88,96\"\nCCL_WORKER_AFFINITY=\"5,13,21,29,37,45,57,65,73,81,89,97\"\n\nulimit -c 0\n\n# Launch the script\nmpiexec -np ${NRANKS} -ppn ${NRANKS_PER_NODE} \\\n--cpu-bind ${CPU_BIND} \\\npython path/to/application.py\n
        "},{"location":"aurora/data-science/frameworks/tensorflow/","title":"TensorFlow on Aurora","text":"

        TensorFlow is a popular, open-source deep learning framework developed and released by Google. The TensorFlow home page has more information about TensorFlow. For troubleshooting on Aurora, please contact support@alcf.anl.gov.

        "},{"location":"aurora/data-science/frameworks/tensorflow/#installation-on-aurora","title":"Installation on Aurora","text":"

        TensorFlow is already pre-installed on Aurora, available in the frameworks module. To use it from a compute node, please do:

        module use /soft/modulefiles/\nmodule load frameworks/2023.12.15.001\n

        Then you can import TensorFlow as usual; the following is output from the frameworks/2023.12.15.001 module:

        >>> import tensorflow as tf\n>>> tf.__version__\n'2.14.1'\n
        A simple but useful check could be to use TensorFlow to get device information on a compute node. You can do this the following way:

        >>> tf.config.list_physical_devices()\n[PhysicalDevice(name='/physical_device:CPU:0', device_type='CPU'), \nPhysicalDevice(name='/physical_device:XPU:0', device_type='XPU'), \nPhysicalDevice(name='/physical_device:XPU:1', device_type='XPU'), \nPhysicalDevice(name='/physical_device:XPU:2', device_type='XPU'), \nPhysicalDevice(name='/physical_device:XPU:3', device_type='XPU'), \nPhysicalDevice(name='/physical_device:XPU:4', device_type='XPU'), \nPhysicalDevice(name='/physical_device:XPU:5', device_type='XPU'), \nPhysicalDevice(name='/physical_device:XPU:6', device_type='XPU'), \nPhysicalDevice(name='/physical_device:XPU:7', device_type='XPU'), \nPhysicalDevice(name='/physical_device:XPU:8', device_type='XPU'), \nPhysicalDevice(name='/physical_device:XPU:9', device_type='XPU'), \nPhysicalDevice(name='/physical_device:XPU:10', device_type='XPU'), \nPhysicalDevice(name='/physical_device:XPU:11', device_type='XPU')]\n

        Note that tf.config here returns 12 devices, corresponding to the 12 tiles of the 6 GPU cards on an Aurora compute node, treating each tile as a device. The user can choose to set the environment variable ZE_FLAT_DEVICE_HIERARCHY with appropriate values to achieve the desired behavior, as described in the Level Zero Specification documentation. This environment variable is equivalent to ITEX_TILE_AS_DEVICE, which is to be deprecated soon.
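        For example, a card-per-device view might be requested as shown below (the COMPOSITE value is an assumption based on the Level Zero specification; consult that documentation for the full set of accepted values).

        export ZE_FLAT_DEVICE_HIERARCHY=COMPOSITE\n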

        The Intel extension for TensorFlow has been made publicly available as an open-source project on GitHub.

        Please consult the following resources for additional details and useful tutorials.

        • Intel's Documentation
        • Intel's Examples
        • Intel's ITEX Features Guide
        • Intel's Practice Guide
        "},{"location":"aurora/data-science/frameworks/tensorflow/#tensorflow-best-practices-on-aurora","title":"TensorFlow Best Practices on Aurora","text":""},{"location":"aurora/data-science/frameworks/tensorflow/#single-device-performance","title":"Single Device Performance","text":"

        To expose one particular device out of the 6 available on a compute node, this environmental variable should be set

        export ZE_AFFINITY_MASK=0.0,0.1\n\n# The values taken by this variable follows the syntax `Device.Sub-device`\n
        In the example given above, an application is targeting the Device:0 and Sub-devices: 0, 1, i.e. the two tiles of the GPU:0. This is particularly important in setting a performance benchmarking baseline.

        More information and details are available through Level Zero Specification Documentation - Affinity Mask

        "},{"location":"aurora/data-science/frameworks/tensorflow/#single-node-performance","title":"Single Node Performance","text":"

        When running TensorFlow applications, we have found the following practices to be generally, if not universally, useful and encourage you to try some of these techniques to boost performance of your own applications.

        "},{"location":"aurora/data-science/frameworks/tensorflow/#reduced-precision","title":"Reduced Precision","text":"

        Use Reduced Precision, whenever the application allows. Reduced Precision is available on the Intel Max 1550 and is supported with TensorFlow operations. In general, the way to do this is via the tf.keras.mixed_precision Policy, as described in the mixed precision documentation. Intel's extension for TensorFlow is fully compatible with the Keras mixed precision API in TensorFlow. It also provides an advanced auto mixed precision feature. For example, you can just set two environment variables to get the performance benefit from the low-precision data types FP16/BF16 without changing the application code.

        export ITEX_AUTO_MIXED_PRECISION=1\nexport ITEX_AUTO_MIXED_PRECISION_DATA_TYPE=\"BFLOAT16\" # or \"FLOAT16\"\n
        If you use a custom training loop (and not keras.Model.fit), you will also need to apply loss scaling.
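        A minimal sketch of the Keras mixed precision Policy with loss scaling for a custom training loop (layer sizes and the optimizer choice are placeholders):

        import tensorflow as tf\n\ntf.keras.mixed_precision.set_global_policy('mixed_float16')\n\nmodel = tf.keras.Sequential([tf.keras.layers.Dense(4)])\noptimizer = tf.keras.mixed_precision.LossScaleOptimizer(tf.keras.optimizers.SGD())\n\nwith tf.GradientTape() as tape:\n    loss = tf.reduce_mean(tf.square(model(tf.random.normal([8, 16]))))\n    scaled_loss = optimizer.get_scaled_loss(loss)   # scale to avoid fp16 underflow\ngrads = tape.gradient(scaled_loss, model.trainable_variables)\ngrads = optimizer.get_unscaled_gradients(grads)\noptimizer.apply_gradients(zip(grads, model.trainable_variables))\n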

        "},{"location":"aurora/data-science/frameworks/tensorflow/#tensorflows-graph-api","title":"TensorFlow's graph API","text":"

        Use TensorFlow's graph API to improve the efficiency of operations. TensorFlow is, in general, an imperative language, but with function decorators like @tf.function you can trace functions in your code. Tracing replaces your Python function with a lower-level, semi-compiled TensorFlow Graph. More information about the tf.function interface is available here. When possible, use jit_compile, but be aware of sharp bits when using tf.function: Python expressions that aren't tensors are often replaced as constants in the graph, which may or may not be your intention.
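        A minimal sketch of tracing a function with tf.function and requesting compilation via jit_compile (the function body is a placeholder):

        import tensorflow as tf\n\n@tf.function(jit_compile=True)\ndef scaled_relu(x):\n    # Traced into a TensorFlow Graph; non-tensor Python values become constants\n    return tf.nn.relu(x) * 2.0\n\nprint(scaled_relu(tf.random.normal([4])))\n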

        There is an experimental feature which allows for aggressive fusion of kernels through the oneDNN Graph API. Intel's extension for TensorFlow can offload performance-critical graph partitions to the oneDNN library to get more aggressive graph optimizations. It can be enabled by setting this environment variable:

        export ITEX_ONEDNN_GRAPH=1\n
        This feature is experimental, and actively under development.

        "},{"location":"aurora/data-science/frameworks/tensorflow/#tf32-math-mode","title":"TF32 Math Mode","text":"

        The Intel Xe Matrix Extensions (Intel XMX) engines in Intel Max 1550 Xe-HPC GPUs natively support TF32 math mode. Through the Intel extension for TensorFlow, you can enable it by setting the following environment variable:

        export ITEX_FP32_MATH_MODE=\"TF32\"\n
        "},{"location":"aurora/data-science/frameworks/tensorflow/#xla-compilation-plannedupcoming","title":"XLA Compilation (Planned/Upcoming)","text":"

        XLA is the Accelerated Linear Algebra library that is available in TensorFlow and critical in software like JAX. XLA will compile a tf.Graph object, generated with tf.function or similar, and perform optimizations like operation fusion. XLA can give impressive performance boosts with almost no user changes except setting the environment variable TF_XLA_FLAGS=--tf_xla_auto_jit=2. If your code is complex, or has dynamically sized tensors (tensors where the shape changes every iteration), XLA can be detrimental: the overhead of compiling functions can be large enough to negate the performance improvements. XLA is particularly powerful when combined with reduced precision, yielding speedups > 100% in some models.
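        For example, the auto-JIT setting mentioned above can be enabled for an unmodified script through the environment:

        export TF_XLA_FLAGS=--tf_xla_auto_jit=2\n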

        Intel provides initial Intel GPU support for TensorFlow models with XLA acceleration through the Intel Extension for OpenXLA. Full TensorFlow and PyTorch support is planned for development.

        "},{"location":"aurora/data-science/frameworks/tensorflow/#a-simple-example","title":"A simple example","text":"

        A simple example on how to use Intel GPU with TensorFlow is the following:

        import tensorflow as tf   # TensorFlow registers PluggableDevices here.\ntf.config.list_physical_devices()  # XPU device is visible to TensorFlow.\n\n#Section 1 Run implicitly\na = tf.random.normal(shape=[5], dtype=tf.float32)  # Runs on XPU.\nb = tf.nn.relu(a)         # Runs on XPU .\n\n#Section 2 Run with explicit device setting\nwith tf.device(\"/XPU:0\"):  # Users can also use 'with tf.device' syntax.\n  c = tf.nn.relu(a)        # Runs on XPU.\nwith tf.device(\"/CPU:0\"):\n  c = tf.nn.relu(a)        # Runs on CPU.\n\n#Section 3 Run with graph mode\n@tf.function  # Defining a tf.function\ndef run():\n  d = tf.random.uniform(shape=[100], dtype=tf.float32)\n  e = tf.nn.relu(d)\nrun()  # PluggableDevices also work with tf.function and graph mode. Runs on XPU\n
        "},{"location":"aurora/data-science/frameworks/tensorflow/#multi-gpu-multi-node-scale-up","title":"Multi-GPU / Multi-Node Scale Up","text":"

        TensorFlow is compatible with scaling up to multiple GPUs per node, and across multiple nodes. Good performance with TensorFlow has been seen with Horovod in particular. For details, please see the Horovod documentation. Some Aurora-specific details might be helpful to you.

        "},{"location":"aurora/data-science/frameworks/tensorflow/#environment-variables","title":"Environment Variables","text":"

        The following environment variables should be set in the batch submission script (PBSPro script) when attempting to run on more than 16 nodes.

        # This is a fix for running over 16 nodes:\nexport FI_CXI_DEFAULT_CQ_SIZE=131072\nexport FI_CXI_OVFLOW_BUF_SIZE=8388608\nexport FI_CXI_CQ_FILL_PERCENT=20\n\nexport FI_LOG_LEVEL=warn\n#export FI_LOG_PROV=tcp\nexport FI_LOG_PROV=cxi\n\nexport MPIR_CVAR_ENABLE_GPU=0\n# This is to disable certain GPU optimizations like the use of XeLinks between\n# GPUs, collectives with GPU-placed data etc., in order to reduce `MPI_Init`\n# overheads. Benefits are application dependent.\nexport CCL_KVS_GET_TIMEOUT=600\n
        "},{"location":"aurora/data-science/frameworks/tensorflow/#cpu-affinity","title":"CPU Affinity","text":"

        The CPU affinity should be set manually through mpiexec. You can do this the following way:

        export CPU_BIND=\"verbose,list:2-4:10-12:18-20:26-28:34-36:42-44:54-56:62-64:70-72:78-80:86-88:94-96\"\nmpiexec ... --cpu-bind=${CPU_BIND}\n

        These bindings should be used along with the following oneCCL and Horovod environment variable settings:

        HOROVOD_THREAD_AFFINITY=\"4,12,20,28,36,44,56,64,72,80,88,96\"\nCCL_WORKER_AFFINITY=\"5,13,21,29,37,45,57,65,73,81,89,97\"\n

        When running 12 ranks per node with these settings, the frameworks use 3 cores, with Horovod tightly coupled with the frameworks using one of the 3 cores, and oneCCL using a separate core for better performance; e.g., with rank 0 the frameworks would use cores 2,3,4, Horovod would use core 4, and oneCCL would use core 5.

        Each workload may perform better with different settings. The criteria for choosing the cpu bindings are:

        • Binding for GPU and NIC affinity \u2013 To bind the ranks to cores on the proper socket or NUMA nodes.
        • Binding for cache access \u2013 This is the part that will change per application and some experimentation is needed.

        Important: This setup is a work in progress and is based on observed performance. The recommended settings are likely to change with new framework releases.

        "},{"location":"aurora/data-science/frameworks/tensorflow/#distributed-training","title":"Distributed Training","text":"

        Distributed training with TensorFlow on Aurora is facilitated through Horovod, using Intel Optimization for Horovod.

        The key steps in performing distributed training are laid out in the following example:

        • Tensorflow examples with Intel Optimization for Horovod

        Detailed implementation of the same example is here:

        • TensorFlow with Keras and Horovod

        A suite of detailed and well documented examples is part of Intel's optimization for Horovod repository:

        • Distributed Training Example Suite
        "},{"location":"aurora/data-science/frameworks/tensorflow/#a-simple-job-script","title":"A simple Job Script","text":"

        Below we give a simple job script:

        #!/bin/bash -l\n#PBS -l select=512                              # selecting 512 Nodes\n#PBS -l place=scatter\n#PBS -l walltime=1:59:00\n#PBS -q EarlyAppAccess                          # a specific queue\n#PBS -A Aurora_deployment                       # project allocation\n#PBS -l filesystems=home                        # specific filesystem, can be a list separated by :\n#PBS -k doe\n#PBS -e /home/$USER/path/to/errordir\n#PBS -o /home/$USER/path/to/outdir              # path to `stdout` or `.OU` files\n#PBS -j oe                                      # output and error placed in the `stdout` file\n#PBS -N a.name.for.the.job\n\n#####################################################################\n# This block configures the total number of ranks, discovering\n# it from PBS variables.\n# 12 Ranks per node, if doing rank/tile\n#####################################################################\n\nNNODES=`wc -l < $PBS_NODEFILE`\nNRANKS_PER_NODE=12\nlet NRANKS=${NNODES}*${NRANKS_PER_NODE}\n\n# This is a fix for running over 16 nodes:\nexport FI_CXI_DEFAULT_CQ_SIZE=131072\nexport FI_CXI_OVFLOW_BUF_SIZE=8388608\nexport FI_CXI_CQ_FILL_PERCENT=20\n# These are workaround for a known Cassini overflow issue\n\nexport FI_LOG_LEVEL=warn\n#export FI_LOG_PROV=tcp\nexport FI_LOG_PROV=cxi\n# These allow for logging from a specific provider (libfabric)\n\nexport MPIR_CVAR_ENABLE_GPU=0\nexport CCL_KVS_GET_TIMEOUT=600\n\n#####################################################################\n# FRAMEWORK Variables that make a performance difference\n#####################################################################\n\n# Toggle tf32 on (or don't):\nexport ITEX_FP32_MATH_MODE=TF32\n\n#####################################################################\n# End of perf-adjustment section\n#####################################################################\n\n#####################################################################\n# Environment set up, using the latest frameworks drop\n#####################################################################\n\nmodule use /soft/modulefiles\nmodule load frameworks/2023.12.15.001\n\nexport NUMEXPR_MAX_THREADS=1\n# This is to resolve an issue due to a package called \"numexpr\".\n# It sets the variable\n# 'numexpr.nthreads' to available number of threads by default, in this case\n# to 208. However, the 'NUMEXPR_MAX_THREADS' is also set to 64 as a package\n# default. The solution is to either set the 'NUMEXPR_NUM_THREADS' to less than\n# or equal to '64' or to increase the 'NUMEXPR_MAX_THREADS' to the available\n# number of threads. Both of these variables can be set manually.\n\n#####################################################################\n# End of environment setup section\n#####################################################################\n\n#####################################################################\n# JOB LAUNCH\n######################################################################\n\nexport CCL_LOG_LEVEL=\"WARN\"\nexport CPU_BIND=\"verbose,list:2-4:10-12:18-20:26-28:34-36:42-44:54-56:62-64:70-72:78-80:86-88:94-96\"\nHOROVOD_THREAD_AFFINITY=\"4,12,20,28,36,44,56,64,72,80,88,96\"\nCCL_WORKER_AFFINITY=\"5,13,21,29,37,45,57,65,73,81,89,97\"\n\nulimit -c 0\n\n# Launch the script\nmpiexec -np ${NRANKS} -ppn ${NRANKS_PER_NODE} \\\n--cpu-bind ${CPU_BIND} \\\npython path/to/application.py\n
        "},{"location":"aurora/data-science/libraries/onednn/","title":"oneDNN","text":""},{"location":"aurora/data-science/libraries/openvino/","title":"Model Inference with OpenVINO","text":"

        OpenVINO is a library developed by Intel specifically designed for accelerating inference of ML models on their CPU and GPU hardware. This page contains build and run instructions for Python and C/C++ examples, but please refer to the full documentation for more information.

        "},{"location":"aurora/data-science/libraries/openvino/#instlling-openvino","title":"Instlling OpenVINO","text":"

        OpenVINO does not come with the default frameworks module on Aurora, but it can be installed manually within a virtual environment as shown below

        module use /soft/modulefiles\nmodule load frameworks/2023.10.15.001\npython -m venv --clear /path/to/_ov_env --system-site-packages\nsource /path/to/_ov_env/bin/activate\npip install openvino==2023.2\npip install openvino-dev==2023.2\npip install onnx\n

        Note that /path/to/ can be either a user's home or project directory.

        To use OpenVINO in the future, simply load the frameworks module and source the virtual environment.

        module use /soft/modulefiles\nmodule load frameworks/2023.10.15.001\nsource /path/to/_ov_env/bin/activate\n

        "},{"location":"aurora/data-science/libraries/openvino/#model-converter","title":"Model Converter","text":"

        The first suggested step is to convert the model from one of the ML frameworks into OpenVINO's Intermediate Representation (IR). This consists of an .xml file which describes the network topology and a .bin file which contains the weights and biases in binary format. The conversion can be done from the command line with ovc or using the Python API openvino.convert_model(). Note that PyTorch models cannot be converted directly with ovc and need to be converted to ONNX format first. You can find more information on the conversion process on OpenVINO's documentation page.

        The following code snippet demonstrates how to convert the ResNet50 model from TorchVision and save the OpenVINO IR.

        import openvino as ov\nimport torch\nfrom torchvision.models import resnet50\n\nmodel = resnet50(weights='DEFAULT')\ninput_data = torch.rand(1, 3, 224, 224)\n\nov_model = ov.convert_model(model, example_input=input_data)\nov.save_model(ov_model, 'resnet50.xml')\n

        Information on using the CLI conversion tool can be found by running ovc -h; by default, it will save the model in IR format.
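        For the PyTorch-to-ONNX path mentioned above, a minimal sketch (file names are placeholders) first exports the model to ONNX and then converts it from the command line:

        import torch\nfrom torchvision.models import resnet50\n\nmodel = resnet50(weights='DEFAULT')\ndummy_input = torch.rand(1, 3, 224, 224)\ntorch.onnx.export(model, dummy_input, 'resnet50.onnx')\n# Then, from the command line: ovc resnet50.onnx\n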

        Note that by default, both ovc and openvino.save_model() perform compression of the model weights to FP16. This reduces the memory needed to store the model and can provide an increase in performance in many cases. To disable this feature, use

        ov.save_model(ov_model, 'resnet50.xml', compress_to_fp16=False)\n

        or

        ovc </path/to/model.onnx> --compress_to_fp16=False\n
        "},{"location":"aurora/data-science/libraries/openvino/#benchmark-app","title":"Benchmark App","text":"

        Before writing a script or program to perform inference with the OpenVINO runtime, the performance of the model can be tested with the CLI tool benchmark_app.

        A minimal example to run on a single PVC tile is shown below

        benchmark_app -m resnet50.xml -hint latency -d GPU.0 -data_shape [1,3,224,224]\n

        which returns a summary of the parameters set for the benchmark tests and the measured performance. The last few lines of the output are shown below.

        [ INFO ] Execution Devices:['OCL_GPU.0']\n[ INFO ] Count:            6424 iterations\n[ INFO ] Duration:         60011.14 ms\n[ INFO ] Latency:\n[ INFO ]    Median:        9.23 ms\n[ INFO ]    Average:       9.25 ms\n[ INFO ]    Min:           9.00 ms\n[ INFO ]    Max:           11.69 ms\n[ INFO ] Throughput:   107.05 FPS\n

        Note that benchmark_app takes a number of additional configuration options, as described here and by running benchmark_app -h.

        "},{"location":"aurora/data-science/libraries/openvino/#inference-with-python-openvino-api","title":"Inference with Python OpenVINO API","text":"

        Inference can be performed by invoking the compiled model directly or by using the OpenVINO Runtime API explicitly.

        An example of performing direct inference with the compiled model is shown below. This leads to compact code, but it performs a single synchronous inference request. Future calls to the model will reuse the inference request that was created, and thus will experience less overhead. Note that the output of the model is a NumPy array.

        import openvino as ov\nimport torch\n\ncore = ov.Core()\ncompiled_model = core.compile_model(\"resnet50.xml\",device_name='GPU.0')\ninput_data = torch.rand((1, 3, 224, 224))\nresults = compiled_model(input_data)[0]\n

        The Runtime API can be called explicitly to have more control over the requests. For this approach we refer the user to the OpenVINO documentation page, which clearly outlines the steps involved.
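
        As a rough sketch of the explicit approach (see the documentation page for the authoritative API), an inference request can be created from the compiled model and reused across inputs:

        import numpy as np\nimport openvino as ov\n\ncore = ov.Core()\ncompiled_model = core.compile_model('resnet50.xml', device_name='GPU.0')\n\n# Create an inference request explicitly and reuse it for multiple inputs\ninfer_request = compiled_model.create_infer_request()\ninput_data = np.random.rand(1, 3, 224, 224).astype(np.float32)\ninfer_request.infer({0: input_data})\nresults = infer_request.get_output_tensor().data\n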

        "},{"location":"aurora/data-science/libraries/openvino/#inference-with-c-openvino-api","title":"Inference with C++ OpenVINO API","text":"

        This feature is still under testing on Aurora.

        "},{"location":"aurora/debugging/debugging-overview/","title":"Debugging on Aurora","text":""},{"location":"aurora/debugging/debugging-overview/#hpe-gdb4hpc","title":"HPE gdb4hpc","text":"

        gdb4hpc is not a GPU-aware debugger, but it can be used to debug general code problems at scale. This debugger will apply commands to all threads in the MPI process group.

        "},{"location":"aurora/debugging/debugging-overview/#attaching-to-a-running-job","title":"Attaching to a running job","text":"

        Determine the jobid of interest.

          qstat -u $USER\n
          harms@aurora-uan-0009:~/working/all2all> qstat -u $USER\n\n  aurora-pbs-0001.hostmgmt.cm.aurora.alcf.anl.gov: \n                                                            Req'd  Req'd   Elap\n  Job ID          Username Queue    Jobname    SessID NDS TSK Memory Time  S Time\n  --------------- -------- -------- ---------- ------ --- --- ------ ----- - -----\n  127750.aurora-* harms    workq    all2all       --    4   4    --  00:30 R   -- \n

        Next find a node the job is running on. Choose the first node in the list of vnodes.

          qstat -f 127750 | grep exec_vnode\n
          harms@aurora-uan-0009:~/working/all2all> qstat -f 127750 | grep exec_vnode\n      exec_vnode = (x4305c2s6b0n0:ncpus=1)+(x4305c2s7b0n0:ncpus=1)+(x4305c4s0b0n0\n

        Login to this node, find your mpiexec process id and run gdb4hpc.

          ssh x4305c2s6b0n0\n  ps -eaf | grep mpiexec\n  module load gdb4hpc\n  CTI_WLM_IMPL=ssh gdb4hpc\n
          harms@aurora-uan-0009:~/working/all2all> ssh x4305c2s6b0n0\n  harms@x4305c2s6b0n0:~> ps -eaf | grep mpiexec\n  harms    108581 108569  0 16:05 ?        00:00:00 mpiexec -l --no-transfer --line-buffer --np 16 -ppn 4 --cpu-bind core ./a2a-p2p\n  harms    109440 109354  0 16:11 pts/4    00:00:00 grep --color=auto mpiexec\n  harms@x4305c2s6b0n0:~> module load gdb4hpc\n  harms@x4305c2s6b0n0:~> CTI_WLM_IMPL=ssh gdb4hpc\n  gdb4hpc 4.14.7 - Cray Line Mode Parallel Debugger\n  With Cray Comparative Debugging Technology.\n  Copyright 2007-2022 Hewlett Packard Enterprise Development LP.\n  Copyright 1996-2016 University of Queensland. All Rights Reserved.\n\n  Type \"help\" for a list of commands.\n  Type \"help <cmd>\" for detailed help about a command.\n  dbg all>\n

        Now attach to the mpiexec process.

          dbg all> attach $a <pid>\n
          dbg all> attach $a 108581\n  0/16 ranks connected... (timeout in 299 seconds)\n  0/16 ranks connected... (timeout in 298 seconds)\n  ..\n  12/16 ranks connected... (timeout in 300 seconds)\n  16/16 ranks connected.\n  Created network...\n  Connected to application...\n  Current rank location:\n  a{0}: #0  0x00001472aba12699 in MPIDI_progress_test\n  ... backtrace ...\n
        "},{"location":"aurora/debugging/gdb-oneapi-aurora/","title":"gdb-oneapi on Aurora","text":"

        Placeholder

        "},{"location":"aurora/hardware-overview/machine-overview/","title":"Aurora System Overview","text":"

        Aurora is a 10,624 node HPE Cray-Ex based system. It has 166 racks with 21,248 CPUs and 63,744 GPUs. Each node consists of 2 Intel Xeon CPU Max Series (codename Sapphire Rapids or SPR) with on-package HBM and 6 Intel Data Center GPU Max Series (codename Ponte Vecchio or PVC). Each Xeon has 52 physical cores supporting 2 hardware threads per core and 64GB of HBM. Each CPU has 512 GB of DDR5. The GPUs are connected all-to-all with Intel XeLink interfaces. Each node has 8 HPE Slingshot-11 NICs, and the system is connected in a dragonfly topology. The GPUs may send messages directly to the NIC via PCIe, without the need to copy into CPU memory.

        The Intel Data Center GPU Max Series is based on the Xe core. Each Xe core consists of 8 vector engines and 8 matrix engines with 512 KB of L1 cache that can be configured as cache or Shared Local Memory (SLM). 16 Xe cores are grouped together to form a slice. 4 slices are combined with a large L2 cache and 4 HBM2E memory controllers to form a stack (also called a tile). One or more stacks/tiles can then be combined on a socket to form a GPU. More detailed information about the node architecture can be found here: https://www.intel.com/content/www/us/en/developer/articles/technical/intel-data-center-gpu-max-series-overview.html

        "},{"location":"aurora/hardware-overview/machine-overview/#aurora-compute-node","title":"Aurora Compute Node","text":"NODE COMPONENT DESCRIPTION PER NODE AGGREGATE Processor 2000 MHz 2 21,248 Cores/Threads Intel Xeon CPU Max 9470C Series 104/208 1,104,896/2,209,792 CPU HBM HBM2e 64x2 GiB 1.328 PiB CPU DRAM DDR5 512x2 GiB 10.375 PiB GPUS Intel Data Center Max 1550 Series 6 63,744 GPU HBM HBM2e 768 GiB 7.968 PiB"},{"location":"aurora/hardware-overview/machine-overview/#aurora-pvc-gpu-components","title":"Aurora PVC GPU Components","text":"GPU COMPONENT DESCRIPTION COUNT CAPABILITY Stack a.k.a. Tile 2 Xe Vector Engine a.k.a. EU (execution unit) 512 per Stack (4?? active) 8 threads, 512b SIMD Xe Matrix Engine a.k.a systolic part of EU 512 per Stack (4?? active) Register 512 bit register 128 per thread Xe Core a.k.a. subslice; unit of 8 EUs 64 per Stack 128 per GPU L1 cache 128 KiB Last Level cache a.k.a. RAMBO cache 384 MiB per GPU"},{"location":"aurora/performance-tools/advisor/","title":"Advisor","text":""},{"location":"aurora/performance-tools/performance-overview/","title":"Performance Tools Overview","text":""},{"location":"aurora/performance-tools/vtune/","title":"VTune","text":""},{"location":"aurora/programming-models/compatibility-tool/","title":"Compatibility Tool","text":"

        Placeholder

        "},{"location":"aurora/programming-models/heterogeneous-models/","title":"Heterogeneous Models and Porting Paths to Aurora","text":"

        Placeholder

        "},{"location":"aurora/programming-models/kokkos-aurora/","title":"Kokkos","text":""},{"location":"aurora/programming-models/kokkos-aurora/#kokkos","title":"Kokkos","text":"

        Kokkos Core implements a programming model in C++ for writing performance portable applications targeting all major HPC platforms. For that purpose it provides abstractions for both parallel execution of code and data management. Kokkos is designed to target complex node architectures with N-level memory hierarchies and multiple types of execution resources. It currently can use Serial and OpenMP (threads) for CPU execution spaces (\"backends\") and CUDA, HIP, SYCL, and OpenMPTarget for GPU execution spaces. By convention, Kokkos only allows one GPU backend at a time.

        "},{"location":"aurora/programming-models/kokkos-aurora/#kokkos-documentation","title":"Kokkos Documentation","text":"
        • Kokkos-core Wiki
        • Kokkos github
        "},{"location":"aurora/programming-models/kokkos-aurora/#kokkos-on-aurora","title":"Kokkos on Aurora","text":"

        The prebuilt Kokkos on Aurora includes 3 backends: Serial and OpenMP for CPU execution and SYCL for GPU execution (with ahead-of-time (AOT) compilation, not just-in-time (JIT) compilation). To use it, run

        module use /soft/modulefiles\nmodule load kokkos\n
        This sets the following environment variables, some of which are used by cmake:

        • KOKKOS_HOME - path to the lib64/, include/ files installed
        • LIBRARY_PATH - prepends $KOKKOS_HOME/lib64 to this variable used by cmake
        • CPATH - prepends $KOKKOS_HOME/include to this variable used by cmake
        • LD_LIBRARY_PATH - prepends $KOKKOS_HOME/lib64 to this variable
        "},{"location":"aurora/programming-models/kokkos-aurora/#building-a-kokkos-application-using-cmake","title":"Building a Kokkos Application Using cmake","text":"

        Add these lines to CMakeLists.txt:

        find_package(Kokkos REQUIRED)\ntarget_link_libraries(myTarget Kokkos::kokkoscore)\n

        Here is a simple example CMakeLists.txt to compile an example program:

        cmake_minimum_required(VERSION 3.22)\nproject(buildExample)\nfind_package(Kokkos REQUIRED)\n\nset(buildExample_SOURCE_DIR \".\")\n\nset(top_SRCS\n  ${buildExample_SOURCE_DIR}/example1.cpp)\n\nset(SOURCE_FILES ${top_SRCS})\n\nadd_executable(example1_sycl_aot ${SOURCE_FILES})\ntarget_link_libraries(example1_sycl_aot Kokkos::kokkoscore)\ntarget_include_directories(example1_sycl_aot PUBLIC ${buildExample_SOURCE_DIR})\n

        Configure and build it like this:

        mkdir build\ncd build\ncmake -DCMAKE_CXX_COMPILER=CC -DCMAKE_C_COMPILER=cc ..\nmake\n
        "},{"location":"aurora/programming-models/kokkos-aurora/#building-a-kokkos-application-using-make","title":"Building a Kokkos Application Using make","text":"

        Here's an example Makefile:

        # KOKKOS_HOME set via:\n#   module load kokkos\n\n# You can look at the first lines of $KOKKOS_HOME/KokkosConfigCommon.cmake to\n# see the flags used in cmake configuration of the kokkos library build. The\n# default Kokkos module on Aurora was built with the oneAPI compilers and includes\n# Serial, OpenMP (threads) and SYCL backends. So you should have that\n# environment module loaded and include compiler flags for SYCL and OpenMP:\n\n# Aurora MPICH wrapper for C++ and C compilers:\nCXX=\"mpic++ -cxx=icpx\"\nCC=\"mpicc -cc=icx\"\n\nSYCL_AOT_CPPFLAGS=-fsycl -fsycl-targets=spir64_gen -fno-sycl-id-queries-fit-in-int -fsycl-dead-args-optimization -fsycl-unnamed-lambda -std=c++17\nSYCL_AOT_LDFLAGS=-Xsycl-target-backend \"-device 12.60.7\"\n\nCPPFLAGS=-g -O2 -fiopenmp -I $(KOKKOS_HOME)/include $(SYCL_AOT_CPPFLAGS) -Wno-deprecated-declarations -Wno-tautological-constant-compare -Wno-unknown-attributes\n\nLDFLAGS=$(CPPFLAGS) $(SYCL_AOT_LDFLAGS)\nLDLIBS=-L$(KOKKOS_HOME)/lib64 -lkokkoscore -lkokkossimd -lpthread\n\nSRCS=example1.cpp\nOBJS=$(subst .cpp,.o,$(SRCS))\n\nall: example1_aurora\n\nexample1_aurora: $(OBJS)\n        $(CXX) $(LDFLAGS) -o example1_aurora $(OBJS) $(LDLIBS)\n\nexample1.o: example1.cpp\n\nclean:\n        rm -f $(OBJS)\n\ndistclean: clean\n        rm -f example1_aurora\n
        "},{"location":"aurora/programming-models/kokkos-aurora/#configuring-your-own-kokkos-build-on-aurora","title":"Configuring Your Own Kokkos Build on Aurora","text":"

        Here are recommended environment settings and configuration to build your own kokkos libraries on Aurora:

        "},{"location":"aurora/programming-models/kokkos-aurora/#environment","title":"Environment","text":"

        To match what was done in the centrally-built kokkos associated with the modules discussed above, use the same oneAPI version as indicated in module help kokkos and use the Aurora MPICH wrapper mpic++ -cxx=icpx as the C++ compiler (or just use icpx if you're not using MPI). To build Kokkos, you'll need cmake.

        "},{"location":"aurora/programming-models/kokkos-aurora/#cmake-configuration","title":"CMake Configuration","text":"

        This example builds three backends: OpenMP, Serial, and SYCL.

        git clone git@github.com:kokkos/kokkos.git\ncd kokkos\nmkdir build\ncd build\n\ncmake\\\n    -DCMAKE_BUILD_TYPE=RelWithDebInfo\\\n    -DCMAKE_CXX_COMPILER=icpx\\\n    -DCMAKE_CXX_EXTENSIONS=OFF\\\n    -DCMAKE_CXX_STANDARD=17\\\n    -DKokkos_ENABLE_TESTS=OFF\\\n    -DKokkos_ENABLE_SERIAL=ON\\\n    -DKokkos_ENABLE_OPENMP=ON\\\n    -DKokkos_ENABLE_SYCL=ON\\\n    -DKokkos_ARCH_INTEL_PVC=ON\\\n    -DBUILD_SHARED_LIBS=OFF\\\n    -DKokkos_ENABLE_DEPRECATED_CODE_3=ON\\\n    -DKokkos_ENABLE_DEBUG=OFF\\\n    -DKokkos_ENABLE_EXAMPLES=OFF\\\n    -DCMAKE_CXX_FLAGS=\"-Wno-deprecated-declarations -Wno-tautological-constant-compare\"\\\n    -DCMAKE_EXE_LINKER_FLAGS=\"-fsycl-max-parallel-link-jobs=5\"\\\n    -DCMAKE_VERBOSE_MAKEFILE=OFF\\\n    -DCMAKE_INSTALL_PREFIX=/path/to/your/install/directory\\\n    ..\n\nmake -j16 -l16 install\n
        "},{"location":"aurora/programming-models/level-0/","title":"Level 0","text":"

        Placeholder

        "},{"location":"aurora/programming-models/one-api/","title":"oneAPI","text":"

        Placeholder

        "},{"location":"aurora/programming-models/opencl-aurora/","title":"OpenCL","text":"

        Placeholder

        "},{"location":"aurora/programming-models/openmp-aurora/","title":"OpenMP on Aurora","text":""},{"location":"aurora/programming-models/raja-aurora/","title":"Raja","text":"

        Placeholder

        "},{"location":"aurora/programming-models/sycl-aurora/","title":"SYCL","text":"

        SYCL on Aurora

        "},{"location":"aurora/services/gitlab-ci/","title":"Continuous Integration via Gitlab-CI For Sunspot/Aurora","text":""},{"location":"aurora/services/gitlab-ci/#changes-from-the-general-documentation-needed-for-aurorasunspot","title":"Changes from the general documentation needed for Aurora/Sunspot:","text":"

        Instead of gitlab-ci.alcf.anl.gov use gitlab-sunspot.alcf.anl.gov.

        "},{"location":"aurora/services/gitlab-ci/#alcf-specific-variables-for-aurora-and-sunspot","title":"ALCF Specific Variables for Aurora and Sunspot:","text":"Cluster Scheduler Variable Name Support docs Aurora PBS ANL_AURORA_SCHEDULER_PARAMETERS Aurora Getting Started Sunspot PBS ANL_SUNSPOT_SCHEDULER_PARAMETERS Sunspot Getting Started"},{"location":"aurora/services/gitlab-ci/#examples-which-have-been-modified-for-aurora-and-sunspot","title":"Examples which have been modified for Aurora and Sunspot:","text":"

        Example: A .gitlab-ci.yml file for an Aurora project

        variables:\n  ANL_AURORA_SCHEDULER_PARAMETERS: \"-A ProjectName -l walltime=0:30:00  -q AuroraQueueName\"\nstages:\n  - stage1\n  - stage2\nshell_test1:\n  stage: stage1\n  tags:\n    - shell\n    - aurora\n  script:\n    - echo \"Shell Job 1\"\nbatch_test:\n  stage: stage2\n  tags:\n    - batch\n    - aurora\n  script:\n    - echo \"Job 2 start\"\n    - echo \"Job end\"\n

        Example: Running a batch job on the Aurora HPC

        variables:\n ANL_AURORA_SCHEDULER_PARAMETERS: \"-A ProjectName -l walltime=0:30:00  -q AuroraQueueName\"\n\nbatch_test:\n  tags:\n    - aurora\n    - batch\n  script:\n    - echo \"Job start\"\n    - echo \"Job end\"\n

        Example: Aurora pipeline with custom stages

        variables:\n ANL_AURORA_SCHEDULER_PARAMETERS: \"-A ProjectName -l walltime=0:30:00  -q AuroraQueueName\"\n\nstages:\n  - stage1\n  - stage2\n\ntest1:\n  stage: stage1\n  tags:\n    - aurora\n    - shell\n  script:\n    - export\n    - id\n    - hostname\n    - echo \"Running on aurora with shell runner\" \n    - echo test > test.txt\ntest2:\n  stage: stage2\n  tags:\n    - aurora\n    - batch\n  script:\n    - echo \"Job 2 start\"\n    - echo \"Job 2 end\"\n

        Example: Gitlab job designed to only run on merge requests

        test1:\n  rules:\n    - if: $CI_COMMIT_TAG                    # Do not execute jobs for tag context\n      when: never\n    - if: $CI_COMMIT_BRANCH == \"master\"     # Do not run on master, since will run on the merge request just prior\n      when: never\n    - if: $CI_MERGE_REQUEST_IID             # CI_MERGE_REQUEST_IID exists, so run job\n  stage: stage1\n  tags:\n    - aurora\n    - shell\n  script:\n    - echo \"Run test 1\"\n

        "},{"location":"aurora/services/jupyterhub/","title":"JupyterHub","text":"

        Placeholder

        "},{"location":"aurora/visualization/paraview/","title":"Paraview on Aurora","text":""},{"location":"aurora/workflows/balsam/","title":"Balsam on Aurora","text":""},{"location":"aurora/workflows/deephyper/","title":"DeepHyper","text":""},{"location":"aurora/workflows/libensemble/","title":"libEnsemble","text":"

        libEnsemble on Aurora

        "},{"location":"aurora/workflows/parsl/","title":"Parsl on Aurora","text":""},{"location":"aurora/workflows/smartsim/","title":"SmartSim and SmartRedis","text":"

        SmartSim is an open source tool developed by Hewlett Packard Enterprise (HPE) designed to facilitate the integration of traditional HPC simulation applications with machine learning workflows. There are two core components to SmartSim, each illustrated with a short Python sketch below:

        • Infrastructure library (IL)
          • Provides API to start, stop and monitor HPC applications from Python
          • Interfaces with the scheduler to launch jobs (PBSPro on Polaris and Cobalt on Theta/ThetaGPU)
          • Deploys a distributed in-memory database called the Orchestrator
        • SmartRedis client library
          • Provides clients that connect to the Orchestrator from Fortran, C, C++, Python code
          • The client API library enables data transfer to/from database and ability to load and run JIT-traced Python and ML runtimes acting on stored data
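
        As a minimal sketch of the IL (the experiment name, launcher, and executable are only illustrative), an application can be launched and monitored from Python as follows:

        from smartsim import Experiment\n\n# Create an experiment that launches jobs through PBS\nexp = Experiment('example-experiment', launcher='pbs')\n\n# Describe how to run the application, launch it, and check its status\nrun_settings = exp.create_run_settings(exe='echo', exe_args='hello from SmartSim')\nmodel = exp.create_model('hello', run_settings)\nexp.start(model, block=True)\nprint(exp.get_status(model))\n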

        For more resources on SmartSim, follow the links below:

        • Source code
        • Documentation
        • Zoo of examples
        • Fall 2023 ALCF User Hands-On Workshop
        • NekRS-ML
        "},{"location":"aurora/workflows/smartsim/#installation","title":"Installation","text":"

        SmartSim on Aurora can be installed by creating a virtual environment based on the ML frameworks module

        module use /soft/modulefiles\nmodule load frameworks/2023.10.15.001\npython -m venv --clear /path/to/_ssim_env --system-site-packages\nsource /path/to/_ssim_env/bin/activate\npip install --upgrade pip\n
        Note that /path/to/ can either be a user's home or project directory.

        To use SmartSim in the future, simply load the frameworks module and source the virtual environment.

        module use /soft/modulefiles\nmodule load frameworks/2023.10.15.001\nsource /path/to/_ssim_env/bin/activate\n

        Then install SmartSim and the CPU backend

        export SMARTSIM_REDISAI=1.2.7\ngit clone https://github.com/CrayLabs/SmartSim.git\ncd SmartSim\npip install -e .\nTORCH_PATH=$( python -c 'import torch;print(torch.utils.cmake_prefix_path)' )\nTF_PATH=$( python -c 'import tensorflow;print(\"/\".join(tensorflow.__file__.split(\"/\")[:-1]))' )\nsmart build -v --device cpu --torch_dir $TORCH_PATH --libtensorflow_dir $TF_PATH\ncd ..\n

        Note:

        • The pip install -e . command returns some warnings regarding the version of protobuf and errors about the installation of cloud-volume, but these can be ignored for now.
        • The smart build -v --device cpu command builds the RedisAI backend for the CPU. This enables ML model inferencing with the SmartRedis library on the CPU hardware with models stored within the database. This feature is not enabled on the Intel Max 1550 GPU.

        Finally, install the SmartRedis library

        git clone https://github.com/CrayLabs/SmartRedis.git\ncd SmartRedis\npip install -e .\nmake lib\ncd ..\n
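
        Once installed, a minimal sketch of exchanging data with a running Orchestrator through the SmartRedis Python client looks like the following (the database address and tensor name are only illustrative):

        import numpy as np\nfrom smartredis import Client\n\n# Connect to a running Orchestrator database (the address is illustrative)\nclient = Client(address='127.0.0.1:6780', cluster=False)\n\n# Send a tensor to the database and retrieve it back\ndata = np.random.rand(10, 10).astype(np.float32)\nclient.put_tensor('my_tensor', data)\nretrieved = client.get_tensor('my_tensor')\n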

        "},{"location":"cooley/cooley-overview/","title":"Cooley System Overview","text":"

        The primary purpose of Cooley is to analyze and visualize data produced on ALCF supercomputers. Equipped with state-of-the-art graphics processing units (GPUs), Cooley converts computational data into high-resolution visual representations. The resulting images, videos, and animations help users to better analyze and understand the data generated by ALCF supercomputers. Cooley can also be used for statistical analysis, helping to pinpoint trends in the simulation data. Additionally, the system is capable of preprocessing efforts, such as meshing, to assist users preparing for production simulations.

        Cooley mounts the Theta file system, enabling direct access to Theta-generated results. Theta's project directories can be accessed in /lus/theta-fs0/projects.

        Cooley has a total of 126 compute nodes; each node has 12 CPU cores and one NVIDIA Tesla K80 dual-GPU card. Aggregate GPU peak performance is over 293 teraflops double precision (using base GPU clocks), and the entire system has a total of 47 terabytes of system RAM and 3 terabytes of GPU RAM. Access to Cooley is provided by two login nodes, which provide compilation and job submission capabilities. Job scheduling is provided by the Cobalt job scheduler. All Theta users are approved for a default allocation of 8000 node-hours on Cooley.

        "},{"location":"cooley/cooley-overview/#cooley-node-configuration","title":"Cooley Node Configuration","text":"Specifications Architecture Intel Haswell Speed 293 teraflops Processors per node Two 6-core, 2.4-GHz Intel E5\u20132620 GPU per node 1 NVIDIA Tesla K80 w/dual GPUs Nodes 126 Cores 1,512 Memory 47 TB GPU memory 3 TB Interconnect FDR InfiniBand network Racks 6"},{"location":"cooley/compiling-and-linking/compiling-and-linking/","title":"Compiling and Linking on Cooley","text":""},{"location":"cooley/compiling-and-linking/compiling-and-linking/#compilers-and-mpi","title":"Compilers and MPI","text":"

        GNU compilers, version 4.4.7, are installed and are available in your default environment. To use version 4.8.1 of the GNU compilers instead, add +gcc-4.8.1 to .soft.cooley in your home directory.

        Intel Composer XE compilers (C/C++ and FORTRAN) are installed in /soft/compilers. To use the most current installed version, add +intel-composer-xe to .soft.cooley in your home directory. Specific versioned keys (such as +intel-composer-xe-2013 and +intel-composer-xe-2015) are also available if you require a previous version. Our installation of the Intel compilers includes the Intel Math Kernel Library (MKL).

        The Clang compiler is installed in /soft/compilers/llvm. To use the most recent version, add @clang to .soft.cooley in your home directory. For more information on using the Clang compiler, please see: http://clang.llvm.org/docs/UsersManual.html.

        Multiple MPI versions are available, controlled by the .soft.cooley file in your home directory. We currently provide both MPICH2 and MVAPICH2 for use with the GNU (+mvapich2), Clang (@mvapich2-clang) and Intel (+mvapich2-intel) compilers. See the output of the \"softenv\" command for the most current information.

        For example, you would put the following in your .soft.cooley file to add the GNU version of MVAPICH2 to your environment:

        +mvapich2\n@default\n
        By default, we provide all new users with the +mvapich2 key. MVAPICH2 is designed to operate efficiently over our Infiniband interconnect, and is therefore preferred over MPICH2.

        Compiler wrappers (mpicc, mpicxx, mpif90, and so on) are included in your path for any MPI softenv key you use; each of those softenv keys corresponds to a separate build of mvapich2/mpich2, so you should have no more than one +mpich2* or +mvapich2* key in your .soft.cooley.

        Also avoid explicitly providing MPI paths in link flags, since those could override what the wrappers provide. If you're using the wrappers, it is not necessary to explicitly specify any of the MPICH-related libraries.

        After you've updated your .soft.cooley file in your home directory, run the command \"resoft\" for the changes to take effect.

        "},{"location":"cooley/job-submission/job-and-queue-scheduling/","title":"Queueing and Job Submission on Cooley","text":""},{"location":"cooley/job-submission/job-and-queue-scheduling/#overview","title":"Overview","text":"

        Like our other computing resources, Cooley uses the Cobalt job scheduler.

        Use the \"qstat\" command to see what jobs are in the queue, and the nodelist command to see which of the compute nodes (cc001-cc126) are free. The showres command will display any special reservations in place.

        Use the \"qsub\" command to submit jobs; what you submit should be a script which will be executed on the rank0 node allocated to you by the scheduler. This script will have access to an environment variable named COBALT_NODEFILE, which is the name of a file suitable for use with mpirun's -f option. One important thing to note is that the --proccount option to qsub has no effect on Cooley \u2013 the number of MPI processes run by your job is entirely dependent on the arguments you supply to mpirun in your script.

        At a minimum, qsub must be supplied with the number of nodes desired (-n), the walltime of the job (-t), and the path to the job script. If you are associated with more than one project, you will also need to supply the project name using the -A option (or, alternately, set the PROJECT environment variable to that project name).

        Note: If you modify or replace your .bashrc, you will need to retain the following lines from the default .bashrc in order to ensure that your Cobalt batch jobs receive a complete software environment:

        #  Source global definitions\nif [ -f /etc/bashrc ]; then\n        . /etc/bashrc\nfi\n

        Some important notes regarding the job script:

        • The job script runs only on one node of your job (designated as the head node); it is up to your script to distribute processes to the other nodes in a multi-node job. For a normal MPI job, this is accomplished by including -f $COBALT_NODEFILE in the arguments to mpirun/mpiexec. The COBALT_NODEFILE environment variable expands to the location of the nodefile (on the head node), which is a file containing the list of nodes Cobalt has assigned to your job.
        • Your job script must have an interpreter at the top (for example, #!/bin/bash) in order for Cobalt to recognize it as a valid script.
        • The job script will run using your default environment as set up by .bash_profile, .bashrc, and .soft.cooley at the time the job runs. It does not inherit the environment the qsub command was run in. If you need environment changes specific to this job, they must be set explicitly within the job script.

        For example, you might have a script named test.sh, which runs mpirun with 12 processes per node (one process per core):

        #!/bin/sh\nNODES=`cat $COBALT_NODEFILE | wc -l`\nPROCS=$((NODES * 12))\nmpirun -f $COBALT_NODEFILE -n $PROCS /home/lueningh/cpi/cpi-x86_64\n
        Note: The specific mpirun used depends on your software environment; you'll need to use a softenv key to specify which MPICH version you want to use. By default, we give all new users +mvapich2 to use the most recent version of MVAPICH2, but if desired, a different MPI may be selected from several available. For more information on softenv, see the softenv-intro man page.

        To request 5 nodes from cobalt with 10 minutes of walltime, charging to the MyProject project, you would use the command:

        qsub -n 5 -t 10 -A MyProject ./test.sh\n
        Cobalt will produce some files in the same directory where you ran qsub (unless of course you tell it to use a different working directory). By default, they are named .error, .output, and .cobaltlog. The .error output is stderr from your script, .output is stdout from your script, and the .cobaltlog file is some cobalt specific information like what your qsub submission contained, and how cobalt tried to invoke your executable.

        There is also an \"interactive\" mode. \"qsub -I\" will submit an interactive job to cobalt.

        Use it like this:

        qsub -I -n 1 -t 30\n

        It will block until your job runs, print out the list of nodes allocated to you, and then ssh to your rank0 node. When you log out (or your requested walltime expires), your job will be removed from the queue and the nodes will be released.

        Note: The old 'qsubi' method of running interactive jobs is deprecated and no longer available on Cooley -- please use 'qsub -I' instead.

        "},{"location":"cooley/job-submission/job-and-queue-scheduling/#specifying-filesystems","title":"Specifying Filesystems","text":"

        On systems running Cobalt at the ALCF, your job submission should specify which filesystems you will be using. In the event that a filesystem becomes unavailable, this information is used to preserve jobs that would use that filesystem while allowing other jobs that are not using an affected filesystem to proceed to run normally.

        You may specify your filesystem by adding filesystems=<list of filesystems> to the --attrs argument of qsub in Cobalt. Valid filesystems are home, eagle, grand, and theta-fs0. The list is comma-delimited.

        For example, to request the home and eagle filesystems for your job you would add filesystems=home,eagle to your qsub command. If this is not specified a warning will be printed and then the job will be tagged as requesting all filesystems and may be held unnecessarily if a filesystem is not currently available. The warnings are written to stderr of qsub and qalter commands that change the value of the --attrs flag. Scripts that are parsing stderr from these utilities may encounter errors from the additional warnings if filesystems are not specified in these commands.

        If a job is submitted while a filesystem it requested is marked down, the job will automatically be placed into a user_hold and a warning message will be printed, but the job will be otherwise queued. The job is also placed into admin_hold by a sysadmin script. Once the affected filesystem has been returned to normal operation, the admin_hold is released. You are responsible for releasing the user_hold once you receive the message that the affected filesystem has been returned to normal operation. The job cannot run until both the holds are released.

        If a job requesting a filesystem that is marked down is already in the queue, it will be placed on admin_hold and will be released once the filesystem is operational.

        qsub -n 128 -t 30 -q default --attrs filesystems=home,grand -A Project ./my_job.sh\n
        To update the filesystems list for your job, use qalter. Note that qalter --attrs is a replace and not an update operation. This means that you should once again specify all the attributes that you had in the original qsub command.
        qalter --attr filesystems=home,eagle:mcdram=cache:numa=quad <jobid>\n
        To release user hold:
        qrls <jobid>\n

        "},{"location":"cooley/job-submission/job-and-queue-scheduling/#job-scheduling-on-cooley","title":"Job Scheduling on Cooley","text":"

        There are two primary queues on Cooley: default and debug.

        "},{"location":"cooley/job-submission/job-and-queue-scheduling/#default-queue","title":"Default Queue","text":"

        The default queue is for production use, and is the default queue for jobs that are submitted without a queue specified. It has the following characteristics:

        • Max. runtime: 12 hours
        • Max. job size: 110 nodes (the other sixteen nodes are dedicated to debugging)
        • Max. running jobs per user: 10
        • Max. running and queued jobs per user: 30
        • Max. node-hours (queued and running): 1320
        • Priority: FIFO -- (jobs are run in order, with small, short jobs run on any otherwise-free nodes)
        "},{"location":"cooley/job-submission/job-and-queue-scheduling/#debug-queue","title":"Debug Queue","text":"

        In addition, there are sixteen nodes set aside for dedicated debugging in the debug queue. This is intended for short debugging and interactive visualization runs only. It has the following scheduling policy:

        • Max. runtime: 2 hours
        • Max. job size: 16 nodes
        • Max. running jobs per user: 1
        • Priority: FIFO -- (jobs are run in order, with small, short jobs run on any otherwise-free nodes)
        "},{"location":"cooley/job-submission/job-and-queue-scheduling/#public-network-connectivity","title":"Public Network Connectivity","text":"

        For jobs that require public network connectivity on the compute nodes (i.e., connectivity to non-ALCF resources), you may include the argument --attrs=pubnet in your qsub command.

        "},{"location":"cooley/job-submission/job-and-queue-scheduling/#gpus-directly-for-computation","title":"GPUs Directly for Computation","text":"

        For jobs that use the GPUs directly for computation (e.g. CUDA) and don't require an X server, you may wish to include the argument --attrs=nox11 in your qsub command. This will stop the X server that normally runs on the nodes in order to prevent any performance impact on your GPU jobs.

        If required, the above job attributes may be combined as a colon-separated list, e.g. --attrs=pubnet:nox11

        While we currently continue to maintain special-purpose queues for the above functions (the queues named pubnet, pubnet-debug, nox11, and pubnet-nox11) in order to maintain compatibility for submission scripts that use them, these queues have been deprecated in favor of using the above job attributes, which provide more flexibility and can be used within reservations.

        If you have needs not addressed by the standard queues, please send mail to support@alcf.anl.gov requesting a reservation.

        We will monitor Cooley's queues and evaluate the above policies as needed. Your feedback is appreciated.

        "},{"location":"cooley/performance-tools/darshan/","title":"Darshan on Cooley","text":""},{"location":"cooley/performance-tools/darshan/#overview","title":"Overview","text":"

        Darshan is a lightweight I/O instrumentation library that can be used to investigate the I/O behavior of production applications. It records statistics, such as the number of files opened, time spent performing I/O, and the amount of data accessed by an application.

        "},{"location":"cooley/performance-tools/darshan/#enabling-darshan-on-cooley","title":"Enabling Darshan on Cooley","text":"

        Darshan is not automatically enabled for all jobs on Cooley. Unlike Theta, all applications on Cooley are dynamically linked by default, which means that Darshan must be loaded at runtime using the LD_PRELOAD environment variable. In order to instrument a job on Cooley, you must first add the SoftEnv key +darshan to your ~/.soft.cooley file and run the \u201cresoft\u201d command.

        Then add the following to the mpirun command line in your job script:

        --env LD_PRELOAD=$DARSHAN_PRELOAD\n

        For example, within a Cooley job script:

        mpirun --env LD_PRELOAD=$DARSHAN_PRELOAD -np <num_procs> -f $COBALT_NODEFILE ./app.exe\n

        After your job completes, you can find the Darshan output file in the following directory:

        /lus/theta-fs0/logs/darshan/cooley/<year>/<month>/<day>/\n

        The same tools described in the Theta documentation can be used to interpret Darshan output files generated on Cooley.

        "},{"location":"cooley/performance-tools/remote-visualization-using-vnc/","title":"Remote Visualization on Cooley Using VNC","text":"

        When running graphics applications on Cooley, it is best to use the client/server mode when available.

        A lightweight client can run on your local resource and connect to a server application running on the Cooley visualization nodes. For applications that do not support a client/server mode, VNC can be used for remotely accessing such applications running on Cooley, and leveraging its GPUs.

        "},{"location":"cooley/performance-tools/remote-visualization-using-vnc/#setup-on-cooley","title":"Setup on Cooley","text":"

        On cooley.alcf.anl.gov, if you do not have a ~/.vnc/xstartup file, create one like the following:

        #!/bin/sh\nxterm &\ntwm\n

        Be sure to make it executable:

        > chmod u+x ~/.vnc/xstartup\n

        Also, create a VNC password, which you will need to provide each time you connect a remote VNC client to a VNC server running on Cooley:

        > vncpasswd\n

        This will store an obfuscated version in ~/.vnc/passwd

        "},{"location":"cooley/performance-tools/remote-visualization-using-vnc/#start-a-vnc-server-on-cooley","title":"Start a VNC server on Cooley","text":"

        Since we want the VNC server to run on a backend node, in order to leverage the GPU, we need to submit a job:

        > qsub -I -n 1 -t <time> -A <projectID>\n

        Once your job starts, you will be logged into a visualization node, where you can launch a VNC server:

        > x0vncserver --display=:0.0 --NeverShared=1 --geometry=2400x1500+0+0 --PasswordFile=/home/<username>/.vnc/passwd --MaxProcessorUsage=100\n

        Note: Take note of the hostname where your job is running (in the form cc###). You will need this in the next steps.

        • We use x0vncserver so that we can leverage the existing X server running on the node, which uses the GPU.
        • We specify --display=:0.0 to tell it which display to use.
        • Because the existing display has a resolution of 4096x4096, we use the --geometry flag to specify a region of that display to use. This should be set to a size appropriate for displaying on your local display. You may also wish to adjust the +0+0 to adjust the portion of the display that is visible.
        • Replace <username> with your login name in the path to your VNC PasswordFile.
        • Since we will have exclusive use of the node, we set the --MaxProcessorUsage=100 (otherwise the default is 35).
        "},{"location":"cooley/performance-tools/remote-visualization-using-vnc/#on-your-local-resource","title":"On Your Local Resource","text":"

          From a shell on your local resource, establish an ssh tunnel through the Cooley login node to the backend node where you started the VNC server (the cc### noted above.) This will require the use of your OTP token.

          > ssh -L 5900:cc###:5900 <username>@cooley.alcf.anl.gov\n

          Once the ssh connection is established, from this shell launch the xstartup script on your visualization node.

          If your default shell is bash, use the following command (this will block, and not return you to a command prompt):

          ssh cc### \"export DISPLAY=:0.0; ~/.vnc/xstartup\"\n

          If your default shell is csh/tcsh, use the following command (this will block, and not return you to a command prompt):

          ssh cc### \"setenv DISPLAY :0.0; ~/.vnc/xstartup\"\n

          Now start a VNC viewer on your local resource, for example:

          > xvncviewer localhost::5900\n

          Notes: - Since we are tunneling, set the host to localhost. - Syntax for VNC clients may vary. Check the documentation for your specific client to determine appropriate syntax for specifying the host and port.

          This should open a VNC viewer with an xterm running in it, where you can launch graphics applications running on the Cooley backend node, and taking advantage of the GPU.

          Additional note: Because you are likely not using the full 4096x4096 area of the display, it is possible that some applications that automatically place their windows may place them outside of the region that you are viewing. Some application may provide a mechanism for placing the window at a specific location. Otherwise, you may need to adjust the +0+0 portion of the --geometry flag when running the x0vncserver executable to adjust the portion of the display that is visible.

          "},{"location":"cooley/performance-tools/remote-visualization-using-vnc/#cleaning-up","title":"Cleaning Up","text":"

          When you are all done, be sure to clean up:

          • Exit the VNC viewer
          • Kill the VNC server (cntrl C), and exit the cc### shell
          • You may need to cntrl C to exit the ssh command in the shell used to create the tunnel
          • Then exit that shell to close the tunnel
          "},{"location":"cooley/programming-models/kokkos/","title":"Kokkos on Cooley","text":""},{"location":"cooley/programming-models/kokkos/#overview","title":"Overview","text":"

          Kokkos implements a programming model in C++ for writing performance portable applications targeting all major HPC platforms. It provides abstractions for both parallel execution of code and data management. Kokkos is designed to target complex node architectures with N-level memory hierarchies and multiple types of execution resources. It can use OpenMP, etc., as a backend programming model. For more information, please visit https://github.com/kokkos/kokkos

          The Kokkos shared memory programming model is a C++ library that provides the necessary architecture-specific backends (e.g. OpenMP, CUDA, \u2026). To begin with, though, it is important to note that the Kokkos programming model is usable only in C/C++ codes. Hence, for those with Fortran codes, Kokkos must first be encapsulated within C/C++ functions and called from the main Fortran code.

          The purpose of this document is to provide guidance on using Kokkos on Cooley. Please see the following pages for tutorial and more information on Kokkos: Kokkos GitHub and Kokkos Tutorials.

          "},{"location":"cooley/programming-models/kokkos/#using-kokkos-at-alcf","title":"Using Kokkos at ALCF","text":"

          ALCF provides assistance with build instructions, compiling executables, submitting jobs, and providing prebuilt binaries. For questions, contact us at support@alcf.anl.gov.

          "},{"location":"cooley/programming-models/kokkos/#building-kokkos-on-cooley","title":"Building Kokkos on Cooley","text":"

          Users should look at the /soft/ProgrammingModels/KOKKOS-JULY-29-2019/ directory. The project directory consists of the following directories that users should feel free to copy over to their own directories.

          [bramesh@cooleylogin2 KOKKOS-JULY-29-2019]$ ls\nkokkos/  kokkos-tools/  kokkos-tutorials/\n[bramesh@cooleylogin2 KOKKOS-JULY-29-2019]$\n

          In order to compile and use Kokkos on Cooley, users will need the following settings in the .soft.cooley file:

          +mvapich2-2.2\n+cuda-10.0\n+gcc-7.1.0\n@default\n

          The idea behind these settings is to use the CUDA-10 wrapper, \u201cnvcc,\u201d together with the gcc-7.1.0 compiler, and the mvapich2-2.2 build with the gcc-7.1.0 compiler. In order to compile/run Kokkos, users are encouraged to look at the tutorial example(s) in the following directories:

          /soft/ProgrammingModels/KOKKOS-JULY-29-2019/kokkos-tutorials/Intro-Full/Exercises/<01,02, \u2026>/Begin\n

          The Makefile(s), and the settings therein, should indicate the specifications for the KOKKOS_DEVICE variable that one would need for either the OpenMP, or CUDA, or OpenMP+CUDA backends in Kokkos.

          "},{"location":"cooley/programming-models/opencl/","title":"OpenCL on Cooley","text":""},{"location":"cooley/programming-models/opencl/#overview","title":"Overview","text":"

          OpenCL (Open Computing Language) is an open standard from the Khronos group that enables heterogeneous device programming (e.g. CPUS, GPUs, and FPGAs). Complete descriptions of the API, memory hierarchy, and usage can be found in OpenCL documentation and technical specifications found on the Khronos website.

          "},{"location":"cooley/programming-models/opencl/#using-opencl-at-alcf","title":"Using OpenCL at ALCF","text":"

          OpenCL is provided on Cooley via the NVIDIA GPU drivers for the Kepler K80 GPUs. OpenCL 1.2 is supported on Cooley. There is not an OpenCL CPU driver available on Cooley to launch OpenCL kernels on the CPUs.

          "},{"location":"cooley/programming-models/opencl/#building-on-cooley","title":"Building on Cooley","text":"

          To build an OpenCL application for the GPUs on Cooley, one can modify their software environment to load CUDA.

          soft add +cuda-10.0\nsoft add +gcc-7.1.0\nexport CPATH=/soft/visualization/cuda-10.0/include:$CPATH\nexport LIBRARY_PATH=/soft/visualization/cuda-10.0/lib64:$LIBRARY_PATH\n

          The following is an example to compile a simple OpenCL application.

          mpicxx main.cpp -lOpenCL \n

          "},{"location":"cooley/programming-models/opencl/#running-jobs-on-cooley","title":"Running jobs on Cooley","text":"

          An example \u2018test.sh\u2019 job submission script follows.

          #!/bin/sh\nNODES=`cat $COBALT_NODEFILE | wc -l`\nPROCS=$((NODES * 1))\nmpirun -f $COBALT_NODEFILE -n $PROCS ./a.out\n

          To request a single node with 10 minutes of walltime, charging to the MyProject project, one can use the following command.

          qsub -n 1 -t 10 -A MyProject ./test.sh\n

          "},{"location":"cooley/programming-models/opencl/#alcf-tutorials","title":"ALCF Tutorials","text":"

          There is an OpenCL tutorial available from the ALCF on GitHub that provides a walk through on several concepts of the programming model: introduction to API, querying device info, compiling kernels, using buffers, profiling, etc\u2026 The repo can be cloned to your working directory with one of the following commands.

          git clone https://github.com/alcf-perfengr/alcl.git --branch cooley\n\nor\n\ngit clone git@github.com:alcf-perfengr/alcl.git --branch cooley\n

          The tutorial provides examples for C, C++, Python, and Ruby. Individual C and C++ examples can be built and run, from their respective directories using \u2018make\u2019 and the name of the example (e.g. platform).

          cd alcl/C++\nmake platform\nmake run_platform\n
          Note: OpenCL examples will only run correctly on the Cooley compute nodes as there is no CPU driver installed. For the Python examples, users will need to first install pyopencl before running similar \u2018make\u2019 commands.

          pip install --user --upgrade numpy pyopencl\n\ncd alcl/Python\nmake my_first_kernel\nmake run_my_first_kernel\n
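
          As a small additional sketch (not part of the ALCF tutorial), pyopencl can also be used directly to confirm which OpenCL platforms and devices are visible on a compute node:

          import pyopencl as cl\n\n# List the OpenCL platforms and devices visible on this node\n# (on a Cooley compute node this should show the NVIDIA platform and the K80 GPUs)\nfor platform in cl.get_platforms():\n    print('Platform:', platform.name)\n    for device in platform.get_devices():\n        print('  Device:', device.name)\n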
          "},{"location":"cooley/programming-models/openmp/","title":"OpenMp on Cooley","text":""},{"location":"cooley/programming-models/openmp/#overview","title":"Overview","text":"

          The OpenMP API is an open standard for parallel programming. The specification document can be found here: https://www.openmp.org. The specification describes directives, runtime routines, and environment variables that allow an application developer to express parallelism (e.g. shared memory multiprocessing and device offloading). Many compiler vendors provide implementations of the OpenMP specification.

          "},{"location":"cooley/programming-models/openmp/#using-openmp-at-alcf","title":"Using OpenMP at ALCF","text":"

          OpenMP support for CPUs on Cooley is provided through the GNU, Intel, and LLVM Clang compilers available on Cooley. Guidance on updating your environment to use one of these compilers is available here.

          OpenMP offload support for GPUs on Cooley is provided via community compilers. The LLVM Clang compiler is installed on Cooley to support OpenMP 4.5+ offload features. The compiler is in rapid development and ALCF staff build it frequently from the master branch.

          The status of offload features in this compiler is available on the LLVM Clang website. Considering that the compiler is under active development, the compiler may contain bugs and those should be reported directly to the compiler team here.

          "},{"location":"cooley/programming-models/openmp/#building-on-cooley","title":"Building on Cooley","text":"

          OpenMP parallelism for CPUs can be enabled for each supported compiler using the appropriate compiler flag: -fopenmp for GNU/Clang and -qopenmp for Intel compilers.

          soft add +intel-composer-xe-2018\n\nicpc -qopenmp main.cpp\n

          OpenMP settings, such as number of threads and affinity, can be controlled via OpenMP environment variables.

          The offload compiler is installed on Cooley at /soft/compilers/clang-ykt. To use this compiler, first update your software environment with the following CUDA and gcc softkeys and paths.

          soft add +cuda-10.0\nsoft add +gcc-6.4.0\nexport PATH=/soft/compilers/clang-ykt/latest/bin:$PATH\nexport LD_LIBRARY_PATH=/soft/compilers/clang-ykt/latest/lib:$LD_LIBRARY_PATH\n

          The following compiler flags are needed to enable offload compilation: -fopenmp -fopenmp-targets=nvptx64-nvidia-cuda.

          clang++ -fopenmp -fopenmp-targets=nvptx64-nvidia-cuda main.cpp\n

          "},{"location":"cooley/programming-models/openmp/#running-jobs-on-cooley","title":"Running jobs on Cooley","text":"

          An example \u2018test.sh\u2019 job submission script follows.

          #!/bin/sh\nNODES=`cat $COBALT_NODEFILE | wc -l`\nPROCS=$((NODES * 1))\nmpirun -f $COBALT_NODEFILE -n $PROCS ./a.out\n

          To request a single node with 10 minutes of walltime, charging to the MyProject project, one can use the following command.

          qsub -n 1 -t 10 -A MyProject ./test.sh\n

          "},{"location":"cooley/programming-models/openmp/#examples","title":"Examples","text":"

          There are a handful of simple examples available in the /soft/compilers/clang-ykt/example directory. To run an example, copy the source file to your current working directory, compile, and submit to a compute node in an interactive job or as a batch job using the example script above.

          cp /soft/compilers/clang-ykt/example/test_simple2.cpp ./\nclang++ -fopenmp -fopenmp-targets=nvptx64-nvidia-cuda test_simple2.cpp\n\n./a.out\nenter constructor 0x7ffe3f6bb648\nhost pointer 0x2125590\ndevice pointer 0x620b840200\nRunning target region on device!\nmaptest constructor\ncheck_size = 6\ncheck_value = 1\n

          The NVIDIA tools can be used to debug and profile offloaded kernels compiled with the OpenMP offload clang-ykt compiler. For example, nvprof can be used to profile and verify that your application offloaded kernels to the GPUs on Cooley.

          nvprof ./a.out \n==2755== NVPROF is profiling process 2755, command: ./a.out\nenter constructor 0x7ffea1bb91c8\nhost pointer 0x3fcc7a0\ndevice pointer 0x620c240200\nRunning target region on device!\nmaptest constructor\ncheck_size = 6\ncheck_value = 1\n==2755== Profiling application: ./a.out\n==2755== Profiling result:\n            Type  Time(%)      Time     Calls       Avg       Min       Max  Name\n GPU activities:   48.51%  156.96us         1  156.96us  156.96us  156.96us  __omp_offloading_2d_e6fe0d__ZN7maptestIdEC1Em_l18\n                   46.44%  150.27us         1  150.27us  150.27us  150.27us  __omp_offloading_2d_e6fe0d__ZN7maptestIdE3runEv_l51\n                    3.56%  11.520us         5  2.3040us  1.9840us  2.7520us  [CUDA memcpy DtoH]\n                    1.49%  4.8310us         3  1.6100us  1.3110us  2.2080us  [CUDA memcpy HtoD]\n      API calls:   75.94%  292.38ms         1  292.38ms  292.38ms  292.38ms  cuCtxCreate\n                   21.48%  82.678ms         1  82.678ms  82.678ms  82.678ms  cuCtxDestroy\n                    0.98%  3.7905ms       256  14.806us  1.3690us  458.24us  cuStreamCreate\n                    0.69%  2.6478ms         1  2.6478ms  2.6478ms  2.6478ms  cuModuleLoadDataEx\n                    0.39%  1.4933ms         1  1.4933ms  1.4933ms  1.4933ms  cuModuleUnload\n                    0.17%  638.23us       256  2.4930us  2.0450us  23.244us  cuStreamDestroy\n\u2026\n

          "},{"location":"cooley/programming-models/raja/","title":"RAJA on Cooley","text":""},{"location":"cooley/programming-models/raja/#overview","title":"Overview","text":"

          RAJA is a collection of C++ software abstractions, being developed at Lawrence Livermore National Laboratory (LLNL), that enable architecture portability for HPC applications. The overarching goals of RAJA are to:

          • Make existing (production) applications portable with minimal disruption
          • Provide a model for new applications so that they are portable from inception

          RAJA targets portable, parallel loop execution by providing building blocks that extend the generally-accepted parallel for idiom.

          Additional information can be found at RAJA User Guide.

          "},{"location":"cooley/programming-models/raja/#using-raja","title":"Using RAJA","text":"

          RAJA provides a project template for how to use RAJA in an application project that uses CMake or Make. This is located at RAJA Project Template.

          "},{"location":"cooley/programming-models/raja/#how-to-get-the-source-code","title":"How to get the source code","text":"

          The RAJA source code lives at RAJA github.

          It can be cloned with git clone --recursive https://github.com/llnl/raja.git. The recursive clone will also clone RAJA's dependencies in the proper locations.

          "},{"location":"cooley/programming-models/raja/#building-on-cooley","title":"Building on Cooley","text":"

          RAJA requires a compiler with C++11 support and CMake version 3.9 or greater. RAJA includes an example build script for Cooley that you can use. A quick example is below.

          cd raja\ncp scripts/alcf-builds/cooley_nvcc9.1_clang4.0.sh .\n./cooley_nvcc9.1_clang4.0.sh\ncd build_alcf-cooley-nvcc9.1_clang4.0/\nmake -j 4\n

          The build script makes use of a CMake configuration file raja/host-configs/alcf-builds/cooley_nvcc_clang4.0.cmake. The RAJA and compiler options can be adjusted in this configuration file.

          "},{"location":"cooley/software-and-libraries/jupyter-notebooks/","title":"Jupyter Notebooks on Cooley","text":""},{"location":"cooley/software-and-libraries/jupyter-notebooks/#jupyter-notebooks-on-cooley_1","title":"Jupyter Notebooks on Cooley","text":"

          Frequently, it's very useful for prototyping, analysis, or debugging to have an interactive session on Cooley using jupyter notebooks to debug your python scripts. The single node containers supported by the ALCF datascience group have jupyter and common python packages installed, and so it is possible to use jupyter notebooks with the production ML/DL software in an interactive way.

          For more information about the supported software in the containers, or alternative ways to get pytorch/tensorflow, please see the page on Machine Learning Tools.

          "},{"location":"cooley/software-and-libraries/jupyter-notebooks/#setting-up-jupyter-on-an-interactive-node","title":"Setting up Jupyter on an Interactive node","text":"

          To use jupyter notebooks on the GPUs, you will need an interactive node. Please refer to the cobalt and job submission documentation for details, but a simple interactive node request could look like this:

          qsub -I -n 1 -t 60 -A [project] -q debug --attrs nox11\n

          If you need network access from your interactive node, be sure to use the pubnet queues.

          Once your interactive job has started, take note of which node you are on. On Cooley, it's possible to use ssh from the login node directly to the worker nodes. There are several ways to see which node your job is running on. The easiest is to see in your terminal the user@ccXXX where XXX is the node number, ranging from cc001 to cc016 for the debug queue, and cc017 to cc126 for the other queues. You can also find the node information from qstat -u [username].

          From the interactive node, you can launch one of the datascience containers (\"singularity exec --nv /soft/datascience/singularity/pytorch/centos7-cuda9.0-torch1.0.img bash\"). This will give you a shell inside the container, and from there you can start a jupyter notebook with:

          jupyter notebook\n

          You will see output from jupyter, including a link to access the notebook in a browser. The link will look something like http://localhost:8888/?token=[long string of numbers and letters]. This should indicate that your notebook is up and running on the interactive node. The port in the above address (8888) can be configured with the --port syntax to jupyter notebook, if you want to use a different port.

          "},{"location":"cooley/software-and-libraries/jupyter-notebooks/#connecting-your-laptop-browser-to-your-juypter-notebook","title":"Connecting your laptop browser to your juypter notebook","text":"

          To use your browser on your laptop with jupyter hub, connect to cooley with port forwarding to the login node. The easiest way to do this is to open a new terminal and create an ssh connection to Cooley and connect the port number from your jupyter notebook.

          If it's port 8888 like in the example above, do:

          ssh -L 8888:localhost:8888 user@cooley.alcf.anl.gov.  \n

          From the login node, make a second port-forwarding ssh connection to the interactive node running your jupyter notebook:

          `ssh -L 8888:localhost:8888 ccXXX`, \n
where ccXXX is the name of the interactive node running your job. This will forward the port running jupyter notebook on the interactive node to the login node, and from the login node to your local computer. You can copy/paste the link from jupyter into your browser, and the jupyter notebook interface should appear as normal in your browser. Since it is running the datascience container on the interactive node, you will have access to tensorflow or pytorch (whichever you picked) and the other available software from your jupyter notebook.

Running your software on the GPUs then works exactly as it would in a normal python script, for example by calling x.cuda() on a pytorch tensor x.
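
As a quick sanity check before starting the notebook, you can confirm from the shell inside the container that the GPU is visible (a minimal sketch, assuming the pytorch image):

python -c 'import torch; print(torch.cuda.is_available())'\n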

          For questions or bug reports on jupyter on Cooley, please email support@alcf.anl.gov

          "},{"location":"cooley/software-and-libraries/machine-learning-tools/","title":"Machine Learning Tools on Cooley","text":"

Cooley is designed as a visualization cluster, but because its nodes include Nvidia GPUs it is also possible to run machine learning workflows on Cooley.

Horovod will not have any significant impact on performance on a single GPU; however, since the K80 nodes have 2 GPUs per node, it is recommended that you use horovod with data-parallel learning to take advantage of both GPUs.

          "},{"location":"cooley/software-and-libraries/machine-learning-tools/#running-a-machine-learning-workflow-on-cooley-with-containers","title":"Running a machine learning workflow on Cooley with containers","text":"

          For more information about building containers for Cooley, please see the Singularity on Cooley page. This section will focus on using containers for machine learning and deep learning workflows.

          Because singularity is setting up a containerized system, there are several important steps to take note of:

1. Use the --nv option to singularity exec to enable Nvidia GPU drivers within the container. Without this, you will not be able to take advantage of Nvidia GPU acceleration.

          2. Make sure you bind necessary directories correctly. By default, not all areas mounted on the host system (outside the container) are available inside the container. To access an area, you can bind it with the -B outside_loc:inside_loc syntax. For example, to access the theta projects area from inside a container on Cooley, use -B /lus:/lus as part of your singularity command.

3. Run the container inside of mpirun calls. For example, do mpirun -n 2 singularity exec --nv -B /lus:/lus $IMAGE /path/to/python/script.py and NOT singularity exec $IMAGE mpirun -n 2 /path/to/python/script.py (where $IMAGE is the path to the container you want to run).

          Running the mpi containers with both GPUs per node has been demonstrated to scale to many nodes on Cooley, so distributed learning is feasible on Cooley.
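
Putting these pieces together, a minimal sketch of such a launch from inside a Cobalt job script might look like the following; $IMAGE and the training script path are placeholders, and the mpirun pattern simply mirrors the one shown above:

IMAGE=/path/to/singularity_container.img\nNPROC=$((2 * $(wc -l < $COBALT_NODEFILE)))   # 2 ranks per node, one per GPU\nmpirun -n $NPROC -H $COBALT_NODEFILE singularity exec --nv -B /lus:/lus $IMAGE python /path/to/train.py\n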

          "},{"location":"cooley/software-and-libraries/machine-learning-tools/#extending-available-software-in-containers","title":"Extending Available Software in Containers","text":"

          If you start with an existing Singularity container, it is possible to add additional software. The most straightforward path is to install it via pip while in the container, using the --user flag if you can. In this way, you can add extensions to tensorflow/pytorch, or IO frameworks, etc. Email support@alcf.anl.gov for questions concerning these techniques.
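
For example, a minimal sketch of adding a package from inside the container, assuming your home directory is bind-mounted into the container (the default), and with tensorboardX as an illustrative package name:

singularity exec --nv /path/to/singularity_container.img bash\npip install --user tensorboardX   # installs under ~/.local, which is visible inside the container\n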

          "},{"location":"cooley/software-and-libraries/machine-learning-tools/#non-container-software-solutions","title":"Non-container software solutions","text":"

It is perfectly possible to run tensorflow, pytorch, etc. outside of a container on Cooley. We do not provide officially supported builds or distributions of these, but because Nvidia GPUs are very common for ML and DL software, there are many excellent tools available for getting GPU-optimized tensorflow, pytorch, etc. Solutions that can work on Cooley include pip, conda, and virtualenv, and possibly others. Note that you will need to add the CUDA libraries from softenv to use these tools.
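
A minimal sketch of a non-container setup is shown below; the softenv key name is hypothetical, so check the output of softenv for the CUDA key that is actually available on Cooley:

softenv | grep -i cuda              # find the CUDA key available on Cooley\nsoft add +cuda-9.0                  # hypothetical key name; use the one softenv reports\nvirtualenv $HOME/ml-env\nsource $HOME/ml-env/bin/activate\npip install tensorflow-gpu==1.12.0\n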

          "},{"location":"cooley/software-and-libraries/paraview-tutorial/","title":"ParaView Tutorial on Cooley","text":""},{"location":"cooley/software-and-libraries/paraview-tutorial/#overview","title":"Overview","text":"

          This tutorial is intended to be a hands-on resource for users interested in learning the basic concepts of ParaView. The examples can easily be run on a laptop, using the example data set provided.

          • Tour of ParaView
          • Show range of visualization methods
• Walk through various visualization techniques, hopefully illustrating how these can apply to your own data
• Get a feel for the ParaView \"way\"
          • Terminology and step-by-step process peculiar to ParaView, which may differ from other packages, e.g. VisIt

          Bloodflow Visualization by Joe Insley, ALCF"},{"location":"cooley/software-and-libraries/paraview-tutorial/#data","title":"Data","text":"

The data used for this tutorial is:
• Blood flow simulation data
• Multiple data types
  • Continuum data field (unstructured mesh, tetrahedral): fluid field, plasma
  • Particle data (unstructured points): individual particles moving in the flow
  • Red Blood Cells (RBC, unstructured mesh, triangle): mesh of the surface of an RBC
    • Healthy
    • Diseased
• Generated using an integrated Nektar/LAMMPS simulation code
• Courtesy of George Karniadakis and Leopold Grinberg of Brown University

          The data is available for download here (~27MB compressed, ~39MB uncompressed): Data set for ParaView Red Blood Cell Tutorial

          "},{"location":"cooley/software-and-libraries/paraview-tutorial/#1-load-multi-component-dataset","title":"1. Load Multi-component Dataset","text":"
• From the File menu (you can also click the file folder icon, shown above), open each of the following data sets (select then click \"OK\")
          • The files will then appear in the Pipeline Browser
          • Click Apply in the Object Inspector
          • You will need to do this one at a time:
          • continuum...vtu
          • particles...vtu
          • rbc_...vtu
• bad_rbc...vtu
Note: The \"...\" in the name, and the arrow in the file browser, indicates that there are multiple time steps for each of these files.
          With all of the default settings, you should see something like this"},{"location":"cooley/software-and-libraries/paraview-tutorial/#2-select-which-data-to-view","title":"2. Select which data to view","text":"

Let's start by looking at the continuum.000* data. This is an unstructured mesh that has velocity and count (density) values.
• Hide the other data sets using the Eyeball icon next to their names in the Pipeline Browser
  • Black = visible, Grey = hidden
• Select continuum.000* (name is highlighted) in the Pipeline Browser
  • Click on the name to highlight it
  • When manipulating appearance or applying filters, these always affect the selected data set
• Switch to the Display tab in the Object Inspector
• Under Color by, select Velocity from the dropdown
  • There is also a shortcut to Color by in the menu bar near the top of the GUI

          Select which data to view"},{"location":"cooley/software-and-libraries/paraview-tutorial/#3-manipulating-the-color-map","title":"3. Manipulating the Color Map","text":"

To change the colors used to represent the Velocity:
• Under Color by, click the Edit Color Map... button
• On the Color Scale Editor window, click the Choose Preset button
• On the Preset Color Scales window, select Blue to Red Rainbow, and click OK. Then click Close on the Color Scale Editor window
• You can also create and save your own color maps

          Manipulating the Color Map"},{"location":"cooley/software-and-libraries/paraview-tutorial/#4-data-representation","title":"4. Data Representation","text":"

In order to be able to see the particles and red blood cells inside the cylinder, we need to be able to see through it. If we scroll down a bit in the Object Inspector view:
• Group of controls labeled Style
• In the Representation dropdown, select Wireframe

          Data Representation"},{"location":"cooley/software-and-libraries/paraview-tutorial/#5-generate-streamlines","title":"5. Generate Streamlines","text":"
          • ParaView enables the generation of different types of data from existing data sets in the Pipeline
          • Streamlines: Generated from vectors of the flow field. These curves show the direction a fluid element will travel in at any point in time
• Make sure that the continuum.000* data is selected in the Pipeline Browser
          • From the main menu select: Filters->Alphabetical->Stream Tracer, or click on the Stream Tracer icon from the menu bar
          • In the Object Inspector make sure the Properties tab is selected.
          • Scroll down to seeds, and change Seed Type to Line Source
          • Click the Y Axis button to set the seed line to run along the Y axis.
          • The default Resolution is set to 100. This will make things a bit cluttered, especially when we start adding in the other data, so let's reduce this to 25
          • Click the Apply button
          Generate Streamlines"},{"location":"cooley/software-and-libraries/paraview-tutorial/#6-streamlines-as-tubes","title":"6. Streamlines as Tubes","text":"

The streamlines are just that, lines. We can use the Tubes filter to represent them as 3D objects, rather than just lines.
• With StreamTracer1 selected in the Pipeline Browser, from the main menu select: Filters->Alphabetical->Tube
• In the Object Inspector make sure the Properties tab is selected
• The default value for the Radius is a bit too large for this data, let's set that value to 0
• Click the Apply button
• Notice that the StreamLine1 object has automatically been hidden
There are many different ways to color these tubes:
• With Tubes1 selected, switch to the Display tab in the Object Inspector
• The Color by dropdown lets you choose from a handful of different variables

          Streamlines as Tubes"},{"location":"cooley/software-and-libraries/paraview-tutorial/#7-cutting-planes-slices","title":"7. Cutting Planes (Slices)","text":"

Now let's add some cutting planes, or slices, to see what the cross-section of the continuum data looks like.
• Again, be sure that the continuum.000* data is selected in the Pipeline Browser
• Filters->Alphabetical->Slice, or click on the Slice icon from the menu bar
• In the Object Inspector make sure the Properties tab is selected
• At the bottom of the Object Inspector is a section titled Slice Offset Values. Here we can generate values for multiple slices to be made
  • First click the Delete All button to remove initial values
  • Next, click the New Range button. This will bring up an Add Range dialog box
  • Set the number of Steps to 7. Click OK
• Click the Apply button
• With Slice1 selected in the Object Inspector, switch to the Display tab
• Set Color by value to Velocity

          Cutting Planes (Slices)"},{"location":"cooley/software-and-libraries/paraview-tutorial/#8-data-representation-opacity","title":"8. Data Representation: Opacity","text":"

          Even with the continuum data represented as wireframe, there is still considerable occlusion of the interior structures. In order to further reduce this occlusion by the wireframe, we can make it more transparent.

• Again, be sure that the continuum.000* data is selected in the Pipeline Browser
          • In the Object Inspector make sure the Display tab is selected
          • In the Object Inspector there is a section titled Style
          • Set Opacity to 0.2

          Data Representation: Opacity"},{"location":"cooley/software-and-libraries/paraview-tutorial/#9-animating-simulation-data","title":"9. Animating Simulation Data","text":"

          Since our data has multiple time steps, we can easily animate through them to see how the data changes over time.

          • Simply click the Play button on the animation bar at the top of the GUI
          • Pause to make it stop
          • Loop: With this button toggled on, animation will repeat until stopped

          Animating Simulation Data"},{"location":"cooley/software-and-libraries/paraview-tutorial/#10-animations","title":"10. Animations","text":"

          Animations can be saved to disk as a movie file, to be played back later.

          • From the main menu: File->Save Animation
          • Animation Settings Dialog: Save Animation
          • Files of type: AVI files (*.avi)
          • Enter a name in File name:
          • Click OK
          • Movie can be played back with standard media players (Windows Media Player, QuickTime, VLC, etc.)

          Animations"},{"location":"cooley/software-and-libraries/paraview-tutorial/#11-particles-as-glyphs","title":"11. Particles as Glyphs","text":"

          Glyphs are another way of visually representing data where the attributes of a graphical element are dictated by attributes of the data.

          Now let's add some of our other data back into the scene. Let's start with the particle data.

          All of the particles are displayed as red points in the graphics window. There are ~39K particles in this particular data set, which makes the display rather cluttered. In order to both filter some of these out, and create 3D representations for them, we will apply the glyph filter to this data.

Note that the particles.000* data is still visible.

• Unhide the particles.000* data: click the Eye icon
          • Select particles.000*data: click on name
          • Filters->Alphabetical->Glyph or click on the Glyph icon from the menu bar
          • Glyph Type: Sphere
• Radius: 0.15
          • Orient: Unchecked
          • Scale Mode: off
          • Set Scale Factor: 1 - Edit: Checked
          • Maximum Number of Points: 3000
          • Mask Points: Checked
          • Random Mode: Unchecked
          • Click the Apply button
• Since our goal was to unclutter the display, let's hide the particles.000* data by toggling it off: click on the Eye icon next to it in the Pipeline Browser
• Let's also switch to the Display tab in the Object Inspector, with Glyph1 selected, and change the Color by value to GlyphVector. Since the GlyphVector value is based on the velocity, we can Edit Color Map... and choose the same Blue to Red Rainbow preset that we previously chose for velocity

          Particles as Glyphs"},{"location":"cooley/software-and-libraries/paraview-tutorial/#12-enter-red-blood-cells","title":"12. Enter: Red Blood Cells","text":"

          Now let's add in both of the other data sets, which are polygonal meshes which make up Red Blood Cells (RBCs).

          These two data sets are essentially the same kind of data, so we can apply the same filters and make the same types of representation changes to each of them. However, some of the RBCs are marked by the simulation that generated them as healthy (rbc.000) and some of them are marked as diseased (bad_rbc.000).

          • Unhide the rbc.000 and bad_rbc.000 data sets by clicking the Eye icon next to each of them to make them visible

          Enter: Red Blood Cells"},{"location":"cooley/software-and-libraries/paraview-tutorial/#13-using-color-to-differentiate-data","title":"13. Using Color to Differentiate Data","text":"

To enable us to distinguish these two types of data from one another, we can vary their representations.

          One way to do this is by setting the color of the two data sets to different colors. Repeat this process for each of rbc.000 and bad_rbc.000, picking different colors.

          • Select one of the rbc data sets in the Pipeline Browser
• Go to the Display tab in the Object Inspector
• In the Color by: dropdown select Solid Color
• Click on the Set Solid Color... button
• Select a color from the Select Color dialog that appears
          • Repeat for the other RBC data set, choosing a different color

          Using Color to Differentiate Data"},{"location":"cooley/software-and-libraries/paraview-tutorial/#14-further-exploration-highlight-the-mesh","title":"14. Further Exploration: Highlight the Mesh","text":"

          Change the representation of one of the RBC data sets.

          In this example, the continuum.000* data is also hidden to reduce confusion with showing multiple overlapping meshes.

• Select one of the RBC data sets
• Go to the Display tab in the Object Inspector
• For the Representation select Surface With Edges
• In the Edge Style section click on the Set Edge Color... button to select a different color from the Select Color dialog

          Further Exploration: Highlight the Mesh"},{"location":"cooley/software-and-libraries/paraview-tutorial/#15-further-exploration-highlight-the-vertices","title":"15. Further Exploration: Highlight the Vertices","text":"

          Add glyphs to illustrate the position of the vertices of one of the RBC data sets.

          • Select one of the RBC data sets
• Select the Glyph filter
• Since this filter was used recently, it can also be found under: Filters->Recent->Glyph
• As in the earlier example, set the various configuration options for the glyph attributes
• Note that this time, we want to show all of the vertices of the RBC, so we should uncheck the Mask Points option

          Further Exploration: Highlight the Vertices"},{"location":"cooley/software-and-libraries/paraview-tutorial/#16-further-exploration-color-by-variable","title":"16. Further Exploration: Color by Variable","text":"

          Try playing around with the viewing options and representations of the other data objects.

Change the:
• Color by values
• Opacity
• Representation
• Etc.

          Further Exploration: Color by Variable"},{"location":"cooley/software-and-libraries/paraview-tutorial/#17-background-color","title":"17. Background Color","text":"
          • Background color is an important part of final visualization
          • From the main menu choose: Edit->View Settings...
          • Under General in the View Settings dialog box, select Choose Color
          • Select Color: OK
          • Apply, then OK
          Background Color

          This tutorial was developed with support from National Science Foundation Grant OCI-0904190, and from the Argonne Leadership Computing Facility at Argonne National Laboratory, which is supported by the Office of Science of the U.S. Department of Energy under contract DE-AC02-06CH11357.

          "},{"location":"cooley/software-and-libraries/paraview/","title":"Paraview on Cooley","text":"

          The recommended way of running ParaView on Cooley is in client/server mode. This consists of running the ParaView client on your local resource, and the ParaView server (pvserver) on the Cooley visualization nodes. There are two ways to accomplish this, detailed below. In both cases, the ParaView client needs to first be installed on your local resource, and needs to match the version that you run on Cooley.

          The most recent production version currently installed on Cooley is ParaView 5.10.0. Binary and source packages for Linux, MacOS, and Windows are available from the ParaView Download Page. (Run the 'softenv' command on a Cooley login node to see earlier versions of ParaView that are available.)

          As mentioned, there are two ways to run ParaView in client/server mode. For more details, including advantages and disadvantages of each, see the section below on: Trade Offs

          The first, and arguably easier, way is to run the ParaView client locally, and have it launch the pvserver on Cooley, and connect back to your local client. For details see the section below on: Automated / Reverse Connection

          The other way is to first manually launch the pvserver on Cooley, and then launch the ParaView client locally and connect to your running pvserver. For details see the section below on: Manual / Forward Connection

          "},{"location":"cooley/software-and-libraries/paraview/#automated-reverse-connection","title":"Automated / Reverse Connection","text":"

          This section describes how to launch the pvserver on Cooley from a local ParaView client.

          "},{"location":"cooley/software-and-libraries/paraview/#start-paraview-client","title":"Start ParaView Client","text":"

          First, launch the ParaView client on your local resource. In order to launch the pvserver on Cooley and have it connect back to our local client, we will need to configure some server settings in the client. This initial set up should only need to be done once, and can be reused each time you want to run ParaView on Cooley.

          "},{"location":"cooley/software-and-libraries/paraview/#server-configuration","title":"Server Configuration","text":""},{"location":"cooley/software-and-libraries/paraview/#1-select-connect","title":"1. Select Connect","text":"

From the ParaView client, choose to connect to a server by either clicking on the \"Connect\" icon in the menu bar, or by selecting File->Connect from the main menu:

          Select connect"},{"location":"cooley/software-and-libraries/paraview/#2-fetch-servers-first-time-only","title":"2. Fetch Servers (first time only)","text":"

          The first time we want to run a pvserver on Cooley and have it connect to our local ParaView client, we need to set up a Server. Once we set up this server, we can reuse it each time we run the ParaView client with the pvserver on Cooley.

          Kitware, the maintainers of ParaView, maintain a database of server configurations, which we can retrieve through the ParaView client.

          Click \"Fetch Servers\"

          Fetch servers"},{"location":"cooley/software-and-libraries/paraview/#3-fetch-server-configuration-cooley","title":"3. Fetch Server Configuration, Cooley","text":"

          From the list of server configurations, if your local resource is Linux or Mac, select COOLEY@ANL

          If your local resource is Windows, select windows to COOLEY@ANL

          Click \"Import Selected\"

          Fetch server configuration Cooley"},{"location":"cooley/software-and-libraries/paraview/#4-connect","title":"4. Connect","text":"

          Now that we have a server defined and configured, highlight it in the list.

          Click \"Connect\"

          Connect to Cooley"},{"location":"cooley/software-and-libraries/paraview/#5-configure-server-settings","title":"5. Configure Server Settings","text":"

          In order to connect to Cooley and submit a job to launch the pvserver, you'll need to edit a few configuration settings.

          First, ParaView needs a way to securely connect to Cooley, in order to run a qsub command to launch the pvserver. On Linux/Mac it does this through Xterm and SSH. Be sure that the path to the Xterm executable is set correctly. On Windows be sure that the path to the SSH executable (such as PuTTY) is set correctly.

          Username: Should be set to your login name on Cooley.

          ParaView version: Be sure this matches the version of the ParaView client that you are running.

          Client port and Server port: Can use the default values

          Set the Number of nodes to reserve and Number of minutes to reserve accordingly.

          NOTE: You will be submitting a job to the queue on Cooley, and the number of nodes you request are not guaranteed to be available. If they are not, you may have to wait in the queue until the requested number of nodes become available.

          Account: set to your project id / allocation on Cooley

          Click \"OK\"

          Configure server settings"},{"location":"cooley/software-and-libraries/paraview/#6-connecting-enter-password","title":"6. Connecting: Enter Password","text":"

          A window will pop up, indicating that ParaView is connecting to Cooley, and is waiting for the server to connect back to your client.

A second window will pop up (Xterm on Linux/Mac; PuTTY, or another local SSH client, on Windows). Enter your PIN and secure token one time password, just as you would when logging into Cooley.

          This will enable ParaView to submit your job to run the pvserver on Cooley. Once your job starts, and the pvserver connects back to your ParaView client, the Waiting for Server Connection window will go away.

          Establishing connection to Cooley

          Select connect"},{"location":"cooley/software-and-libraries/paraview/#2-add-or-edit-server","title":"2. Add (or Edit) Server","text":"

          The first time we connect our local ParaView client to a pvserver on Cooley, we need to Add Server. Once we set up this server, we can reuse it each time we connect the ParaView client to the pvserver on Cooley. We may need to Edit Server in the future if our pvserver ends up on a different host.

          Click \"Add Server\" (first time) or \"Edit Server\" (subsequent times)

          Add (or edit) server"},{"location":"cooley/software-and-libraries/paraview/#3-configure-server-part-1","title":"3. Configure Server, Part 1","text":"

          Configure the server by first giving it a Name, such as Cooley

          Select Server Type: Client/Server

          The Host value should be set to the full name of the head node of our job where the pvserver is listening: cc018.cooley.pub.alcf.anl.gov

          And the port should be set to the value we used when we started the pvserver: 8000.

          Click \"Configure\"

          Configure server, part 1"},{"location":"cooley/software-and-libraries/paraview/#4-configure-server-part-2","title":"4. Configure Server, Part 2","text":"

          Because we are going to connect to a ParaView server that we have already started, we don't need the ParaView client to start a server for us.

          Select Startup Type: Manual

          Click \"Save\"

          Configure server, part 2"},{"location":"cooley/software-and-libraries/paraview/#5-connect","title":"5. Connect","text":"

          Now that we have a server defined and configured, highlight it in the list.

          Click \"Connect\"

          Connect to Cooley"},{"location":"cooley/software-and-libraries/paraview/#6-open-file","title":"6. Open File","text":"

          Now when you select File->Open from the main menu, you will be browsing the filesystem on Cooley. You're ready to go.

          "},{"location":"cooley/software-and-libraries/singularity-cooley/","title":"Singularity on Cooley","text":""},{"location":"cooley/software-and-libraries/singularity-cooley/#singularity-on-alcf-resources","title":"Singularity on ALCF Resources","text":"

          Singularity is a software container solution to enable fine grained control over application environments on a diverse set of hardware systems. For details on singularity, please see the Singularity documentation.

          "},{"location":"cooley/software-and-libraries/singularity-cooley/#singularity-on-cooley_1","title":"Singularity on Cooley","text":"

This page is specific to setting up and running singularity on Cooley. If you are interested in running singularity on Theta, we have detailed Theta-specific documentation for singularity.

          "},{"location":"cooley/software-and-libraries/singularity-cooley/#why-use-singularity-on-cooley","title":"Why use Singularity on Cooley?","text":"

Singularity provides several advantages on Cooley. First, singularity provides more control over the software environment you run your code in, meaning you can have fine grained control over packages like tensorflow, pytorch, or other code designed to run on GPUs. While Cooley is primarily a visualization cluster, it can still be useful to be able to apply its GPUs to data science and machine learning challenges.

Second, singularity allows you to \"bootstrap\" images off of other singularity or docker images. In particular, this means you can leverage an image built by Nvidia with CUDA and cuDNN, without having to worry about the installation of that software yourself. Singularity can save you time spent managing your runtime environment and let you focus on your application development.

          "},{"location":"cooley/software-and-libraries/singularity-cooley/#how-to-use-singularity-on-cooley","title":"How to use Singularity on Cooley","text":"

          In general, using a singularity container is a simple process. You can execute a script inside of a container with a command such as:

          singularity exec /path/to/singularity_container.img command_to_execute\n
          If you want to run an interactive container, you can use bash as your executable:
          singularity exec /path/to/singularity_container.img bash\n

          "},{"location":"cooley/software-and-libraries/singularity-cooley/#using-nvidia-drivers-inside-of-singularity","title":"Using nvidia drivers inside of Singularity","text":"

Singularity allows you to use the nvidia device drivers on the host system within the container by passing the --nv option. On the Cooley compute nodes, this allows you to run tensorflow/pytorch on the GPUs inside of the container transparently, with no special set up or modification to your scripts.

          "},{"location":"cooley/software-and-libraries/singularity-cooley/#other-important-runtime-flags","title":"Other Important Runtime Flags","text":"

For a full list of available runtime flags, you can look at the output of \"singularity run --help\". An important flag to know about is -B, which binds mount points into the container. By default, most mount points will not be available in the container, but if you want to use the /lus file system from within your container you can have singularity make it available at runtime like this:

          singularity exec -B /lus:/lus /path/to/singularity_container.img command_to_execute\n

          "},{"location":"cooley/software-and-libraries/singularity-cooley/#using-singularity-with-mpi","title":"Using Singularity with MPI","text":"

Singularity has built-in support to use MPI and run MPI applications, as long as MPI is built into your container (more on how to do that below). You can see details of the design on the singularity docs for HPC.

          In general, singularity must have MPI installed within the container. When you execute an MPI program with singularity, each rank of your program runs singularity:

mpirun -n ${NPROC} singularity exec /path/to/singularity_container.img command_to_execute [executable args]\n
If you want to run 2 ranks per node on Cooley (one for each of the two GPUs on each node), you can pass the list of hosts to mpirun with the -H option:
          mpirun -n ${NPROC} -H $COBALT_NODEFILE singularity exec /path/to/singularity_container.img command_to_execute [executable args]\n

          "},{"location":"cooley/software-and-libraries/singularity-cooley/#special-considerations-for-running-mpi-and-singularity-with-gpus-on-cooley","title":"Special Considerations for running MPI and Singularity with GPUs on Cooley","text":"

Executing MPI code on Cooley nodes can sometimes require extra care due to the communication that happens across GPUs if you use NCCL2. If your code is hanging unexpectedly when using both GPUs per node, you can try disabling peer to peer communication in NCCL with:

          export NCCL_P2P_DISABLE=1\n
          within the container execution (or pass it into the container with a singularity environment variable).
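
For example, Singularity forwards host environment variables prefixed with SINGULARITYENV_ into the container, so a sketch of the second approach is ($IMAGE and the script path are placeholders):

export SINGULARITYENV_NCCL_P2P_DISABLE=1\nmpirun -n 2 -H $COBALT_NODEFILE singularity exec --nv $IMAGE python /path/to/train.py\n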

          Additionally, running with both GPUs on a node can cause a crash with CUDA code if there are X processes running. You can use the nox11 queue, or pass the attribute nox11 to any cobalt job to turn off X processes for your job.
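
For example, a batch submission with the attribute set might look like the following (the job script name is a placeholder):

qsub -n 2 -t 60 -A [project] --attrs nox11 ./run_training.sh\n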

          "},{"location":"cooley/software-and-libraries/singularity-cooley/#building-a-singularity-container-with-tensorflow-or-pytorch-for-gpu","title":"Building a Singularity Container with TensorFlow or PyTorch for GPU","text":"

Building a container for singularity requires either root privileges on a machine with singularity installed, or the use of singularity hub to build your images. This guide won't go into the details of how to build your singularity image, but rather contains recipes and snippets that are useful for builds targeted towards Cooley using common packages.

          "},{"location":"cooley/software-and-libraries/singularity-cooley/#bootstrapping-a-basic-image","title":"Bootstrapping a Basic Image","text":"

          The basic details of your singularity recipe could follow a recipe that looks like this:

          Bootstrap: docker\nFrom: nvidia/cuda:9.0-cudnn7-devel-centos7\n\n%help\nCentos7 with cuda9.0 cudnn7\n\nTo start your container simply try\nsingularity exec THIS_CONTAINER.simg bash\n\nTo use GPUs, try\nsingularity exec --nv THIS_CONTAINER.simg bash\n\n\n%environment\n\n    # for system\n    export CUDA_DEVICE_ORDER=PCI_BUS_ID\n\n    # Add cupti to the path for profiling:\n    export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/cuda/extras/CUPTI/lib64\n\n    source scl_source enable devtoolset-4\n\n%post\n\n    # yum basics\n    yum update -y\n    yum groupinstall -y \"Development Tools\"\n    yum install -y epel-release\n    yum install -y centos-release-scl\n    yum install -y devtoolset-4\n    yum install -y wget emacs vim\n    yum install -y emacs vim openssh-clients zip\n    yum install -y python-devel python-pip python-setuptools\n    yum install -y hdf5\n\n    # pip basics\n    pip --no-cache-dir --disable-pip-version-check install --upgrade setuptools\n    pip --no-cache-dir --disable-pip-version-check install future\n    pip --no-cache-dir --disable-pip-version-check install 'matplotlib<3.0' # for python2.7\n    pip --no-cache-dir --disable-pip-version-check install 'ipython<6.0'    # for python2.7\n    pip --no-cache-dir --disable-pip-version-check install 'ipykernel<5.0'  # for python2.7\n    pip --no-cache-dir --disable-pip-version-check install numpy wheel zmq six pygments pyyaml cython gputil psutil humanize h5py tqdm scipy seaborn tables\n    pip --no-cache-dir --disable-pip-version-check install  pandas scikit-image scikit-learn Pillow opencv-python\n    pip --no-cache-dir --disable-pip-version-check install jupyter notebook\n

To add a package like tensorflow or pytorch, you can install it with pip at the end of the %post section as normal. While building from source is not discouraged, it is much easier to install tensorflow/pytorch using pip:

          # tensorflow\npip --no-cache-dir --disable-pip-version-check install --upgrade tensorflow-gpu==1.12.0\npip --no-cache-dir --disable-pip-version-check install tensorboard\n\n# keras\npip --no-cache-dir --disable-pip-version-check install keras\n

The recipe above has a couple of details worth pointing out. First, note the \"Bootstrap\" lines at the very beginning. They indicate to singularity that this recipe is going to take the CentOS 7 CUDA-enabled docker image from Nvidia, and build a singularity container on top of it. Those two lines get you CUDA and cuDNN, and save a lot of work.

The %help section is for informational purposes only; you can put whatever you like here to help keep things organized.

The %environment section runs every time this container activates. It functions somewhat like a login script in that commands there will be executed prior to whatever command has been passed to the container. In this recipe, it adds the Nvidia profiling tools to the path, and enables the installed CentOS devtoolset to provide gcc 5+.

The %post section runs only during the image build, after the basic set up completes. It is where you can control and install software (with root permissions inside the container). In this recipe, there are a lot of general purpose tools installed for development purposes, many of which may be unnecessary for your application; you can (and should) prune them to what you want if you use this recipe. Additionally, there are python tools installed with pip that may be useful, and again you should prune as needed.

          "},{"location":"cooley/software-and-libraries/singularity-cooley/#building-mpi-and-common-mpi-tools-for-your-container","title":"Building MPI and common MPI tools for your container","text":"

          MPI needs to be installed inside the container for MPI applications to run within singularity. To do this, add the following snippet to the %post section (this is nearly identical to the Theta singularity instructions):

          # install MPICH\nwget -q http://www.mpich.org/static/downloads/3.2.1/mpich-3.2.1.tar.gz\ntar xf mpich-3.2.1.tar.gz\nrm mpich-3.2.1.tar.gz\ncd mpich-3.2.1\n# disable the addition of the RPATH to compiled executables\n# this allows us to override the MPI libraries to use those\n# found via LD_LIBRARY_PATH\n./configure --prefix=/usr/local/mpich/install --disable-wrapper-rpath\nmake -j 4 install\n# add to local environment to build pi.c\nexport PATH=$PATH:/usr/local/mpich//install/bin\nexport LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/mpich//install/lib\nenv | sort\ncd ..\nrm -rf mpich-3.2.1\n

Additionally, the following lines should be added to the %environment section for MPICH:

          export PATH=/usr/local/mpich/install/bin/:${PATH}\nexport LD_LIBRARY_PATH=/usr/local/mpich/install/lib/:${LD_LIBRARY_PATH}\n

          If you want to use Nvidia's Collective Communications Library (NCCL2), you can install it:

          # nccl2\ngit clone https://github.com/NVIDIA/nccl.git\ncd nccl;\nmake -j src.build\nmake pkg.redhat.build\nrpm -i build/pkg/rpm/x86_64/libnccl* \ncd -\n

mpi4py has worked in containers on Cooley when installed with pip:

          pip --no-cache-dir --disable-pip-version-check install mpi4py\n
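
A quick way to sanity-check the installation (a sketch, with $IMAGE pointing to your container) is to print the rank from a couple of processes:

mpirun -n 2 singularity exec $IMAGE python -c 'from mpi4py import MPI; print(MPI.COMM_WORLD.Get_rank())'\n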

          Lastly, if you want to use horovod for distributed training, the following installation technique has been shown to work with tensorflow (but not pytorch):

          ldconfig /usr/local/cuda/lib64/stubs\n# install Horovod, add other HOROVOD_* environment variables as necessary\nHOROVOD_GPU_ALLREDUCE=NCCL HOROVOD_WITH_TENSORFLOW=1 HOROVOD_NCCL_HOME=/nccl/build/ pip install --no-cache-dir horovod\n\n# revert to standard libraries\nldconfig\n

For pytorch, this technique causes segmentation faults at runtime. Instead, build your container without horovod. Then, on an interactive node, launch the container with --nv and, with torch installed in your container, build horovod with the same pip command from inside an interactive shell in your container. Install it to a location in your area (--user if you like, or --prefix for more control) and add that area to your python path in your run scripts. While this is a workaround, it works to enable horovod inside of your container for distributed learning.
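
A rough sketch of that workaround is shown below; all paths are hypothetical, and the python version in the site-packages path should match the one inside your container:

qsub -I -n 1 -t 60 -A [project] -q debug --attrs nox11\nsingularity exec --nv /path/to/container.img bash\nHOROVOD_GPU_ALLREDUCE=NCCL pip install --no-cache-dir --prefix=$HOME/horovod-cooley horovod\nexport PYTHONPATH=$HOME/horovod-cooley/lib/python2.7/site-packages:$PYTHONPATH\n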

          "},{"location":"cooley/software-and-libraries/visit-on-cooley/","title":"Visit on Cooley","text":""},{"location":"cooley/software-and-libraries/visit-on-cooley/#getting-started","title":"Getting Started","text":"

          On your local machine:

          • Download (https://wci.llnl.gov/simulation/computer-codes/visit/downloads) and install VisIt (The most recent version installed on Cooley is 2.12.3. Versions 2.10.2 and 2.9.1 are also available.)
          • Download the Cooley host profile for VisIt (you may need to right-click and choose \"Save link as...\" or \"Save target as...\")
          • Copy this file to a file called ~/.visit/hosts/host_anl_cooley.xml
          "},{"location":"cooley/software-and-libraries/visit-on-cooley/#running-visit-in-interactive-mode","title":"Running VisIt in Interactive Mode","text":"
          • Start up VisIt on your local machine (Version 2.12.3, 2.10.2, or 2.9.1)
          • Click File -> Open File and choose \"ANL Cooley\" from the \"Host\" dropdown
          • You'll be prompted for your password; enter your Cryptocard response (with PIN)
          • When you open a selected file, it will launch a job on Cooley

            • You will need to specify the \"Bank\" (Project) to use when VisIt submits jobs to the queue on Cooley. Specify a project in the Options box.
            • If your environment doesn't get sourced correctly with non-interactive SSH, you can set the default project to use under Options -> Host profiles
            • Note: Don't change the contents of the \"Machine file\" field (it should be $COBALT_NODEFILE)
            • Note: The default Launch Profile is set to serial. Do not change this default setting.
            • If you'd like to change other job parameters (like the number of processes, nodes, and walltime), you can do so.
            • If you'd like these changes to be used as your default, be sure to save them using Save Settings under the Options menu.
          "},{"location":"cooley/software-and-libraries/visit-on-cooley/#running-visit-in-batch-mode","title":"Running VisIt in Batch Mode","text":"
          • Edit your .soft.cooley file to include the \"@visit\" key before the \"@default\" line
• When running VisIt in batch mode on Cooley, the default version is 2.12.3. If you need a different version, you can specify this as a command line option using \"-v \" (versions 2.10.2 and 2.9.1 are also supported)"},{"location":"cooley/software-and-libraries/visit-on-cooley/#additional-information","title":"Additional Information","text":"
            • Visit user manual: https://wci.llnl.gov/codes/visit/manuals.html
            • Visit wiki: http://www.visitusers.org
            "},{"location":"data-management/acdc/acdc-overview/","title":"ALCF Community Data Co-Op (ACDC)","text":""},{"location":"data-management/acdc/acdc-overview/#overview-of-the-alcf-community-data-co-op-acdc","title":"Overview of the ALCF Community Data Co-Op (ACDC)","text":"

            The ALCF Community Data Co-Op (ACDC) powers data-driven research by providing a platform for data access and sharing, and value-added services for data discovery and analysis.

A fundamental aspect of ACDC is a data fabric that allows programmatic data access, and straightforward large-scale data sharing with collaborators via Globus services. This provides a platform to build out different modalities for data access and use, such as indexing of data for discovery, data portals for interactive search and access, and accessible analysis services. ACDC will continue to be expanded to deliver ALCF users the platform to build customizable and accessible services towards the goal of supporting data-driven discoveries.

            "},{"location":"data-management/acdc/acdc-overview/#data-access-and-sharing","title":"Data access and sharing","text":"

ALCF project PIs can share data on Eagle with their collaborators, making facility accounts unnecessary. With this service, the friction of data sharing amongst collaborators is eliminated \u2013 there is no need to create copies of data for sharing, or to create allocations and accounts just to access data. ALCF PIs can grant access to data, at read-only or read/write access levels. Non-ALCF users throughout the scientific community, who have been granted permissions, can access the data on the Eagle filesystem using Globus.

Access to the data for ALCF users and collaborators is supported via bulk transfer (Globus transfer) or direct browser-based access (HTTP/S). Direct connections to high-speed external networks permit data access at many gigabytes per second. Management of permissions and access is via a web application, command line clients, or directly via Application Programming Interfaces (APIs). The interactivity permitted by the APIs distinguishes ACDC from the ALCF\u2019s previous storage systems and presents users with many possibilities for data control and distribution.
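
For example, a bulk transfer from Eagle can be scripted with the separately installed Globus CLI (a sketch; the endpoint UUIDs and paths below are placeholders):

globus login\nglobus transfer $EAGLE_UUID:/MyProject/results $DEST_UUID:/data/results --recursive --label eagle-pull\n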

            "},{"location":"data-management/acdc/acdc-overview/#data-portal-for-discovery-and-access","title":"Data portal for discovery and access","text":"

            ACDC\u2019s fully supported production environment is the next step in the expansion of edge services that blur the boundaries between experimental laboratories and computing facilities. The use and prominence of such services at the ALCF are only expected to increase as they become more integral to the facility\u2019s ability to deliver data-driven scientific discoveries.

            ACDC includes several project-specific data portals that enable search and discovery of the data hosted on Eagle. The portals allow users to craft queries and filters to find specific sets of data that match their criteria and use faceted search for the discovery of data. Portals also provide the framework for other interfaces including data processing capabilities, all secured with authentication and configured authorization policy.

The ACDC portal is a deployment of Django Globus Portal Framework customized for a variety of different projects. For most of these projects, the search metadata links directly to data on Eagle, with browser-based download, preview, and rendering of files, and bulk data access.

            "},{"location":"data-management/acdc/acdc-overview/#getting-started","title":"Getting Started","text":"
            1. Request an allocation: Researchers or PIs request an allocation on Eagle, and a project allocation is created upon request acceptance.
            2. Manage Access: PIs can manage the space independently or assign other users to manage the space, as well as provide other users with read or read/write access for folders in the space. Globus groups and identities are used to manage such access.
            3. Authentication: Globus is used for authentication and identity needed to access the system. As Globus has built-in support for federated logins, users can access ACDC using their campus or institution federated username and passcode

            If you are new to the ALCF, follow these instructions on how to transfer your data to ACDC: Transferring Data to Eagle

            If you already have an ALCF account, follow these instructions on how to share your data: Sharing Data to Eagle

            "},{"location":"data-management/acdc/eagle-data-sharing/","title":"Sharing on Eagle Using Globus","text":""},{"location":"data-management/acdc/eagle-data-sharing/#overview","title":"Overview","text":"

            Collaborators throughout the scientific community have the ability to write data to and read scientific data from the Eagle filesystem using Globus sharing capability. This capability provides PIs with a natural and convenient storage space for collaborative work.

            Note: The project PI needs to have an active ALCF account to set up Globus guest collections on Eagle, and set permissions for collaborators to access data.

            Globus is a service that provides research data management, including managed transfer and sharing. It makes it easy to move, sync, and share large amounts of data. Globus will manage file transfers, monitor performance, retry failures, recover from faults automatically when possible, and report the status of your data transfer. Globus supports GridFTP for bulk and high-performance file transfer, and direct HTTPS for download. The service allows the user to submit a data transfer request, and performs the transfer asynchronously in the background. For more information, see Globus data transfer and Globus data sharing.

            Logging into Globus"},{"location":"data-management/acdc/eagle-data-sharing/#logging-into-globus-with-your-alcf-login","title":"Logging into Globus with your ALCF Login","text":"

            ALCF researchers can use their ALCF Login username and password to access Globus. Go to the Globus website and click on Log In in the upper right corner of the page.

            Type or scroll down to \"Argonne LCF\" in the \"Use your existing organizational login\" box, and then click \"Continue\".

            Select Organization Argonne LCF

            You will be taken to a familiar-looking page for ALCF login. Enter your ALCF login username and password.

            "},{"location":"data-management/acdc/eagle-data-sharing/#accessing-your-eagle-project-directory","title":"Accessing your Eagle Project Directory","text":"

            There are two ways for a PI to access their project directory on Eagle.

            1. Web Interface: By logging in to Globus interface directly and navigating to the ALCF Eagle endpoint.

            Note: Specifically for PIs with Eagle 'Data-only' projects and no other compute allocations, logging in from the Globus-side to get to Eagle is the only way for them to access their Eagle project directory.

            File Manager
2. POSIX: By logging in to the ALCF systems from the terminal window.

            Note: For Eagle Data and Allocation projects, the PI will have access to the required ALCF systems (besides the Globus Web Interface) to login and access their Eagle project directory.

            Terminal Window"},{"location":"data-management/acdc/eagle-data-sharing/#creating-a-guest-collection","title":"Creating a Guest Collection","text":"

            A project PI needs to have an 'active' ALCF account in place to create and share guest collections with collaborators. Please note that ONLY a PI has the ability to create guest collections.

            • If you have an \"Inactive/Deleted\" ALCF account, please click on the account re-activation link to begin the re-activation process: Re-activation Link

            • If you DO NOT have an ALCF account, click on the account request link to begin the application process: Account Request Link

            In the Globus application in your browser:

1. There are multiple ways to navigate to the Collections tab in \"Endpoints\":
              1. Click the link to get started. It will take you to the Collections tab for Eagle. OR
              2. Click on 'Endpoints' located in the left panel of the Globus web app. Type \"alcf#dtn_eagle\" in the search box located at the top of the page and click the magnifying glass to search. Click on the Managed Public Endpoint \"alcf#dtn_eagle\" from the search results. Click on the Collections tab. OR
              3. Click on 'File Manager' located in the left panel of the Globus web app. Search for 'alcf#dtn_Eagle' and select it in the Collection field. Select your project directory or a sub directory that you would like to share with collaborators as a Globus guest collection. Click on 'Share' on the right side of the panel, which will take you to the Collections tab.

            Note: Shared endpoints always remain active. When you select an endpoint to transfer data to/from, you may be asked to authenticate with that endpoint. Follow the instructions on screen to activate the endpoint and to authenticate. You may also have to provide Authentication/Consent for the Globus web app to manage collections on this endpoint

2. In the Collections tab, click 'Add a Guest Collection' located at the top right hand corner

3. Fill out the form:

              1. If the path to the directory is not pre-populated, click the browse button, navigate and select the directory. Note that you can create a single guest collection and set permissions for folders within a guest collection. There is no reason to create multiple guest collections to share for a single project.

              2. Give the collection a Display Name (choose a descriptive name)

4. Click \"Create Collection\"

            Create New Guest Collection"},{"location":"data-management/acdc/eagle-data-sharing/#sharing-data-with-collaborators-using-guest-collections","title":"Sharing Data with Collaborators Using Guest Collections","text":"

            If your data is on the ALCF systems, you can easily share it with collaborators who are at ALCF or elsewhere. You have full control over which files your collaborator can access, and whether they have read-only or read-write permissions.

            You can share with their institutional email. The collaborator can use the Globus web interface to download the data, or use Globus transfer to move the data to their machine.

            To share data with collaborators (that either have a Globus account or an ALCF account), click on 'Endpoints', select your newly created Guest Collection (as described in the section above), and go to the 'Permissions' tab. Click on 'Add Permissions - Share With':

            Add Permissions

            You can share with other Globus users or Globus Groups (for more information on Groups, scroll down to Groups). You can give the collaborators read, write or read+write permissions. Once the options have been selected, click 'Add Permission'.

            Add Permissions - Share With

The PI can also choose to share their data with 'Public' with anonymous read access (and anonymous write disabled). This allows anyone that has access to the collection to read and/or download the data without an authorization request.

            Add Permissions - Share With

            You should then see the share and the people you have shared it with. You can repeat this process for any number of collaborators. At any time, you can terminate access to the directory by clicking the trash can next to the user.

            List of people that you have shared with"},{"location":"data-management/acdc/eagle-data-sharing/#additional-information-on-globus-guest-collections","title":"Additional information on Globus Guest Collections","text":"
            • ONLY you (a project PI) can create guest collections and make them accessible to collaborators. Project Proxy (on the POSIX side) cannot create guest collections.

            • You can only share directories, not individual files.

            • Globus allows directory trees to be shared as either read or read/write. This means that any subdirectories within that tree also have the same permissions. Globus supports setting permissions at a folder level, so there is no need to create multiple guest collections for a project. You can create a guest collection at the top level and share sub-directories with the collaborators by assigning the appropriate permissions.

            • When you create a guest collection endpoint and give access to one or more Globus users, you can select whether each person has read or read/write access. If they have write access, they can also delete files within that directory tree, so you should be careful about providing write access.

            • Globus guest collections are created and managed by project PIs. If the PI of a project changes, the new PI will have to create a new guest collection and share them with the users. Globus guest collections' ownership cannot be transferred.

            • Guest collections are active as long as the project directory is available and the PI's ALCF account is active. If the account goes inactive, the collections become inaccessible to all the users. Access is restored once the PI's account is reactivated.

            • All RW actions are performed as the PI, when using Guest Collections. If a PI does not have permissions to read or write a file or a directory, then the Globus guest collection users won't either.

            "},{"location":"data-management/acdc/eagle-data-sharing/#creating-a-group","title":"Creating a group","text":"
            1. Go to Groups on the left panel
            2. Click on \u2018Create a new group\u2019 at the top
            3. Give the group a descriptive name and add Description for more information
            4. Make sure you select \u2018group members only\u2019 radio button
            5. Click on \u2018Create Group\u2019
            Create new group"},{"location":"data-management/acdc/eagle-data-sharing/#transferring-data-from-eagle","title":"Transferring data from Eagle","text":"

            Log in to Globus using your ALCF credentials. After authenticating, you will be taken to the Globus File Manager tab. In the 'Collection' box, type the name of Eagle managed endpoint (alcf#dtn_eagle). Navigate to the folder/file you want to transfer. HTTPS access (read-only) is enabled so you can download files by clicking the \"Download\" button.

            Click on 'Download' to download the required file.

            Download the required file

To transfer files to another Globus endpoint, enter the destination endpoint (which could also be your Globus Connect Personal endpoint) in the \"collection\" search box in the right-hand side panel.

            Transferring files to another Globus endpoint

            To transfer files, select a file or directory on one endpoint, and click the blue 'Start' button.

            Transferring files
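If you prefer the command line, the same kind of transfer can be submitted with the Globus CLI. This is a minimal sketch: the endpoint UUIDs and paths below are placeholders that you would replace with your own values (an endpoint's UUID is shown by globus endpoint search).

globus transfer \"EAGLE_UUID:/<project name>/results.dat\" \"DEST_UUID:/data/results.dat\" --label \"Eagle download\"\n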

            If the transfer is successful, you should see the following message:

            A Successful Transfer

            Click on 'View details' to display task detail information.

            Transfer completed

            You will also receive an email when the transfer is complete.

            Email confirmation"},{"location":"data-management/acdc/eagle-data-sharing/#deleting-a-guest-collection","title":"Deleting a guest collection","text":"

            To see all guest collections you have shared, go to 'Endpoints' in the left hand navigation bar, then 'Administered by You'. Select the guest collection endpoint you wish to delete, and click on 'Delete endpoint'.

            Deleting a guest collection"},{"location":"data-management/acdc/eagle-data-sharing/#what-to-tell-your-collaborators","title":"What to tell your Collaborators","text":"

            If you set up a shared endpoint and want your collaborator to download the data, this is what you need to tell them.

            First, the collaborator needs to get a Globus account. The instructions for setting up a Globus account are as described above. This account is free. They may already have Globus access via their institution.

If the collaborator is downloading the data to their personal workstation, they need to install the Globus Connect Personal client. Globus Connect Personal clients are available for Mac, Windows, and Linux systems and are free.

            If you clicked on the 'notify users via email' button when you added access for this user, they should have received a message that looks like this:

            Click on the 'notify users via email' button for collaborators to receive an email

            You can, of course, also send email to your collaborators yourself, telling them you've shared a folder with them. The collaborator should click on the link, which will require logging in with their institutional or Globus login username and password. They should then be able to see the files you shared with them. External collaborator's view of the shared collection is shown below:

            Collaborator transfer or sync to

            They should click on the files they want to transfer, then 'Transfer or Sync to', enter their own endpoint name and desired path and click the 'Start' button near the bottom to start the transfer.

Choosing transfer path"},{"location":"data-management/acdc/eagle-data-sharing/#encryption-and-security","title":"Encryption and Security","text":"

            Data can be encrypted during Globus file transfers. In some cases encryption cannot be supported by an endpoint, and Globus Online will signal an error.

            For more information, see How does Globus Online ensure my data is secure?

            In the Transfer Files window, click on 'More options' at the bottom of the 2 panes. Check the 'encrypt transfer' checkbox in the options.

            Encrypting the transfer

Alternatively, you can encrypt the files before transfer using any method on your local system, transfer them using Globus, and then decrypt them on the other end.
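For example, a simple way to do this with GnuPG (assuming gpg is available on both systems; the filename is only an illustration):

gpg --symmetric --cipher-algo AES256 mydata.tar       # produces mydata.tar.gpg\ngpg --output mydata.tar --decrypt mydata.tar.gpg      # run on the destination system after the transfer\n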

            Note: Encryption and verification will slow down the data transfer.

            "},{"location":"data-management/acdc/eagle-data-sharing/#faqs","title":"FAQs","text":""},{"location":"data-management/acdc/eagle-data-sharing/#general-faqs","title":"General FAQs:","text":"

            1. What is the Eagle File system?

It is a Lustre file system residing on an HPE ClusterStor E1000 platform equipped with 100 Petabytes of usable capacity across 8480 disk drives. This ClusterStor platform also provides 160 Object Storage Targets and 40 Metadata Targets with an aggregate data transfer rate of 650GB/s. The primary use of Eagle is data sharing with the research community using Globus. The file system is available on all ALCF compute systems. It allows sharing of data between users (ALCF and external collaborators).

            2. What is the difference between Guest, Shared and a Mapped collection?

            • Guest collections: A Guest collection is a logical construct that a PI sets up on their project directory in Globus that makes it accessible to collaborators. The PI creates a guest collection at or below their project and shares it with the Globus account holders.
            • Shared collection: A guest collection becomes a shared collection when it is shared with a user/group.
            • Mapped Collections: Mapped Collections are created by the endpoint administrators. In the case of Eagle, these are created by ALCF.

            3. Who can create Guest collections?

            ONLY a project PI (or project owner) can create guest collections and make them accessible to collaborators.

            Project Proxy (on the POSIX side) or Access Manager (on the Globus side) do not have the ability to create guest collections.

            4. Who is an Access Manager?

            Access Manager is someone who can act as a Proxy on behalf of the PI to manage the collection. The Access Manager has the ability to add users, remove users, grant or revoke read/write access privileges for those users on that particular guest collection. However, Access Managers DO NOT have permissions to create guest collections.

            5. What are Groups?

            Groups are constructs that enable multi-user data collaboration. A PI (and an Access Manager) can create new groups, add members to them and share a guest collection with a group of collaborators.

Note: Members of groups do not need to have an ALCF account.

            6. What are some of the Common Errors you see and what do they mean?

            - EndpointNotFound   -  Wrong endpoint name \n- PermissionDenied    -  If you do not have permissions to view or modify the collection on <endpoint> (refer to the appropriate section for what this error could mean)\n- ServiceUnavailable  -  If the service is down for maintenance\n
            "},{"location":"data-management/acdc/eagle-data-sharing/#pi-faqs","title":"PI FAQs:","text":"

            1. How can a PI request for a data-only, Eagle storage allocation?

            A project PI can request an allocation by filling out the Director\u2019s Discretionary Allocation Request form: Request an allocation. The allocations committee reviews the proposals and provides its decision in 1-2 weeks. To request a storage allocation on Eagle for an existing project, please email support@alcf.anl.gov with your proposal.

            2. Does a PI need to have an ALCF account to create a Globus guest collection?

            Yes. The PI needs to have an 'active' ALCF account in place to create and share guest collections with collaborators.

            • If the PI has an 'Inactive/Deleted' ALCF account, they should click on the link here to start the account re-activation process: Account re-activation link
• If they don't have an ALCF account, they can request one: Account request link

            3. What endpoint should the PI use?

            alcf#dtn_eagle

            4. What are the actions an Eagle PI can perform?

            • Create and delete guest collections, groups
            • Create, delete and share the data with ALCF users and external collaborators
            • Specify someone as a Proxy (Access Manager) for the guest collections
            • Transfer data between the guest collection on Eagle and other Globus endpoints/collections

            5. How can a PI specify someone as a Proxy on the Globus side?

            Go to alcf#dtn_eagle -> collections -> shared collection -> roles -> select 'Access Manager'

            To specify someone as a Proxy, click on \"Roles\"

            Choose Access Manager and \"Add Role\"

            6. What is the high-level workflow for setting up a guest collection?

            • PI requests an Eagle allocation project
            • The ALCF Allocations Committee reviews and approves requests
• ALCF staff sets up a project, unix group, and project directory (on Eagle)
            • A Globus sharing policy is created for the project with appropriate access controls
            • PI creates a guest collection for the project, using the Globus mapped collection for Eagle.

              • Note: PI needs to have an active ALCF Account and will need to log in to Globus using their ALCF credentials.
              • If PI has a Globus account, it needs to be linked to their ALCF account
            • PI adds collaborators to the guest collection. Collaborators can be ALCF users and external collaborators

            • Added with Read only or Read-Write permissions

            7. Should PI add their ALCF project members to Eagle separately to access guest collections?

ALCF project members already have access to the project directory, which they can reach by browsing the endpoint alcf#dtn_eagle. Globus guest collections allow sharing of data with collaborators who don't have ALCF accounts.

            8. Who has the permissions to create a guest collection?

            Only the PI has the ability to create a guest collection. The Access Manager, along with the PI, has permissions to share it with collaborators (R-only or R-W permissions as needed).

            9. I am the project PI. Why do I see a \"Permission Denied\" error when I try to CREATE a shared collection?

            If you are a PI and you see this error, it could mean that a sharing policy might not have been created by ALCF. Please contact support@alcf.anl.gov for assistance.

            10. If a PI added someone as a project proxy on the POSIX-side, is it safe to assume that the Proxy can create guest collections?

            No, project proxies cannot create guest collections, only the PI can.

11. Who can create groups?

A PI (and an Access Manager) can create new groups, add members to them and share a guest collection with a group of collaborators. For more information, refer to: Creating a Group

            12. What happens when the PI of a project changes? What happens to the shared collection endpoint?

The new PI will need to create new shared collections and share them with collaborators again.

            13. I notice that I am the owner of all the files that were transferred by external collaborators using the guest collection. Why is that?

When collaborators read files from or write files to the guest collection, they do so on behalf of the PI. All writes show up as having been carried out by the PI. Also, if the PI does not have permission to read or write to a file or folder in the directory, then the collaborators will not have those permissions either.

            14. What happens to the guest collections when the PI's account goes inactive?

            The collections will also become inactive until the PI's account is re-activated.

            15. How long does it take for the endpoint to become accessible to collaborators after PI's account is activated?

            Right away. The page needs to be refreshed and sometimes you may have to log out and log back in.

            "},{"location":"data-management/acdc/eagle-data-sharing/#access-manager-faqs","title":"Access Manager FAQs:","text":"

            1. What are the actions an Access Manager can perform?

1. Access Manager should be able to see the collection under \"Shared with you\" and \"Shareable by you\" tabs.
2. Has permissions to add and/or delete collaborators on the shared collection and restrict their R-W access as needed.

            2. Does an Access Manager need to have an ALCF account?

            Not necessary. However, if they need to manage the membership on the POSIX side, they will need an ALCF account and be a Proxy on the project.

            3. What is the difference between an ALCF project Proxy and a guest collection Access Manager?

            ALCF Project Proxy has permissions to manage project membership on the POSIX side whereas guest collection Access Manager has permissions to manage the project membership specific to that guest collection shared by the PI on the Globus side.

            4. I am an 'Access Manager' on the collection. Why do I see a 'Permission Denied' error when I try to SHARE a guest collection created by the PI?

If you are a non-PI who is able to access the guest collection but unable to share it, it means that your role on this guest collection is limited to a \"Member\". If you want the ability to share folders and sub-folders from the collections that are shared with you, please talk to the PI. They will need to set your role to an \"Access Manager\" for the collection within Globus.

            5. Can an Access Manager give external collaborators access to the collections that are shared with them on Eagle?

Yes, an Access Manager will see the \"Permissions\" tab at the top of the shared collection page and can share it with collaborators and/or a group.

            6. Can an Access Manager create collections using the shared endpoint?

            No. An access manager cannot create a collection, only a PI can do that. The access manager can however share folders and sub-folders from the collections that are shared with them.

            7. Can an Access Manager leave a globus group or withdraw membership request for collaborators?

Yes. [Go to alcf#dtn_eagle -> Groups -> group_name -> Members -> click on the specific user -> Role & Status -> Set the appropriate status]

If you get this error, you do not have read permissions.

8. Can an Access Manager delete guest collections created by the PI?

No. Access managers cannot delete guest collections.

            Guest Collection Collaborators:

1. What actions can collaborators perform?
1. Collaborators can read files from a collection *
2. Collaborators can write to a collection **
3. Collaborators can delete files in a collection **

* If the PI has read permissions for those files on the POSIX side and the collaborator is given read permissions in Globus for the guest collection.

** If the PI has write permissions for those files on the POSIX side and the collaborator is given write permissions in Globus for the guest collection.

2. I am a collaborator. Why do I see a 'Permission Denied' error when I try to ACCESS a guest collection created by the PI?

If you are a non-PI and you see this error while trying to access the collection, it means that you do not have read permissions to access the guest collection. Please contact the PI for required access.

If you get this error, you do not have read permissions."},{"location":"data-management/acdc/transferring-data-to-eagle/","title":"Transferring Data to Eagle","text":""},{"location":"data-management/acdc/transferring-data-to-eagle/#evolution-of-the-petrel-data-service-to-the-alcf-community-data-co-op","title":"Evolution of the Petrel Data Service to the ALCF Community Data Co-Op","text":"

            The Petrel data service is evolving into a more mature service called the ALCF Community Data Co-Op (ACDC) which will be launched later this year.

            In preparation for this shift, all current Petrel project PIs will need to move their project data to ALCF's Eagle filesystem by December 2021.

            For detailed instructions on how to move your data, please follow the steps outlined below. You will need to follow the order of the steps as listed.

            If you have any questions, please email: support@alcf.anl.gov.

            "},{"location":"data-management/acdc/transferring-data-to-eagle/#transferring-data-to-eagle_1","title":"Transferring data to Eagle","text":""},{"location":"data-management/acdc/transferring-data-to-eagle/#1-request-a-dd-project-on-eagle-filesystem","title":"1. Request a DD project on Eagle Filesystem","text":"

All Petrel project owners/PIs should request a Director's Discretionary project on the Eagle filesystem by filling out the form at https://accounts.alcf.anl.gov/allocationRequests. Select \"New Project\" and then \"Eagle\" as the resource and fill out the rest of the form. In the \"Project and Justification Summary\" section, along with the requested details you should also state that you are migrating your data from Petrel.

            Once the submission is reviewed and approved by the allocations committee, your project will be created on the Eagle filesystem and you will be notified via email. The approval process may take 1-2 weeks. Once the project is approved, proceed to the next step.

            "},{"location":"data-management/acdc/transferring-data-to-eagle/#2-apply-for-an-alcf-account","title":"2. Apply for an ALCF account","text":"

A project PI will need an active ALCF account to:
- Transfer their data from Petrel to the Eagle filesystem
- Enable data sharing on their Eagle project (see section \"4. Share your data on Eagle using Globus Guest Collections\" for more details)

            NOTE: A collaborator does not need an ALCF account to access data that is shared on Eagle (as a Globus Guest Collection). They can sign into Globus with their institutional identity to access the data. The first time they log in, they will need to accept terms and conditions.

            "},{"location":"data-management/acdc/transferring-data-to-eagle/#to-apply-for-an-alcf-account","title":"To apply for an ALCF account:","text":"
            • Visit https://accounts.alcf.anl.gov and click on \"Request An Account\".
            • When prompted for project name, please select the project on Eagle that was created for your Petrel data as a result of Step 1: Request a DD project on Eagle (you have to wait for your project to be created before you can apply for an account)
            • If you don't have one, please follow the directions under \"Step 1: Request a DD project on Eagle\" (above)
            • For more details on the ALCF account request process, visit the webpage Request an account
            • Once your account is created and you have the cryptocard/mobile token to login to Eagle, proceed to the next step to transfer the data from Petrel to Eagle
            "},{"location":"data-management/acdc/transferring-data-to-eagle/#3-transfer-data-from-your-source-endpoint-to-eagle-using-globus","title":"3. Transfer data from your source endpoint to Eagle using Globus","text":"

            You can use the Globus web app to transfer data or the CLI. See Using CLI for instructions on how to use the CLI to transfer data. The following set of instructions use the Globus web app, using alcf#dtn_eagle (path /projectname) as the destination to transfer data from your source endpoint.
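As a rough CLI equivalent of the web app steps below (a sketch only; the endpoint UUIDs, project name, and source path are placeholders), a recursive, sync-style transfer can be submitted and waited on with:

globus transfer --recursive --sync-level checksum \"SOURCE_UUID:/my_petrel_share/\" \"EAGLE_UUID:/projectname/\" --label \"Petrel to Eagle\"\nglobus task wait TASK_ID       # TASK_ID is printed when the transfer is submitted\n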

            NOTE: Anonymous HTTPS read access is enabled on Eagle.

Step 1: Log into https://app.globus.org/file-manager?destination_id=05d2c76a-e867-4f67-aa57-76edeb0beda0 which opens two panes in the Globus File Manager, with ALCF Eagle on the right-hand side.
- Enter the name of your source endpoint in the pane on the left-hand side.

            File Manager

            Enter the name of your source endpoint

            Step 2: You may have to log in and link your ALCF identity to your Globus account.

            Log in and link your ALCF identity to your Globus account

            Step 3: Log in using your ALCF credentials.

            Use ALCF credentials

            Step 4: If the login is successful, the folders and files on the Eagle file system will be displayed in the project/file viewer.

            Eagle file system in the project/file viewer

            Step 5: Navigate to the correct destination (project folder) on the Eagle file system. Choose the files/folders to transfer in the left-hand side panel (Petrel endpoint).

            NOTE: Before clicking the \"Start\" button, click on the Transfer and Sync Options and check the \"sync\" checkbox and then click start.

            Choose the files/folders

            Step 6: Click on the \"Activity\" tab on the left-hand side navigation panel to view the status and details of your transfers.

            Activity tab

            Step 7: Once the transfer is successful, you should see the files and folders on the Eagle file system. You will also receive an email notification from Globus letting you know that your transfer was successful.

            Files and folders on the Eagle file system"},{"location":"data-management/acdc/transferring-data-to-eagle/#migrating-permissions-from-petrel-to-eagle","title":"Migrating permissions from Petrel to Eagle:","text":"

For PIs who had previously stored data on Petrel and are migrating to Eagle, the following tool automates the step of copying the permissions set on Petrel to Eagle. The tool, migrate_permissions.py at https://github.com/globus/globus-tool-examples, takes the source endpoint (your shared endpoint on Petrel in this case) and the destination endpoint (the guest collection on Eagle that has the data), and copies over all the permissions. The tool assumes the data was copied over as-is from source to destination.

            If you have any questions on the tool, or need further support, please contact support@globus.org.

            "},{"location":"data-management/acdc/transferring-data-to-eagle/#4-share-your-data-on-eagle-using-globus-guest-collections","title":"4. Share your data on Eagle using Globus Guest Collections","text":"

            Your data on the Eagle file system can easily be shared with collaborators who are at ALCF or elsewhere. You have full control over which files your collaborator can access, and whether they have read-only or read-write permissions.

            See below for step-by-step instructions on how to share data from Eagle using Globus Guest Collections:

            https://docs.alcf.anl.gov/data-management/acdc/eagle-data-sharing/

            NOTE: Guest Collections are tied to the project PI's account so if the PI's account becomes inactive, the Guest Collections will also become inactive. Once the PI's account is reactivated, access to the Guest Collections is restored.

            "},{"location":"data-management/acdc/transferring-data-to-eagle/#using-globus-cli-tool","title":"Using Globus CLI tool:","text":"

            To copy data and permissions from a source collection, PIs can use a Globus CLI tool that automates the step of copying the permissions set on the source collection and applies them to the collection on Eagle. This is especially useful for PIs who had previously stored data on Petrel. See https://github.com/globus/globus-tool-examples for more information.

The tool, migrate_permissions.py in the GitHub repo, takes the source endpoint (the shared endpoint on Petrel, for example) and the destination endpoint (the guest collection on Eagle that has the data), and copies over all the permissions. The tool assumes the data was copied over as-is from source to destination. Note that you need to have a guest collection set up for your project on Eagle to use the CLI command and tool. See this page for instructions on how to set up guest collections.
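A minimal sketch of getting started with the tool is below; the exact arguments are documented in the repository, so checking the script's help output (assumed here to be available via --help) is the safest starting point.

git clone https://github.com/globus/globus-tool-examples.git\ncd globus-tool-examples\npython migrate_permissions.py --help      # review the required source/destination endpoint arguments\n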

            If you have any questions on the tool, or need further support, please contact support@globus.org.

Existing data portals: To reconfigure and update your existing data portals to point to your guest collections on Eagle, please work directly with the developer/maintainer of the portal.

            "},{"location":"data-management/acdc/transferring-data-to-eagle/#faqs-for-migrating-petrel-data-to-eagle","title":"FAQs for migrating Petrel data to Eagle:","text":""},{"location":"data-management/acdc/transferring-data-to-eagle/#1-is-it-important-for-a-petrel-project-ownerpi-to-obtain-an-alcf-account","title":"1. Is it important for a Petrel project owner/PI to obtain an ALCF account?","text":"

            Yes, the data from Petrel needs to be moved to an ALCF project directory on the Eagle filesystem. The PI will need an ALCF account to log into Globus and move the data to their Eagle project directory.

            "},{"location":"data-management/acdc/transferring-data-to-eagle/#2-what-is-the-workflow-for-migrating-data-from-petrel-and-giving-access-to-collaborators-on-eagle","title":"2. What is the workflow for migrating data from Petrel and giving access to collaborators on Eagle?","text":"
1. PI requests an Eagle allocation project
2. Allocations Committee reviews and approves requests
3. Once the allocation request is approved, the project is created and associated with a UNIX group and project directory on Eagle
4. PI requests an ALCF account (if they don't have one)
5. Once the ALCF account is created and tied to the project on Eagle, the PI moves the data from Petrel to Eagle using Globus
6. PI creates guest collections for the project on Eagle, using the Globus web app and the mapped collection/endpoint for Eagle (alcf#dtn_eagle). Note that:
   • The PI needs to have an active ALCF Account and will need to log in to Globus using their ALCF credentials
   • Only the PI (and not a proxy) can create guest collections
   • If the PI already has a Globus account, it needs to be linked to their ALCF account
7. PI adds collaborators to the guest collection, with read-only or read-write permissions.
   • Note: Anonymous HTTPS write is disabled and only anonymous HTTPS read is allowed.
8. Existing data portals on Petrel should be updated to point to the new guest collection on Eagle. Please work directly with the developer/maintainer of the portal.
            "},{"location":"data-management/acdc/transferring-data-to-eagle/#3-what-endpoints-should-the-pi-use-to-move-data-from-petrel","title":"3. What endpoints should the PI use to move data from Petrel?**","text":"
            • Source: Globus endpoint on Petrel for the Petrel allocation
            • Destination: Globus endpoint on the Eagle filesystem and the path to the directory (alcf#dtn_eagle, path /) OR the name of the guest collection on Eagle"},{"location":"data-management/data-transfer/sftp-scp/","title":"SFTP and SCP","text":"

              These standard utilities are available for local area transfers of small files; they are not recommended for use with large data transfers due to poor performance and excess resource utilization on the login nodes.
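For example, to copy a small file to your home directory on an ALCF login node (a sketch; replace the username, host, and paths with your own):

scp notes.txt <username>@polaris.alcf.anl.gov:/home/<username>/\nsftp <username>@polaris.alcf.anl.gov\n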

              See Globus for performing large data transfers.

              "},{"location":"data-management/data-transfer/using-globus/","title":"Using Globus","text":"

              Globus addresses the challenges faced by researchers in moving, sharing, and archiving large volumes of data among distributed sites. With Globus, you hand off data movement tasks to a hosted service that manages the entire operation. It monitors performance and errors, retries failed transfers, corrects problems automatically whenever possible, and reports status to keep you informed and keep you focused on your research.

              Command line and Web-based interfaces are available. The command line interface, which requires only ssh to be installed on the client, is the method of choice for script-based workflows. Globus also provides a REST-style transfer API for advanced-use cases that require scripting and automation.

              "},{"location":"data-management/data-transfer/using-globus/#getting-started","title":"Getting Started","text":"

              Basic documentation for getting started with Globus can be found at the following URL: https://docs.globus.org/how-to/

              "},{"location":"data-management/data-transfer/using-globus/#data-transfer-node","title":"Data Transfer Node","text":"

              A total of 13 data transfer nodes (DTNs) for /home, theta-fs0, and Grand (6 of these DTNs are also used for HPSS) and 4 DTNs for Eagle are available to ALCF users, allowing users to perform wide and local area data transfers. Access to the DTNs is provided via the following Globus endpoints:

              "},{"location":"data-management/data-transfer/using-globus/#alcf-globus-endpoints","title":"ALCF Globus Endpoints","text":"

              The Globus endpoint and the path to use depends on where your data resides. If your data is on:

              • /home which is where your home directory resides : alcf#dtn_home for accessing /home (i.e. home directories on swift-home filesystem). Use the path /<username>
              • theta-fs0 filesystem: alcf#dtn_theta-fs0 for accessing /lus/theta-fs0 (i.e. project directories on Theta-fs0 filesystem). Use the path /<project name>
              • HPSS: alcf#dtn_hpss
• Grand filesystem: alcf#dtn_grand for accessing /lus/grand/projects or /grand (i.e., project directories on the Grand filesystem). Use the path /grand/<project name>
• Eagle filesystem: alcf#dtn_eagle for accessing /lus/eagle/projects or /eagle (i.e., project directories on the Eagle filesystem). Use the path /eagle/<project name>

              After registering, simply use the appropriate ALCF endpoint, as well as other sources or destinations. Use your ALCF credentials (your OTP generated by the CryptoCARD token with PIN or Mobilepass app) to activate the ALCF endpoint.
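From the Globus CLI you can look up an endpoint and browse your project directory before starting a transfer. This is a sketch: EAGLE_UUID stands for the UUID returned by the search command, and <project name> is your project's directory.

globus endpoint search \"alcf#dtn_eagle\"\nglobus ls \"EAGLE_UUID:/eagle/<project name>/\"\n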

              Globus Connect Personal allows users to add laptops or desktops as an endpoint to Globus, in just a few steps. After you set up Globus Connect Personal, Globus can be used to transfer files to and from your computer.

              "},{"location":"data-management/data-transfer/using-globus/#references","title":"References","text":"

              Research Data Management with Globus

              "},{"location":"data-management/filesystem-and-storage/data-storage/","title":"ALCF Data Storage","text":""},{"location":"data-management/filesystem-and-storage/data-storage/#disk-storage","title":"Disk Storage","text":"

              The ALCF operates a number of file systems that are mounted globally across all of our production systems.

              "},{"location":"data-management/filesystem-and-storage/data-storage/#home","title":"Home","text":"

A Lustre file system residing on a DDN AI-400X NVMe Flash platform. It has 24 NVMe drives of 7 TB each, providing 123 TB of usable space. It provides 8 Object Storage Targets and 4 Metadata Targets.

              "},{"location":"data-management/filesystem-and-storage/data-storage/#grand","title":"Grand","text":"

              A Lustre file system residing on an HPE ClusterStor E1000 platform equipped with 100 Petabytes of usable capacity across 8480 disk drives. This ClusterStor platform provides 160 Object Storage Targets and 40 Metadata Targets with an aggregate data transfer rate of 650GB/s. The primary use of grand is compute campaign storage.

              Also see ALCF Data Policies and Data Transfer

              "},{"location":"data-management/filesystem-and-storage/data-storage/#eagle","title":"Eagle","text":"

A Lustre file system residing on an HPE ClusterStor E1000 platform equipped with 100 Petabytes of usable capacity across 8480 disk drives. This ClusterStor platform provides 160 Object Storage Targets and 40 Metadata Targets with an aggregate data transfer rate of 650GB/s. The primary use of eagle is data sharing with the research community. Eagle has community sharing capabilities which allow PIs to share their project data with external collaborators using Globus. Eagle can also be used for compute campaign storage.

              Also see ALCF Data Policies and Data Transfer

              "},{"location":"data-management/filesystem-and-storage/data-storage/#theta-fs0","title":"theta-fs0","text":"

              A Lustre file system residing on an HPE Sonexion 3000 storage array with a usable capacity of 9.2PB and an aggregate data transfer rate of 240GB/s. This is a legacy file system. No new allocations are granted on theta-fs0.

              Also see ALCF Data Policies and Data Transfer

              "},{"location":"data-management/filesystem-and-storage/data-storage/#theta-fs1","title":"theta-fs1","text":"

              A GPFS file system that resides on an IBM Elastic Storage System (ESS) cluster with a usable capacity of 7.9PB and an aggregate data transfer rate of 400GB/s. This is a legacy file system. No new allocations are granted on theta-fs1.

              Also see ALCF Data Policies and Data Transfer

              "},{"location":"data-management/filesystem-and-storage/data-storage/#tape-storage","title":"Tape Storage","text":"

ALCF operates three 10,000 slot Spectralogic tape libraries. We are currently running a combination of LTO6 and LTO8 tape technology. The LTO tape drives have built-in hardware compression which typically achieves compression ratios between 1.25:1 and 2:1 depending on the data, yielding an effective capacity of approximately 65PB.

              "},{"location":"data-management/filesystem-and-storage/data-storage/#hpss","title":"HPSS","text":"

              HPSS is a data archive and retrieval system that manages large amounts of data on disk and robotic tape libraries. It provides hierarchical storage management services that allow it to migrate data between those storage platforms.

              HPSS is currently configured with a disk and tape tier. The disk tier has a capacity of 1.2PB on a DataDirect Networks SFA12K-40 storage array. By default, all archived data is initially written to the disk tier. The tape tier consists of 3 SpectraLogic T950 robotic tape libraries containing a total of 72 LTO6 tape drives with total uncompressed capacity 64 PB. Archived data is migrated to the tape tier at regular intervals, then deleted from the disk tier to create space for future archives.

Access to HPSS is provided by various client components. Currently, ALCF supports access through two command-line clients, HSI and HTAR. These are installed on the login nodes of Theta and Cooley. In order for the client to authenticate with HPSS, the user must have a keytab file that should be located in their home directory under subdirectory .hpss. The file name will be in the format .ktb_<userid>."},{"location":"data-management/filesystem-and-storage/data-storage/#hsi-general-usage","title":"HSI General Usage","text":"

              Before you can use HSI on XC40 systems such as Theta, you must load a module:

              module load hsi

              HSI can be invoked by simply entering hsi at your normal shell prompt. Once authenticated, you will enter the hsi command shell environment:

              > hsi\n[HSI]/home/username->\n

              You may enter \"help\" to display a brief description of available commands.

If archiving from or retrieving to grand or eagle, you must disable the Transfer Agent with the -T off option.

              Example archive

              [HSI]/home/username-> put mydatafile                # same name on HPSS\n[HSI]/home/username-> put local.file : hpss.file    # different name on HPSS\n[HSI]/home/username-> put -T off mydatafile\n

              Example retrieval

              [HSI]/home/username-> get mydatafile\n[HSI]/home/username-> get local.file : hpss.file\n[HSI]/home/username-> get -T off mydatafile\n

              Most of the usual shell commands will work as expected in the HSI command environment. For example, checking what files are archived:

              [HSI]/home/username-> ls -l

              And organizing your archived files:

              [HSI]/home/username-> mkdir dataset1\n[HSI]/home/username-> mv hpss.file dataset1\n[HSI]/home/username-> ls dataset1\n[HSI]/home/username-> rm dataset1/hpss.file\n

              It may be necessary to use single or double quotes around metacharacters to avoid having the shell prematurely expand them. For example:

              [HSI]/home/username-> get *.c\n

              will not work, but

              [HSI]/home/username-> get \"*.c\"\n

              will retrieve all files ending in .c.

              Following normal shell conventions, other special characters in filenames such as whitespace and semicolon also need to be escaped with \"\\\" (backslash). For example:

                     [HSI]/home/username-> get \"data\\ file\\ \\;\\ version\\ 1\"\n

              retrieves the file named \"data file ; version 1\".

              HSI can also be run as a command line or embedded in a script as follows:

              hsi -O log.file \"put local.file\"\n
              "},{"location":"data-management/filesystem-and-storage/data-storage/#htar-general-usage","title":"HTAR General Usage","text":"

              HTAR is a tar-like utility that creates tar-format archive files directly in HPSS. It can be run as a command line or embedded in a script.

              Example archive

              htar -cf hpssfile.tar localfile1 localfile2 localfile3\n

              Example retrieval

              htar -xf hpssfile.tar localfile2\n
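To check what an existing archive contains before extracting it (assuming the usual tar-style -t listing flag), you can run:

htar -tvf hpssfile.tar\n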

NOTE: On Theta you must first load the HSI module to make HSI and HTAR available: \"module load hsi\"

NOTE: The current version of HTAR has a 64GB file size limit as well as a path length limit. The recommended client is HSI.

              "},{"location":"data-management/filesystem-and-storage/data-storage/#globus","title":"Globus","text":"

In addition, HPSS is accessible through the Globus endpoint alcf#dtn_hpss. As with HSI and HTAR, you must have a keytab file before using this endpoint. For more information on using Globus, please see Using Globus.

              "},{"location":"data-management/filesystem-and-storage/data-storage/#keytab-file-missing","title":"Keytab File Missing","text":"

              If you see an error like this:

              *** HSI: (KEYTAB auth method) - keytab file missing or inaccessible: /\n home/username/.hpss/.ktb_username\n Error - authentication/initialization failed\n

              it means that your account is not enabled to use the HPSS yet. Please contact support to have it set up.

              "},{"location":"data-management/filesystem-and-storage/disk-quota/","title":"Disk Quota","text":""},{"location":"data-management/filesystem-and-storage/disk-quota/#overview","title":"Overview","text":"

Disk quotas are enabled on project directories. ALCF's HPC systems use the swift-home file system located at /lus/swift/home where quotas are also enforced. Theta has three project file systems available to users. Details on the home file system are listed in file systems. Following are descriptions and examples for the home file system, as well as the theta-fs0, grand and eagle project filesystems.

              "},{"location":"data-management/filesystem-and-storage/disk-quota/#home-directory-quotas","title":"Home Directory Quotas","text":"

              By default, each home directory is assigned a default of 50GB. File ownership determines disk space usage.

              To check the home directory usage, enter this command:

              > myquota\nName                           Type     Filesystem        Used               Quota          Grace\n=========================================================================================================\nuserX                         User     /lus/swift         44.13G          50.00G             none\n

              "},{"location":"data-management/filesystem-and-storage/disk-quota/#project-directory-quotas","title":"Project Directory Quotas","text":"

The Grand, Eagle, and theta-fs0 (/lus/theta-fs0) project file systems support project quotas. The amount of data stored under /lus//projects/PROJECT_NAME cannot exceed the project quota limit approved during the allocation period. The total data usage under the project directory is used to calculate the disk quota.

              To check project quota usage on the file systems, enter this command:

              > myprojectquotas\n\nLustre : Current Project Quota information for projects you're a member of:\n\nName                       Type        Filesystem          Used             Quota           Grace\n==============================================================================================================\nprojectX                  Project      theta-fs0            354.4T             700T            -\nprojectY                  Project      theta-fs0            916k                 1T            -\nprojectZ                  Project      grand                  8k              1000T            -\nprojectX                  Project      eagle                1.87T             1000T            -\n

              "},{"location":"data-management/filesystem-and-storage/disk-quota/#requesting-a-new-eagle-allocation","title":"Requesting a New Eagle Allocation","text":"

To request a new project with an allocation on Eagle (with or without a compute allocation), please make a request by filling out the Director's Discretionary allocation form. Note that all new compute projects will have Grand as the default file system.

              "},{"location":"data-management/filesystem-and-storage/disk-quota/#quota-increases","title":"Quota Increases","text":"

              If you need a quota increase for Director's Discretionary allocations, please make a request by filling out the Director's Discretionary allocation form.

If you need a quota increase for your INCITE/ALCC/ESP project directory, please send an email to support@alcf.anl.gov with the machine, project name, new quota amount, and reason for the increase.

              "},{"location":"data-management/filesystem-and-storage/file-systems/","title":"ALCF File Systems","text":"

              Our HPC systems have three discrete file systems for project data: theta-fs0, Grand, and Eagle. Theta-fs0 is an Intel Enterprise Edition Lustre parallel file system mounted as /lus-projects or /projects. Grand and Eagle are 100 PB Lustre file systems mounted as /grand and /eagle respectively. For more information on the Lustre file system, here is a document on Lustre File Striping Basics.

              • Lustre File Striping Basics

              For information on the AI Testbed storage systems, refer to the AI Testbed storage page: https://argonne-lcf.github.io/ai-testbed-userdocs/common/storage/

Our HPC systems also share a Lustre home file system, called swift-home. The home file system is mounted as /home, and should generally be used for small files and any binaries to be run on Theta. The performance of this file system is reasonable, but using it for intensive I/O from the compute nodes is discouraged; intensive job I/O should instead go to the project data file systems, which are fast parallel systems and have far more storage space and greater I/O performance than the home directory space.

The swift-home file system is regularly backed up to tape. The project data file systems are not backed up. It is the user\u2019s responsibility to ensure that copies of any critical data on the project data file systems have either been archived to tape or stored elsewhere.

| Name | Accessible From | Type | Path | Production | Backed-up | Usage |
| --- | --- | --- | --- | --- | --- | --- |
| swift-home | Theta, ThetaGPU, Cooley, Polaris | Lustre | /home or /lus/swift/home | Yes | Yes | General use |
| lus-projects (theta-fs0) | Theta, ThetaGPU, Cooley | Lustre | /projects or /lus-projects or /lus/theta-fs0/projects | Yes | No | Intensive job output, large files |
| Grand | Theta, ThetaGPU, Cooley, Polaris | Lustre | /grand or /lus/grand/projects | Yes | No | Intensive job output, large files |
| Eagle | Theta, ThetaGPU, Cooley, Polaris | Lustre | /eagle or /lus/eagle/projects | Yes | No | Community sharing via Globus; Intensive job output, large files |
| Node SSD (Compute node only) | Theta, ThetaGPU, Polaris | xfs | /local/scratch (Theta), /local/scratch (Polaris), /raid/scratch (ThetaGPU) | Yes (Theta & ThetaGPU by request only) | No | Local node scratch during run |"},{"location":"data-management/filesystem-and-storage/file-systems/#available-directories","title":"Available Directories","text":""},{"location":"data-management/filesystem-and-storage/file-systems/#home-directories","title":"Home Directories","text":"
              • Created when an account is created
              • Located under /home
              • Each home directory is subject to a quota based on user file ownership. The default quota is 50 GB
              "},{"location":"data-management/filesystem-and-storage/file-systems/#sharing-home-directory-files-or-subdirectories-with-others","title":"Sharing Home Directory Files or Subdirectories with Others","text":"

If you need to share files or subdirectories (folders) under your home directory with collaborators (other ALCF users), you need to change file permissions from their defaults. You must change permissions of your top-level /home/username directory, even if you only want to share certain files/directories within it. Using normal Linux file permissions is simple and is sufficient to give access to all other users. For more fine-grained control over specific users, you need to use Linux access control list (ACL) commands.

              "},{"location":"data-management/filesystem-and-storage/file-systems/#simple-method-permission-to-all-users","title":"Simple Method: Permission to All Users","text":"

              First, a one-time-only change to your top-level /home/username directory.

              chmod o+x /home/username\n

              Then you may permission individual files and/or subdirectories with read access. For example, to recursively change permissions on /home/username/subdirectoryname so that all files in that subdirectory and any subdirectory trees within it are world-readable, you would use

              chmod -R o+Xr /home/username/subdirectoryname\n
              "},{"location":"data-management/filesystem-and-storage/file-systems/#refined-method-use-acl-to-give-permission-to-specific-users","title":"Refined Method: Use ACL to Give Permission to Specific Users","text":"

              First, a one-time-only change to your top-level /home/username directory. To share files/directories with user gilgamesh, for example:

setfacl -m u:gilgamesh:--x /home/username\n

              Then you may permission individual files and/or subdirectories with read access. For example, to recursively change permissions on /home/username/subdirectoryname so that all files in that subdirectory and any subdirectory trees within it are readable to user gilgamesh, you would use

setfacl -R -m u:gilgamesh:r-X,d:u:gilgamesh:r-X /home/username/subdirectoryname\n
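You can confirm the resulting ACL at any time with getfacl, for example:

getfacl /home/username/subdirectoryname\n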
              "},{"location":"data-management/filesystem-and-storage/file-systems/#project-directories","title":"Project Directories","text":"
              • Directories on Grand or Eagle are created when an allocation (INCITE, ALCC, Discretionary, etc.) is awarded. Eagle directories can be created as stand-alone allocations. Use the allocation request form to submit requests for an allocation on Eagle. Note that project directories are no longer created on theta-fs0.
              • Directory paths:
                • theta-fs0: /projects or /lus-projects or /lus/theta-fs0/projects
                • Grand: /grand or /lus/grand/projects
                • Eagle: /eagle or /lus/eagle/projects

              These project spaces do not have user quotas but a directory quota, meaning that ALL files contained within a project directory, regardless of the username, cannot exceed the disk space allocation granted to the project. For more information on quotas, see the Disk Quota page.

              "},{"location":"data-management/filesystem-and-storage/file-systems/#local-node-ssd","title":"Local Node SSD","text":"

              Access to SSDs is disabled by default for Theta and ThetaGPU. Project PIs may request access by emailing support@alcf.anl.gov. A use case will need to be provided.

              Access to SSDs is enabled by default on Polaris.

              "},{"location":"data-management/filesystem-and-storage/file-systems/#ssd-information","title":"SSD Information","text":"
              • Local scratch SSD storage on compute nodes for running jobs
              • Completely local non-parallel filesystem
              • Located at /local/scratch on Theta and Polaris computes and /raid/scratch on ThetaGPU computes
              • Wiped between Cobalt/PBS Pro jobs
              • No automatic backups provided
              • Information on the current SSD drives in use is below:

              Polaris SSD Specs

              Model PM1725a drives specifications

| Model | PM1725a drives |
| --- | --- |
| Capacity | 1.6 TB |
| Sequential Read | 3300 MB/s |
| Sequential Write | 3300 MB/s |

              Theta and ThetaGPU SSD Specs

              Model SM961 drives

| Model | SM961 drives |
| --- | --- |
| Capacity | 128 GB |
| Sequential Read | 3100 MB/s |
| Sequential Write | 700 MB/s |

              Model SM951 drives specifications

| Model | SM951 drives |
| --- | --- |
| Capacity | 128 GB |
| Sequential Read | 2150 MB/s |
| Sequential Write | 1550 MB/s |"},{"location":"data-management/filesystem-and-storage/hpss/","title":"Using HPSS","text":""},{"location":"data-management/filesystem-and-storage/hpss/#overview","title":"Overview","text":"

              HPSS is a data archive and retrieval system that manages large amounts of data on disk and robotic tape libraries. It provides hierarchical storage management services that allow it to migrate data between those storage platforms.

              HPSS is currently configured with a disk and tape tier. The disk tier has a capacity of 1.2PB on a DataDirect Networks SFA12K-40 storage array. By default, all archived data is initially written to the disk tier. The tape tier consists of 3 SpectraLogic T950 robotic tape libraries containing a total of 72 LTO6 tape drives with total uncompressed capacity 64 PB. Archived data is migrated to the tape tier at regular intervals, then deleted from the disk tier to create space for future archives.

              Access to HPSS is provided by various client components. Currently, ALCF supports access through two command-line clients: HSI and HTAR. These are installed on the login nodes of Theta, Cooley, and Polaris. In order for the client to authenticate with HPSS, the user must have a keytab file that should be located in their home directory under subdirectory .hpss. The file name will be in the format .ktb_<userid>.

              "},{"location":"data-management/filesystem-and-storage/hpss/#hsi-general-usage","title":"HSI General Usage","text":"

              HSI can be invoked by simply entering hsi at your normal shell prompt. Once authenticated, you will enter the hsi command shell environment:

              > hsi\n[HSI]/home/username->\n

              You may enter \"help\" to display a brief description of available commands.

              Example archive:

              [HSI]/home/username-> put mydatafile                # same name on HPSS\n[HSI]/home/username-> put local.file : hpss.file    # different name on HPSS\n

              Example retrieval:

              [HSI]/home/username-> get mydatafile\n[HSI]/home/username-> get local.file : hpss.file\n

              Most of the usual shell commands will work as expected in the HSI command environment.

              For example, checking what files are archived:

              [HSI]/home/username-> ls -l\n

              And organizing your archived files:

              [HSI]/home/username-> mkdir dataset1\n[HSI]/home/username-> mv hpss.file dataset1\n[HSI]/home/username-> ls dataset1\n[HSI]/home/username-> rm dataset1/hpss.file\n

              It may be necessary to use single or double quotes around metacharacters to avoid having the shell prematurely expand them.

              For example:

[HSI]/home/username-> get *.c\n

will not work, but

[HSI]/home/username-> get \"*.c\"\n

will retrieve all files ending in .c.

              Following normal shell conventions, other special characters in filenames such as whitespace and semicolon also need to be escaped with \"\\\" (backslash). For example:

                 [HSI]/home/username-> get \"data\\ file\\ \\;\\ version\\ 1\"\n

              retrieves the file named \"data file ; version 1\".

              HSI can also be run as a command line or embedded in a script as follows:

              hsi -O log.file \"put local.file\"\n

              "},{"location":"data-management/filesystem-and-storage/hpss/#htar-general-usage","title":"HTAR General Usage","text":"

              HTAR is a tar-like utility that creates tar-format archive files directly in HPSS. It can be run as a command line or embedded in a script.

              Example archive:

              htar -cf hpssfile.tar localfile1 localfile2 localfile3\n

              Example retrieval:

              htar -xf hpssfile.tar localfile2\n

Note:
- On Theta you must first load the HSI module to make HSI and HTAR available: \"module load hsi\"
- The current version of HTAR has a 64GB file size limit as well as a path length limit. The recommended client is HSI.

              "},{"location":"data-management/filesystem-and-storage/hpss/#globus","title":"Globus","text":"

              In addition, HPSS is accessible through the Globus endpoint alcf#dtn_hpss. As with HSI and HTAR, you must have a keytab file before using this endpoint. For more information on using Globus, please see Using Globus.

              "},{"location":"data-management/filesystem-and-storage/hpss/#common-problems","title":"Common Problems","text":""},{"location":"data-management/filesystem-and-storage/hpss/#keytab-file-missing","title":"Keytab File Missing","text":"

If you see an error like this:

*** HSI: (KEYTAB auth method) - keytab file missing or inaccessible: /home/username/.hpss/.ktb_username\nError - authentication/initialization failed\n

it means that your account is not enabled to use the HPSS yet. Please contact support to have it set up.

              "},{"location":"polaris/getting-started/","title":"Getting Started on Polaris","text":""},{"location":"polaris/getting-started/#logging-into-polaris","title":"Logging Into Polaris","text":"

              To log into Polaris:

              ssh <username>@polaris.alcf.anl.gov\n
              Then, type in the password from your CRYPTOCard/MobilePASS+ token.

              "},{"location":"polaris/getting-started/#hardware-overview","title":"Hardware Overview","text":"

              An overview of the Polaris system including details on the compute node architecture is available on the Machine Overview page.

              "},{"location":"polaris/getting-started/#compiling-applications","title":"Compiling Applications","text":"

              Users are encouraged to read through the Compiling and Linking Overview page and corresponding pages depending on the target compiler and programming model.

              "},{"location":"polaris/getting-started/#submitting-and-running-jobs","title":"Submitting and Running Jobs","text":"

              Users are encouraged to read through the Running Jobs with PBS at the ALCF page for information on using the PBS scheduler and preparing job submission scripts. Some example job submission scripts are available on the Example Job Scripts page as well.

              "},{"location":"polaris/getting-started/#lustre-file-striping","title":"Lustre File Striping","text":"

              In addition to the content above, here is a document on Lustre File Striping Basics.

              • Lustre File Striping Basics
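As a quick illustration of the commands covered in that document (the directory path here is only a placeholder), you can set a stripe count on an output directory and check the result with the lfs utility:

lfs setstripe -c 8 /eagle/<project name>/run_output      # stripe new files in this directory across 8 OSTs\nlfs getstripe /eagle/<project name>/run_output\n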
              "},{"location":"polaris/getting-started/#proxy","title":"Proxy","text":"

              If the node you are on doesn\u2019t have outbound network connectivity, add the following to your ~/.bash_profile file to access the proxy host.

              # proxy settings\nexport HTTP_PROXY=\"http://proxy-01.pub.alcf.anl.gov:3128\"\nexport HTTPS_PROXY=\"http://proxy-01.pub.alcf.anl.gov:3128\"\nexport http_proxy=\"http://proxy-01.pub.alcf.anl.gov:3128\"\nexport https_proxy=\"http://proxy-01.pub.alcf.anl.gov:3128\"\nexport ftp_proxy=\"http://proxy-01.pub.alcf.anl.gov:3128\"\nexport no_proxy=\"admin,polaris-adminvm-01,localhost,*.cm.polaris.alcf.anl.gov,polaris-*,*.polaris.alcf.anl.gov,*.alcf.anl.gov\"\n
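After sourcing the updated ~/.bash_profile, you can check that outbound requests go through the proxy with a quick test (any external URL works; this one is just an example):

source ~/.bash_profile\ncurl -sI https://www.anl.gov | head -n 1      # should print an HTTP status line if the proxy is reachable\n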
              "},{"location":"polaris/getting-started/#getting-assistance","title":"Getting Assistance","text":"

              Please direct all questions, requests, and feedback to support@alcf.anl.gov.

              "},{"location":"polaris/known-issues/","title":"Known Issues","text":"

              This is a collection of known issues that have been encountered on Polaris. Documentation will be updated as issues are resolved. Users are encouraged to email support@alcf.anl.gov to report issues.

              "},{"location":"polaris/known-issues/#compiling-running-applications","title":"Compiling & Running Applications","text":"
              1. Since the Slingshot 11 and related software upgrade, users may encounter the following issue when running an application.
              /opt/cray/pe/gcc-libs/libstdc++.so.6: version `GLIBCXX_3.4.29' not found (required by a.out)\n

              At this time, it is suggested to update the LD_PRELOAD environment variable as follows.

              export LD_PRELOAD=/opt/cray/pe/gcc/11.2.0/snos/lib64/libstdc++.so.6\n
              1. If your job fails to start with an RPC launch message like below, please forward the complete messages to support@alcf.anl.gov.
              launch failed on x3104c0s1b0n0: Couldn't forward RPC launch(ab751d77-e80a-4c54-b1c2-4e881f7e8c90) to child x3104c0s31b0n0.hsn.cm.polaris.alcf.anl.gov: Resource temporarily unavailable\n
              1. With PrgEnv-nvhpc/8.3.3, if you are using nvcc to indirectly invoke nvc++ and compiling C++17 code (as, for example, in building Kokkos via nvcc_wrapper), you will get compilation errors with C++17 constructs. See our documentation on NVIDIA Compilers for a workaround.

              2. PrgEnv-nvhpc/8.3.3 currently loads the nvhpc/21.9 module, which erroneously has the following lines:

              setenv(\"CC\",\"/opt/nvidia/hpc_sdk/Linux_x86_64/21.9/compilers/bin/nvc\")\nsetenv(\"CXX\",\"/opt/nvidia/hpc_sdk/Linux_x86_64/21.9/compilers/bin/nvc++\")\nsetenv(\"FC\",\"/opt/nvidia/hpc_sdk/Linux_x86_64/21.9/compilers/bin/nvfortran\")\nsetenv(\"F90\",\"/opt/nvidia/hpc_sdk/Linux_x86_64/21.9/compilers/bin/nvfortran\")\nsetenv(\"F77\",\"/opt/nvidia/hpc_sdk/Linux_x86_64/21.9/compilers/bin/nvfortran\")\nsetenv(\"CC\",\"cpp\")\n

              In particular, the final line can cause issues for C-based projects (e.g. CMake may complain because the cpp C preprocessor is not a compiler). We recommend running the following in such cases:

              unset CC\nunset F77\nunset CXX\nunset FC\nunset F90\n
              1. Cray MPICH may exhibit issues when MPI ranks call fork() and are distributed across multiple nodes. The process may hang or throw a segmentation fault.

                In particular, this can manifest in hangs with PyTorch+Horovod with a DataLoader with multithreaded workers and distributed data parallel training on multiple nodes. We have built a module conda/2022-09-08-hvd-nccl which includes a Horovod built without support for MPI. It uses NCCL for GPU-GPU communication and Gloo for coordination across nodes.

                export IBV_FORK_SAFE=1 may be a workaround for some manifestations of this bug; however, it will incur memory registration overheads. It does not fix the hanging experienced with multithreaded data loading in PyTorch+Horovod across multiple nodes with conda/2022-09-08 (instead prompting a segfault).

                This incompatibility also may affect Parsl; see details in the Special notes for Polaris section of the Parsl page.

              "},{"location":"polaris/known-issues/#profiling-applications","title":"Profiling Applications","text":"
              1. The nsys profiler packaged with nvhpc/21.9 in some cases appears to present broken timelines with start times that are not lined up. The issue does not appear to be present when nsys from cudatoolkit-standalone/11.2.2 is used. We expect this to no longer be an issue once nvhpc/22.5 is made available as the default version.
              "},{"location":"polaris/known-issues/#submitting-jobs","title":"Submitting Jobs","text":"
              1. For batch job submissions, if the parameters within your submission script do not meet the parameters of any of the execution queues (small, ..., backfill-large), you might not receive the \"Job submission\" error on the command line at all, and the job will never appear in the qstat -xu <username> history (a current bug in PBS). For example, if a user submits a script to the prod routing queue requesting 10 nodes for 24 hours, exceeding the \"Time Max\" of the small execution queue (which handles jobs with 10-24 nodes), then it may behave as if the job was never submitted.

              2. Job scripts are copied to temporary locations after qsub and any changes to the original script while the job is queued will not be reflected in the copied script. Furthermore, qalter requires -A <allocation name> when changing job properties. Currently, there is a request for a qalter-like command to trigger a re-copy of the original script to the temporary location.
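              As a hedged illustration (the job ID, allocation name, and new walltime below are placeholders), modifying a queued job with qalter might look like:

              qalter -A <allocation name> -l walltime=2:00:00 <jobid>\n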

              "},{"location":"polaris/running-jobs/","title":"Running Jobs on Polaris","text":""},{"location":"polaris/running-jobs/#queues","title":"Queues","text":"

              There are five production queues you can target in your qsub (-q <queue name>):

              Queue Name Node Min Node Max Time Min Time Max Notes debug 1 2 5 min 1 hr max 8 nodes in use by this queue at any given time debug-scaling 1 10 5 min 1 hr max 1 job running/accruing/queued per-user prod 10 496 5 min 24 hrs Routing queue; See below preemptable 1 10 5 min 72 hrs max 20 jobs running/accruing/queued per-project; see note below demand 1 56 5 min 1 hr By request only; max 100 jobs running/accruing/queued per-project

              Note: Jobs in the demand queue take priority over jobs in the preemptable queue. This means jobs in the preemptable queue may be preempted (killed without any warning) if there are jobs in the demand queue. Please use the following command to view details of a queue: qstat -Qf <queuename>

              prod is a routing queue and routes your job to one of the following six execution queues:

              Queue Name Node Min Node Max Time Min Time Max Notes small 10 24 5 min 3 hrs medium 25 99 5 min 6 hrs large 100 496 5 min 24 hrs backfill-small 10 24 5 min 3 hrs low priority, negative project balance backfill-medium 25 99 5 min 6 hrs low priority, negative project balance backfill-large 100 496 5 min 24 hrs low priority, negative project balance
              • Note 1: You cannot submit to these queues directly, you can only submit to the routing queue \"prod\".
              • Note 2: All of these queues have a limit of ten (10) jobs running/accruing per-project
              • Note 3: All of these queues have a limit of one hundred (100) jobs queued (not accruing score) per-project
              • Note 4: As of January 2023, it is recommended to submit jobs with a maximum node count of 476-486 nodes given current rates of downed nodes (larger jobs may sit in the queue indefinitely).
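              As an illustrative sketch of the limits above (the project name, filesystems, and script name are placeholders), a submission routed through prod that fits within the small execution queue might look like:

              qsub -q prod -A <project> -l select=10:system=polaris -l place=scatter -l walltime=3:00:00 -l filesystems=home:eagle ./job_script.sh\n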
              "},{"location":"polaris/running-jobs/#running-mpiopenmp-applications","title":"Running MPI+OpenMP Applications","text":"

              Once a submitted job is running, calculations can be launched on the compute nodes using mpiexec to start an MPI application. Documentation is accessible via man mpiexec; some helpful options follow.

              • -n total number of MPI ranks
              • -ppn number of MPI ranks per node
              • --cpu-bind CPU binding for application
              • --depth number of cpus per rank (useful with --cpu-bind)
              • --env set environment variables (--env OMP_NUM_THREADS=2)
              • --hostfile indicate file with hostnames (the default is --hostfile $PBS_NODEFILE)

              A sample submission script with directives is below for a 4-node job with 32 MPI ranks on each node and 8 OpenMP threads per rank (1 per CPU).

              #!/bin/bash -l\n#PBS -N AFFINITY\n#PBS -l select=4:ncpus=256\n#PBS -l walltime=0:10:00\n#PBS -q debug-scaling\n#PBS -A Catalyst\n\nNNODES=`wc -l < $PBS_NODEFILE`\nNRANKS=32 # Number of MPI ranks to spawn per node\nNDEPTH=8 # Number of hardware threads per rank (i.e. spacing between MPI ranks)\nNTHREADS=8 # Number of software threads per rank to launch (i.e. OMP_NUM_THREADS)\n\nNTOTRANKS=$(( NNODES * NRANKS ))\n\necho \"NUM_OF_NODES= ${NNODES} TOTAL_NUM_RANKS= ${NTOTRANKS} RANKS_PER_NODE= ${NRANKS} THREADS_PER_RANK= ${NTHREADS}\"\n\ncd /home/knight/affinity\nmpiexec --np ${NTOTRANKS} -ppn ${NRANKS} -d ${NDEPTH} --cpu-bind depth -env OMP_NUM_THREADS=${NTHREADS} ./hello_affinity\n
              "},{"location":"polaris/running-jobs/#running-gpu-enabled-applications","title":"Running GPU-enabled Applications","text":"

              GPU-enabled applications will similarly run on the compute nodes using the above example script. - The environment variable MPICH_GPU_SUPPORT_ENABLED=1 needs to be set if your application requires MPI-GPU support whereby the MPI library sends and receives data directly from GPU buffers. In this case, it will be important to have the craype-accel-nvidia80 module loaded both when compiling your application and during runtime to correctly link against a GPU Transport Layer (GTL) MPI library. Otherwise, you'll likely see GPU_SUPPORT_ENABLED is requested, but GTL library is not linked errors during runtime. - If running on a specific GPU or subset of GPUs is desired, then the CUDA_VISIBLE_DEVICES environment variable can be used. For example, if one only wanted an application to access the first two GPUs on a node, then setting CUDA_VISIBLE_DEVICES=0,1 could be used.
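              For example, a job script for a GPU-aware MPI application might include lines like the following (a sketch; the executable name is a placeholder, and it is assumed the code was built with craype-accel-nvidia80 loaded):

              module load craype-accel-nvidia80\nexport MPICH_GPU_SUPPORT_ENABLED=1\nexport CUDA_VISIBLE_DEVICES=0,1\nmpiexec -n ${NTOTRANKS} --ppn ${NRANKS} ./my_gpu_app\n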

              "},{"location":"polaris/running-jobs/#binding-mpi-ranks-to-gpus","title":"Binding MPI ranks to GPUs","text":"

              The Cray MPI on Polaris does not currently support binding MPI ranks to GPUs. For applications that need this support, it can instead be handled by a small helper script that appropriately sets CUDA_VISIBLE_DEVICES for each MPI rank. One example is available here, where each MPI rank is bound to a single GPU with round-robin assignment.

              An example set_affinity_gpu_polaris.sh script follows, where GPUs are assigned round-robin to MPI ranks.

              #!/bin/bash -l\nnum_gpus=4\n# need to assign GPUs in reverse order due to topology\n# See Polaris Device Affinity Information:\n# https://www.alcf.anl.gov/support/user-guides/polaris/hardware-overview/machine-overview/index.html\ngpu=$((${num_gpus} - 1 - ${PMI_LOCAL_RANK} % ${num_gpus}))\nexport CUDA_VISIBLE_DEVICES=$gpu\necho \"RANK= ${PMI_RANK} LOCAL_RANK= ${PMI_LOCAL_RANK} gpu= ${gpu}\"\nexec \"$@\"\n
              This script can be placed just before the executable in the mpiexec command like so.
              mpiexec -n ${NTOTRANKS} --ppn ${NRANKS_PER_NODE} --depth=${NDEPTH} --cpu-bind depth ./set_affinity_gpu_polaris.sh ./hello_affinity\n
              Users with different needs, such as assigning multiple GPUs per MPI rank, can modify the above script to suit their needs.

              "},{"location":"polaris/running-jobs/#interactive-jobs-on-compute-nodes","title":"Interactive Jobs on Compute Nodes","text":"

              Here is how to submit an interactive job to, for example, edit/build/test an application on Polaris compute nodes:

              qsub -I -l select=1 -l filesystems=home:eagle -l walltime=1:00:00 -q debug\n

              This command requests 1 node for a period of 1 hour in the debug queue, requiring access to the /home and eagle filesystems. After waiting in the queue for a node to become available, a shell prompt on a compute node will appear. You may then start building applications and testing gpu affinity scripts on the compute node.

              NOTE: If you want to ssh or scp to one of your assigned compute nodes you will need to make sure your $HOME directory and your $HOME/.ssh directory permissions are both set to 700.
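              For example, the following commands set those permissions:

              chmod 700 $HOME\nchmod 700 $HOME/.ssh\n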

              "},{"location":"polaris/running-jobs/#running-multiple-mpi-applications-on-a-node","title":"Running Multiple MPI Applications on a node","text":"

              Multiple applications can be run simultaneously on a node by launching several mpiexec commands and backgrounding them. For performance, it will likely be necessary to ensure that each application runs on a distinct set of CPU resources and/or targets specific GPUs. One can provide a list of CPUs using the --cpu-bind option, which, when combined with CUDA_VISIBLE_DEVICES, allows a user to specify exactly which CPU and GPU resources each application runs on. In the example below, four instances of the application are running simultaneously on a single node. In the first instance, the application is spawning MPI ranks 0-7 on CPUs 24-31 and using GPU 0. This mapping is based on output from the nvidia-smi topo -m command and pairs CPUs with the closest GPU.

              export CUDA_VISIBLE_DEVICES=0\nmpiexec -n 8 --ppn 8 --cpu-bind list:24:25:26:27:28:29:30:31 ./hello_affinity &\n\nexport CUDA_VISIBLE_DEVICES=1\nmpiexec -n 8 --ppn 8 --cpu-bind list:16:17:18:19:20:21:22:23 ./hello_affinity &\n\nexport CUDA_VISIBLE_DEVICES=2\nmpiexec -n 8 --ppn 8 --cpu-bind list:8:9:10:11:12:13:14:15 ./hello_affinity &\n\nexport CUDA_VISIBLE_DEVICES=3\nmpiexec -n 8 --ppn 8 --cpu-bind list:0:1:2:3:4:5:6:7 ./hello_affinity &\n\nwait\n
              "},{"location":"polaris/running-jobs/#compute-node-access-to-the-internet","title":"Compute Node Access to the Internet","text":"

              Currently, the only access to the internet from the compute nodes is via a proxy. Here are the proxy environment variables for Polaris:

              export http_proxy=\"http://proxy-01.pub.alcf.anl.gov:3128\"\nexport https_proxy=\"http://proxy-01.pub.alcf.anl.gov:3128\"\nexport ftp_proxy=\"http://proxy-01.pub.alcf.anl.gov:3128\"\n

              In the future, we intend to have public IP addresses be a schedulable resource, though we don't have a timeline on this because it depends on future features in Slingshot and internal software development. For instance, if only your head node needed public access, your select statement might look something like: -l select=1:pubnet=True+63.

              "},{"location":"polaris/running-jobs/#controlling-where-your-job-runs","title":"Controlling Where Your Job Runs","text":"

              If you wish to have your job run on specific nodes, form your select statement like this: -l select=1:vnode=<node name1>+1:vnode=<node name2>... . Obviously, that gets tedious for large jobs.

              If you want to control the location of a few nodes, for example 2 out of 64, but the rest don't matter, you can do something like this: -l select=1:vnode=<node name1>+1:vnode=<node name2>+62:system=foo

              Every node has a PBS resource called tier0 with a rack identifier and tier1 with a dragonfly group identifier. If you want all your nodes grouped in a rack, you can add the group specifier -l select=8:system=foo,place=scatter:group=tier0. If you want everything in the same dragonfly group, replace tier0 with tier1. Note that you also have to explicitly specify the place when you use group. If you want a specific rack or dragonfly group instead of any of them, you are back to the select: -l select=10:tier0=x3001-g0.

              "},{"location":"polaris/running-jobs/#network-rack-and-dragonfly-group-mappings","title":"Network: Rack and Dragonfly Group Mappings","text":"
              • Racks contain (7) 6U chassis; each chassis has 2 nodes for 14 nodes per rack
              • The hostnames are of the form xRRPPc0sUUb[0|1]n0 where:
                • RR is the row {30, 31, 32}
                • PP is the position in the row {30 goes 1-16, 31 and 32 go 1-12}
                • c is chassis and is always 0
                • s stands for slot, but in this case is the RU in the rack and values are {1,7,13,19,25,31,37}
                • b is BMC controller and is 0 or 1 (each node has its own BMC)
                • n is node, but is always 0 since there is only one node per BMC
              • So, 16+12+12 = 40 racks * 14 nodes per rack = 560 nodes.
              • Note that in production group 9 (the last 4 racks) will be the designated on-demand racks
              The management racks are x3000 and x3100 and are dragonfly group 10
              • The TDS rack is x3200 and is dragonfly group 11
              • Each compute node will have a PBS resource named tier0 which will be equal to the values in the table below. This allows you to group your jobs within a rack if you wish. There is also a resource called tier1 which will be equal to the column headings. This allows you to group your jobs within a dragonfly group if you wish.
              g0 g1 g2 g3 g4 g5 g6 g7 g8 g9 x3001-g0 x3005-g1 x3009-g2 x3013-g3 x3101-g4 x3105-g5 x3109-g6 x3201-g7 x3205-g8 x3209-g9 x3002-g0 x3006-g1 x3010-g2 x3014-g3 x3102-g4 x3106-g5 x3110-g6 x3202-g7 x3206-g8 x3210-g9 x3003-g0 x3007-g1 x3011-g2 x3015-g3 x3103-g4 x3107-g5 x3111-g6 x3203-g7 x3207-g8 x3211-g9 x3004-g0 x3008-g1 x3012-g2 x3016-g3 x3104-g4 x3108-g5 x3112-g6 x3204-g7 x3208-g8 x3212-g9"},{"location":"polaris/applications-and-libraries/applications/gromacs/","title":"Gromacs on Polaris","text":""},{"location":"polaris/applications-and-libraries/applications/gromacs/#what-is-gromacs","title":"What is Gromacs?","text":"

              GROMACS is a versatile package to perform molecular dynamics, i.e. simulate the Newtonian equations of motion for systems with hundreds to millions of particles. It is primarily designed for biochemical molecules like proteins, lipids, and nucleic acids that have a lot of complicated bonded interactions, but since GROMACS is extremely fast at calculating the nonbonded interactions (that usually dominate simulations) many groups are also using it for research on non-biological systems, e.g. polymers.

              "},{"location":"polaris/applications-and-libraries/applications/gromacs/#using-gromacs-at-alcf","title":"Using GROMACS at ALCF","text":"

              ALCF offers assistance with building binaries and compiling instructions for GROMACS. For questions, contact us at support@alcf.anl.gov.

              "},{"location":"polaris/applications-and-libraries/applications/gromacs/#building-gromacs","title":"Building Gromacs","text":"
              1. Download latest source code: http://manual.gromacs.org/documentation/2022.1/download.html
              2. tar -xzf gromacs-2022.1.tar.gz
              3. module swap PrgEnv-nvhpc PrgEnv-gnu
              4. module load cudatoolkit-standalone/11.2.2
              5. module load gcc/10.3.0
              6. module load cmake
              7. cd gromacs-2022.1
              8. mkdir build
              9. cmake -DCMAKE_C_COMPILER=cc -DCMAKE_CXX_COMPILER=CC \\\n      -DBUILD_SHARED_LIBS=OFF -DGMX_BUILD_OWN_FFTW=ON \\\n      -DCMAKE_INSTALL_PREFIX=/path-to/gromacs-2022.1/build \\\n      -DGMX_MPI=ON -DGMX_OPENMP=ON -DGMX_GPU=CUDA \\\n      -DCUDA_TOOLKIT_ROOT_DIR=/soft/compilers/cudatoolkit/cuda-11.2.2\n
              10. make -j 8
              11. make install
              12. The installed binary is build/bin/gmx_mpi.
              "},{"location":"polaris/applications-and-libraries/applications/gromacs/#running-gromacs-on-polaris","title":"Running Gromacs on Polaris","text":"

              Prebuilt Gromacs binaries can be found in the directory /soft/applications/Gromacs/gromacs-2022.1.

              A sample PBS script follows that will run GROMACS on two nodes, using 4 MPI ranks per node with four OpenMP threads per rank. The PME kernel owns one MPI rank and one GPU per node, while the nonbonded kernel uses 3 MPI ranks and 3 GPUs per node.

              #!/bin/sh\n#PBS -l select=2:system=polaris\n#PBS -l place=scatter\n#PBS -l walltime=0:30:00\n#PBS -q debug\n#PBS -A PROJECT\n#PBS -l filesystems=home:grand:eagle\n\ncd ${PBS_O_WORKDIR}\n\nmodule swap PrgEnv-nvhpc PrgEnv-gnu\nmodule load cudatoolkit-standalone/11.2.2\n\nexport OMP_NUM_THREADS=4\n\nmpirun --np 8 /soft/applications/Gromacs/gromacs-2022.1/gmx_mpi \\\n      mdrun -gputasks 0123 -nb gpu -pme gpu -npme 1 -ntomp 4 \\\n      -dlb yes -resethway -pin on -v deffnm step5_1 -g test.log\n

              We strongly suggest that users try combinations of different numbers of nodes, MPI ranks per node, number of GPU tasks/devices, GPU task decomposition between nonbonded and PME kernels, and OMP threads per rank to find the optimal throughput for their particular workload.

              "},{"location":"polaris/applications-and-libraries/applications/lammps/","title":"LAMMPS","text":""},{"location":"polaris/applications-and-libraries/applications/lammps/#overview","title":"Overview","text":"

              LAMMPS is a general-purpose molecular dynamics software package for massively parallel computers. It is written in an exceptionally clean style that makes it one of the more popular codes for users to extend and it currently has dozens of user-developed extensions.

              For details about the code and its usage, see the LAMMPS home page. This page provides information specific to running on Polaris at the ALCF.

              "},{"location":"polaris/applications-and-libraries/applications/lammps/#using-lammps-at-alcf","title":"Using LAMMPS at ALCF","text":"

              ALCF provides assistance with build instructions, compiling executables, submitting jobs, and prebuilt binaries (upon request). A collection of Makefiles and submission scripts is available in the ALCF GettingStarted repo here. For questions, contact us at support@alcf.anl.gov.

              "},{"location":"polaris/applications-and-libraries/applications/lammps/#how-to-obtain-the-code","title":"How to Obtain the Code","text":"

              LAMMPS is an open-source code, which can be downloaded from the LAMMPS website.

              "},{"location":"polaris/applications-and-libraries/applications/lammps/#building-on-polaris-using-kokkos-package","title":"Building on Polaris using KOKKOS package","text":"

              After LAMMPS has been downloaded and unpacked on an ALCF filesystem, users should see a directory whose name is of the form lammps-<version>. In recent versions, the Makefile lammps-<version>/src/MAKE/MACHINES/Makefile.polaris can be used for compilation on Polaris. A copy of the Makefile is also available in the ALCF GettingStarted repo here. For older versions of LAMMPS, you may need to take an existing Makefile (e.g. Makefile.mpi) for your specific version of LAMMPS and edit the top portion appropriately to create a new Makefile.polaris file.

              The top portion of Makefile.polaris_kokkos_nvidia used to build LAMMPS with the KOKKOS package using the NVIDIA compilers is shown as an example.

              # polaris_nvidia = Flags for NVIDIA A100, NVIDIA Compiler, Cray MPICH, CUDA\n# module load craype-accel-nvidia80\n# make polaris_kokkos_nvidia -j 16\n\nSHELL = /bin/sh\n\n# ---------------------------------------------------------------------\n# compiler/linker settings\n# specify flags and libraries needed for your compiler\n\nKOKKOS_DEVICES = Cuda,OpenMP\nKOKKOS_ARCH = Ampere80\nKOKKOS_ABSOLUTE_PATH = $(shell cd $(KOKKOS_PATH); pwd)\nexport NVCC_WRAPPER_DEFAULT_COMPILER = nvc++\n\nCRAY_INC = $(shell CC --cray-print-opts=cflags)\nCRAY_LIB = $(shell CC --cray-print-opts=libs)\n\nCC =        $(KOKKOS_ABSOLUTE_PATH)/bin/nvcc_wrapper\nCCFLAGS =  -g -O3 -mp -DLAMMPS_MEMALIGN=64 -DLAMMPS_BIGBIG\nCCFLAGS += $(CRAY_INC)\nSHFLAGS =   -fPIC\nDEPFLAGS =  -M\n\nLINK =      $(CC)\nLINKFLAGS = $(CCFLAGS)\nLIB = $(CRAY_LIB)\nSIZE =      size\n

              With the appropriate LAMMPS Makefile in place an executable can be compiled as in the following example, which uses the NVIDIA compilers.

              module load craype-accel-nvidia80\ncd lammps-<version>/src\nmake yes-KOKKOS\nmake polaris_kokkos_nvidia -j 16\n
              "},{"location":"polaris/applications-and-libraries/applications/lammps/#running-jobs-on-polaris","title":"Running Jobs on Polaris","text":"

              An example submission script for running a KOKKOS-enabled LAMMPS executable is shown below. Additional information on LAMMPS application flags and options is described on the LAMMPS website.

              #!/bin/sh\n#PBS -l select=64:system=polaris\n#PBS -l place=scatter\n#PBS -l walltime=0:15:00\n#PBS -l filesystems=home:grand:eagle\n#PBS -q prod\n#PBS -A Catalyst\n\nexport MPICH_GPU_SUPPORT_ENABLED=1\n\nNNODES=`wc -l < $PBS_NODEFILE`\n\n# per-node settings\nNRANKS=4\nNRANKSSOCKET=2\nNDEPTH=8\nNTHREADS=1\nNGPUS=4\n\nNTOTRANKS=$(( NNODES * NRANKS ))\n\nEXE=/home/knight/bin/lammps_polaris_kokkos_nvidia\nEXE_ARG=\"-in in.reaxc.hns -k on g ${NGPUS} -sf kk -pk kokkos neigh half neigh/qeq full newton on \"\n\n# OMP settings mostly to quiet Kokkos messages\n\nMPI_ARG=\"-n ${NTOTRANKS} --ppn ${NRANKS} --depth=${NDEPTH} --cpu-bind depth --env OMP_NUM_THREADS=${NTHREADS} --env OMP_PROC_BIND=spread --env OMP_PLACES=cores\"\n\nCOMMAND=\"mpiexec ${MPI_ARG} ${EXE} ${EXE_ARG}\"\necho \"COMMAND= ${COMMAND}\"\n${COMMAND}\n
              "},{"location":"polaris/applications-and-libraries/applications/lammps/#performance-notes","title":"Performance Notes","text":"

              Some useful information on accelerator packages and expectations can be found on the LAMMPS website here.

              "},{"location":"polaris/applications-and-libraries/applications/openmm/","title":"OpenMM on Polaris","text":""},{"location":"polaris/applications-and-libraries/applications/openmm/#what-is-openmm","title":"What is OpenMM?","text":"

              OpenMM is a high-performance toolkit for molecular simulations that can be used as a stand-alone application or as a library. It provides a combination of flexibility (through custom forces and integrators), openness, and high-performance (especially on recent GPUs).

              "},{"location":"polaris/applications-and-libraries/applications/openmm/#using-openmm-at-alcf","title":"Using OpenMM at ALCF","text":"

              ALCF offers assistance with building binaries and compiling instructions for OpenMM. For questions, contact us at support@alcf.anl.gov.

              "},{"location":"polaris/applications-and-libraries/applications/openmm/#building-openmm-using-conda-module","title":"Building OpenMM using Conda module","text":"
              1. Update environment
                $ module load conda/2022-07-19\n
              2. Install OpenMM
                $ mkdir conda\n$ conda create --prefix /path-to/conda/openmm_env\n$ conda activate /path-to/conda/openmm_env\n$ conda install -c conda-forge openmm cudatoolkit=11.4\n$ conda deactivate\n
              3. Validate installation: if successful, then info on code version, platform types, CUDA initialization, and force error tolerance will be shown.

                $ cd /path-to/conda/openmm_env/share/openmm/examples\n$ python -m openmm.testInstallation\n
              4. Benchmark testing using PBS job script below.

                $ cd /path-to/conda/openmm_env/share/openmm/examples\n$ qsub ./submit.sh\n
              "},{"location":"polaris/applications-and-libraries/applications/openmm/#running-openmm-benchmark-on-polaris","title":"Running OpenMM Benchmark on Polaris","text":"

              A sample PBS script follows that will run the OpenMM benchmark on one node.

              #!/bin/sh\n#PBS -l select=1:system=polaris\n#PBS -l place=scatter\n#PBS -l walltime=0:30:00\n#PBS -q debug\n#PBS -A PROJECT\n#PBS -l filesystems=home:grand:eagle\n\ncd ${PBS_O_WORKDIR}\n\nmodule load cudatoolkit-standalone/11.4.4\n\npython benchmark.py --platform=CUDA --test=pme --precision=mixed --seconds=30 --heavy-hydrogens > test.output\n
              "},{"location":"polaris/applications-and-libraries/applications/openmm/#building-openmm-from-source","title":"Building OpenMM from Source","text":"
              1. Update environment
                $ module load cudatoolkit-standalone/11.4.4\n$ module load cray-python/3.9.12.1\n
              2. Download OpenMM
                $ git clone https://github.com/openmm/openmm.git\n$ cd openmm ; mkdir build\n
              3. Download and build doxygen
                $ git clone https://github.com/doxygen/doxygen.git\n$ cd doxygen ; cmake . ; make ; make install ; cd ../\n
              4. Download and install swig in the OpenMM directory.
                $ tar xzf swig-4.0.2.tar.gz\n$ cd swig-4.0.2\n$ ./configure --prefix=/path-to/openmm/swig-4.0.2 ; make -j 8 ; make install\n
              5. Build OpenMM
                $ cmake -DDOXYGEN_EXECUTABLE=/path-to/openmm/doxygen/bin/doxygen \\\n        -DSWIG_EXECUTABLE=/path-to/openmm/swig-4.0.2/bin/swig \\\n        -DCMAKE_INSTALL_PREFIX=/path-to/openmm/build \\\n         -DCUDA_HOME=/soft/compilers/cudatoolkit/cuda-11.4.4 \\\n         -DCUDA_INCLUDE_DIR=/soft/compilers/cudatoolkit/cuda-11.4.4/include \\\n         -DCUDA_LIB_DIR=/soft/compilers/cudatoolkit/cuda-11.4.4/lib64\n$ make -j 8\n$ make install\n
              6. Validate installation: if successful, then info on code version, platform types, CUDA initialization, and force error tolerance will be shown.

                $ cd /path-to/openmm/examples\n$ python -m openmm.testInstallation\n
              7. Benchmark testing using the PBS job script above.

                $ cd /path-to/openmm/examples\n$ qsub ./submit.sh\n
              "},{"location":"polaris/applications-and-libraries/applications/vasp/","title":"VASP","text":""},{"location":"polaris/applications-and-libraries/applications/vasp/#vasp-6xx-in-polaris-nvhpcopenaccopenmpcuda-mathcraympi","title":"VASP 6.x.x in Polaris (NVHPC+OpenACC+OpenMP+CUDA math+CrayMPI)","text":"

              The Vienna Ab initio Simulation Package (VASP) is a software package for performing electronic structure calculations with periodic boundary conditions. It is most commonly used to perform density functional theory (DFT) calculations in a planewave basis using the projector augmented wave (PAW) method. A more complete description of VASP can be found here.

              Users must have a license to use this code on ALCF systems. More information on how to get access to VASP binaries can be found here.

              "},{"location":"polaris/applications-and-libraries/applications/vasp/#general-compilinginstalling-instructions-provided-by-vasp-support","title":"General compiling/installing instructions provided by VASP support","text":"

              Instructions and samples of makefile.include can be found on the vasp.at wiki page.

              The following makefile.include was tailored for Polaris, originally taken from here.

              # Precompiler options\nCPP_OPTIONS = -DHOST=\\\"LinuxNV\\\" \\\n              -DMPI -DMPI_BLOCK=8000 -Duse_collective \\\n              -DscaLAPACK \\\n              -DCACHE_SIZE=4000 \\\n              -Davoidalloc \\\n              -Dvasp6 \\\n              -Duse_bse_te \\\n              -Dtbdyn \\\n              -Dqd_emulate \\\n              -Dfock_dblbuf \\\n              -D_OPENMP \\\n              -D_OPENACC \\\n              -DUSENCCL -DUSENCCLP2P\\\n\nCPP        = nvfortran -Mpreprocess -Mfree -Mextend -E $(CPP_OPTIONS) $*$(FUFFIX)  > $*$(SUFFIX)\n\nFC         = ftn -acc -gpu=cc80 -mp -target-accel=nvidia80\nFCL        = ftn -acc -gpu=cc80 -c++libs -target-accel=nvidia80\n\nFREE       = -Mfree\n\nFFLAGS     = -Mbackslash -Mlarge_arrays\n\nOFLAG      = -fast\n\nDEBUG      = -Mfree -O0 -traceback\n\n# Specify your NV HPC-SDK installation, try to set NVROOT automatically\nNVROOT     =$(shell which nvfortran | awk -F /compilers/bin/nvfortran '{ print $$1 }')\n# ...or set NVROOT manually\nNVHPC      ?= /opt/nvidia/hpc_sdk\nNVVERSION  = 20.9\n#NVROOT     = $(NVHPC)/Linux_x86_64/$(NVVERSION)\n\n# Use NV HPC-SDK provided BLAS and LAPACK libraries\nLIBAOCL=/soft/libraries/aocl/3.2.0\nBLAS       = ${LIBAOCL}/lib/libblis-mt.a\nLAPACK     = ${LIBAOCL}/lib/libflame.a\n\nBLACS      =\nSCALAPACK  =\n#SCALAPACK  = -Mscalapack\n#SCALAPACK  = ${LIBAOCL}/lib/libscalapack.a\n\nCUDA       = -cudalib=cublas,cusolver,cufft,nccl -cuda\n\nLLIBS      = $(SCALAPACK) $(LAPACK) $(BLAS) $(CUDA)\n\n# Software emulation of quadruple precsion\nQD         ?= $(NVROOT)/compilers/extras/qd\nLLIBS      += -L$(QD)/lib -lqdmod -lqd\nINCS       += -I$(QD)/include/qd\n\n#INCS       += -I/usr/include/linux \n#INCS       += -I/usr/include/c++/7/tr1 \n#INCS       += -I/usr/include/c++/7 \n#INCS       += -I/usr/include/x86_64-linux-gnu/c++/7\n#INCS       += -I/lus/theta-fs0/software/spack/spack-dev/opt/spack/linux-sles15-x86_64/gcc-9.3.0/gcc-10.2.0-r7v3naxd5xgzzaqxoe73jj2ytwuddamr/lib/gcc/x86_64-pc-linux-gnu/10.2.0/include/\n\n# Use the FFTs from fftw\nFFTW       ?= ${LIBAOCL}\nLLIBS      += -L$(FFTW)/lib -lfftw3 -lfftw3_omp -lomp\n#INCS       += -I/soft/libraries/aocl/3.2.0/include_LP64/\nINCS       += -I$(FFTW)/include\n\nOBJECTS    = fftmpiw.o fftmpi_map.o fftw3d.o fft3dlib.o\n\n# Redefine the standard list of O1 and O2 objects\nSOURCE_O1  := pade_fit.o\nSOURCE_O2  := pead.o\n\n# For what used to be vasp.5.lib\nCPP_LIB    = $(CPP)\nFC_LIB     = nvfortran\nCC_LIB     = cc\nCFLAGS_LIB = -O $(INCS) -c++libs -cuda\nFFLAGS_LIB = -O1 -Mfixed\nFREE_LIB   = $(FREE)\n\nOBJECTS_LIB= linpack_double.o getshmem.o\n\n# For the parser library\n#CXX_PARS   = nvc++ --no_warnings -I/lus/theta-fs0/software/spack/spack-dev/opt/spack/linux-sles15-x86_64/gcc-9.3.0/gcc-10.2.0-r7v3naxd5xgzzaqxoe73jj2ytwuddamr/include/c++/10.2.0/ -I/lus/theta-fs0/software/spack/spack-dev/opt/spack/linux-s\nles15-x86_64/gcc-9.3.0/gcc-10.2.0-r7v3naxd5xgzzaqxoe73jj2ytwuddamr/include/c++/10.2.0/x86_64-pc-linux-gnu -I/lus/theta-fs0/software/spack/spack-dev/opt/spack/linux-sles15-x86_64/gcc-9.3.0/gcc-10.2.0-r7v3naxd5xgzzaqxoe73jj2ytwuddamr/lib/gcc\n/x86_64-pc-linux-gnu/10.2.0/include -I/lus/theta-fs0/software/spack/spack-dev/opt/spack/linux-sles15-x86_64/gcc-9.3.0/gcc-10.2.0-r7v3naxd5xgzzaqxoe73jj2ytwuddamr/lib/gcc/x86_64-pc-linux-gnu/10.2.0/include-fixed/\nCXX_PARS   = nvc++ --no_warnings \n\n# Normally no need to change this\nSRCDIR     = ../../src\nBINDIR     = ../../bin\n
              "},{"location":"polaris/applications-and-libraries/applications/vasp/#setting-up-compiler-and-libraries-with-module","title":"Setting up compiler and libraries with module","text":"

              The following modules will update the include and library paths used by the Cray compiler wrapper ftn to load additional math libraries for the CPU.

              module purge\nmodule load nvhpc/23.3\nmodule load PrgEnv-nvhpc\nmodule load cray-libsci\nmodule load craype-accel-nvidia80\n
              "},{"location":"polaris/applications-and-libraries/applications/vasp/#compiling-vasp","title":"Compiling VASP","text":"

              Once the modules are loaded and a makefile.include is in the vasp folder, compiling all the object files and binaries is done with:

              make -j1\n
              "},{"location":"polaris/applications-and-libraries/applications/vasp/#running-vasp-in-polaris","title":"Running VASP in Polaris","text":"

              An example submission script can be found at /soft/applications/vasp/submit-polaris2023-2.sh and looks similar to the following:

              #!/bin/sh\n#PBS -l select=1:system=polaris  \n#PBS -l place=scatter\n#PBS -l walltime=0:30:00\n#PBS -l filesystems=home:grand:eagle\n#PBS -q debug\n#PBS -A Catalyst\n\nmodule load PrgEnv-nvhpc\nmodule load cray-libsci\n\nexport MPICH_GPU_SUPPORT_ENABLED=1\nNNODES=`wc -l < $PBS_NODEFILE`\nNRANKS=2\nNDEPTH=4\nNTHREADS=4\nNGPUS=2\nNTOTRANKS=$(( NNODES * NRANKS ))\n\nmpiexec -n ${NTOTRANKS} --ppn ${NRANKS} --depth ${NDEPTH} --cpu-bind depth --env OMP_NUM_THREADS=${NTHREADS} /path_to_vasp/bin/vasp_std\n

              Submission scripts should have executable attributes to be used with qsub script mode.

              chmod +x example-script.sh\nqsub  example-script.sh\n
              "},{"location":"polaris/applications-and-libraries/applications/vasp/#known-issues-versions-64x-in-polaris","title":"Known issues versions: >= 6.4.x in Polaris","text":"
              • Undefined MPIX_Query_cuda_support function when linking the binary: This function is called in src/openacc.F. MPIX_Query_cuda_support is not included in cray-mpich. One workaround is to comment out this function call. See the following suggested changes, marked by !!!!!CHANGE HERE, in the file src/openacc.F:
              +!!!!!CHANGE HERE \n-      INTERFACE\n-        INTEGER(c_int) FUNCTION MPIX_Query_cuda_support() BIND(C, name=\"MPIX_Query_cuda_support\")\n-        END FUNCTION\n-      END INTERFACE\n\n       CHARACTER(LEN=1) :: ENVVAR_VALUE\n       INTEGER :: ENVVAR_STAT\n\n       ! This should tell us if MPI is CUDA-aware\n+!!!!!CHANGE HERE \n-       CUDA_AWARE_SUPPORT = MPIX_Query_cuda_support() == 1\n+       CUDA_AWARE_SUPPORT = .TRUE.\n       ! However, for OpenMPI some env variables can still deactivate it even though the previous\n       ! check was positive\n       CALL GET_ENVIRONMENT_VARIABLE(\"OMPI_MCA_mpi_cuda_support\", ENVVAR_VALUE, STATUS=ENVVAR_STAT)\n       IF (ENVVAR_STAT==0 .AND. ENVVAR_VALUE=='0') CUDA_AWARE_SUPPORT = .FALSE.\n       CALL GET_ENVIRONMENT_VARIABLE(\"OMPI_MCA_opal_cuda_support\", ENVVAR_VALUE, STATUS=ENVVAR_STAT)\n       IF (ENVVAR_STAT==0 .AND. ENVVAR_VALUE=='0') CUDA_AWARE_SUPPORT = .FALSE.\n       ! Just in case we might be non-OpenMPI, and their MPIX_Query_cuda_support behaves similarly\n       CALL GET_ENVIRONMENT_VARIABLE(\"MV2_USE_CUDA\", ENVVAR_VALUE, STATUS=ENVVAR_STAT)\n       IF (ENVVAR_STAT==0 .AND. ENVVAR_VALUE=='0') CUDA_AWARE_SUPPORT = .FALSE.\n       CALL GET_ENVIRONMENT_VARIABLE(\"MPICH_RDMA_ENABLED_CUDA\", ENVVAR_VALUE, STATUS=ENVVAR_STAT)\n       IF (ENVVAR_STAT==0 .AND. ENVVAR_VALUE=='0') CUDA_AWARE_SUPPORT = .FALSE.\n       CALL GET_ENVIRONMENT_VARIABLE(\"PMPI_GPU_AWARE\", ENVVAR_VALUE, STATUS=ENVVAR_STAT)\n       IF (ENVVAR_STAT==0) CUDA_AWARE_SUPPORT =(ENVVAR_VALUE == '1')\n+!!!!!CHANGE HERE \n+       CALL GET_ENVIRONMENT_VARIABLE(\"MPICH_GPU_SUPPORT_ENABLED\", ENVVAR_VALUE, STATUS=ENVVAR_STAT)\n+       IF (ENVVAR_STAT==0) CUDA_AWARE_SUPPORT =(ENVVAR_VALUE == '1')\n
              "},{"location":"polaris/applications-and-libraries/libraries/cabana-polaris/","title":"Cabana","text":""},{"location":"polaris/applications-and-libraries/libraries/cabana-polaris/#cabana_1","title":"Cabana","text":"

              Cabana is built atop Kokkos. It provides class templates useful for implementing particle codes

              "},{"location":"polaris/applications-and-libraries/libraries/cabana-polaris/#cabana-documentation","title":"Cabana Documentation","text":"
              • Cabana Wiki
              • Cabana github
              "},{"location":"polaris/applications-and-libraries/libraries/cabana-polaris/#cabana-on-polaris","title":"Cabana on Polaris","text":"

              Built against the prebuilt Kokkos on Polaris, the prebuilt Cabana includes 3 backends: Serial and OpenMP for CPU execution and CUDA for GPU execution. To use it, run

              module use /soft/modulefiles\nmodule swap PrgEnv-nvhpc PrgEnv-gnu\nmodule swap gcc/12.2.0 gcc/11.2.0\nmodule load cudatoolkit-standalone/11.8.0\nmodule load kokkos\nmodule load cabana\n

              (Since the Slingshot 11 upgrade, you must use PrgEnv-gnu and the gcc and cudatoolkit version changes indicated, at least until some subsequent Polaris system updates have been completed.)

              Currently, Cabana is a header-only installation; there are no libraries per se.

              "},{"location":"polaris/applications-and-libraries/libraries/math-libraries/","title":"Math Libraries","text":""},{"location":"polaris/applications-and-libraries/libraries/math-libraries/#blas-lapack-and-scalapack-for-cpus","title":"BLAS, LAPACK, and ScaLAPACK for CPUs","text":"

              Some math libraries targeting CPUs are made available as part of the nvhpc modules and are based on the OpenBLAS project. Additional documentation is available from NVIDIA.

              • BLAS & LAPACK can be found in the $NVIDIA_PATH/compilers/lib directory (see the linking sketch after this list).
              • ScaLAPACK can be found in the $NVIDIA_PATH/comm_libs directory.
              • The GNU Scientific Library (GSL) 2.7 is available; see module help gsl.
              • The AMD Optimizing CPU Libraries (AOCL) v4.0 are available; see module help aocl.
              • Other Cray math libraries, such as LibSci and FFTW, are made available via module load cray-libsci and module load cray-fftw.
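              As a minimal linking sketch for the BLAS and LAPACK libraries noted above (the Fortran source name is a placeholder; verify library names against your loaded nvhpc module):

              ftn my_solver.f90 -L$NVIDIA_PATH/compilers/lib -llapack -lblas -o my_solver\n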
              "},{"location":"polaris/applications-and-libraries/libraries/math-libraries/#nvidia-math-libraries-for-gpus","title":"NVIDIA Math Libraries for GPUs","text":"

              Math libraries from NVIDIA are made available via the nvhpc modules. Many of the libraries users typically use can be found in the $NVIDIA_PATH/math_libs directory. Some examples follow and additional documentation is available from NVIDIA.

              • libcublas
              • libcufft
              • libcurand
              • libcusolver
              • libcusparse
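              As a hedged example (the source file name is a placeholder), these libraries can be linked through the Cray wrappers using the NVHPC -cudalib convenience flag, the same flag used in the VASP makefile.include shown elsewhere in this guide:

              CC -cuda my_app.cpp -cudalib=cublas,cufft -o my_app\n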
              "},{"location":"polaris/build-tools/cmake-polaris/","title":"CMake","text":""},{"location":"polaris/build-tools/cmake-polaris/#cmake_1","title":"CMake","text":"

              CMake is a build configuration system that uses higher-level description files to automatically generate Makefiles.

              "},{"location":"polaris/build-tools/cmake-polaris/#cmake-documentation","title":"CMake Documentation","text":"
              • CMake website
              "},{"location":"polaris/build-tools/cmake-polaris/#cmake-on-polaris","title":"CMake on Polaris","text":"

              To use CMake on Polaris, run

              module use /soft/modulefiles\nmodule load cmake\n
              "},{"location":"polaris/compiling-and-linking/cce-compilers-polaris/","title":"CCE Compilers on Polaris","text":"

              The Cray Compiling Environment (CCE) compilers are available on Polaris via the PrgEnv-cray module.

              The CCE compilers currently on Polaris only support AMD GPU targets for HIP and are thus not usable with the A100 GPUs.

              The nvhpc and llvm compilers can be used for compiling GPU-enabled applications.

              "},{"location":"polaris/compiling-and-linking/compiling-and-linking-overview/","title":"Compiling and Linking Overview on Polaris","text":""},{"location":"polaris/compiling-and-linking/compiling-and-linking-overview/#compiling-on-polaris-login-and-compute-nodes","title":"Compiling on Polaris Login and Compute Nodes","text":"

              If your build system does not require GPUs for the build process, as is usually the case, compilation of GPU-accelerated codes is generally expected to work well on the Polaris login nodes. If your build system does require GPUs, you cannot yet compile on the Polaris login nodes, as they do not currently have GPUs installed. You may in this case compile your applications on the Polaris compute nodes. Do this by submitting an interactive single-node job, or running your build system in a batch job.

              "},{"location":"polaris/compiling-and-linking/compiling-and-linking-overview/#home-file-system","title":"Home File System","text":"

              It is helpful to realize that there is a single HOME filesystem for users that can be accessed from the login and compute nodes of each production resource at ALCF. Thus, users should be mindful of modifications to their environments (e.g. .bashrc) that may cause issues due to differences between the systems.

              An example is creating an alias for the qstat command to, for example, change the order of columns printed to screen. Users with such an alias that works well on Theta may run into issues using qstat on Polaris, as the two systems use different schedulers: Cobalt (Theta) and PBS (Polaris). Users with such modifications to their environments are encouraged to modify their scripts appropriately depending on $hostname, as sketched below.
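              A minimal sketch of such a guard in ~/.bashrc (the hostname patterns and the alias itself are illustrative only):

              case \"$(hostname)\" in\n  theta*)   alias qstat='qstat -u $USER' ;;\n  polaris*) ;; # keep the default qstat on Polaris\nesac\n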

              "},{"location":"polaris/compiling-and-linking/compiling-and-linking-overview/#cray-programming-environment","title":"Cray Programming Environment","text":"

              The Cray Programming Environment (PE) uses three compiler wrappers for building software. These compiler wrappers should be used when building MPI-enabled applications.

              • cc - C compiler
              • CC - C++ compiler
              • ftn - Fortran compiler

              Each of these wrappers can select a specific vendor compiler based on the PrgEnv module loaded in the environment. The following are some helpful options to understand what the compiler wrapper is invoking.

              • --craype-verbose : Print the command which is forwarded to the compiler invocation
              • --cray-print-opts=libs : Print library information
              • --cray-print-opts=cflags : Print include information

              The output from these commands may be useful in build scripts where a compiler other than that invoked by a compiler wrapper is desired. Defining some variables as such may prove useful in those situations.

              CRAY_CFLAGS=$(cc --cray-print-opts=cflags)\nCRAY_LIB=$(cc --cray-print-opts=libs)\n
              Further documentation and options are available via man cc and similar.

              "},{"location":"polaris/compiling-and-linking/compiling-and-linking-overview/#compilers-provided-by-cray-programming-environments","title":"Compilers provided by Cray Programming Environments","text":"

              The default programming environment on Polaris is currently NVHPC. The GNU compilers are available via another programming environment. The following sequence of module commands can be used to switch to the GNU programming environment (gcc, g++, gfortran) and also have NVIDIA compilers available in your path.

              module swap PrgEnv-nvhpc PrgEnv-gnu\nmodule load nvhpc-mixed\n

              The compilers invoked by the Cray MPI wrappers are listed for each programming environment in the following table.

              module C C++ Fortran MPI Compiler Wrapper cc CC ftn PrgEnv-nvhpc nvc nvc++ nvfortran PrgEnv-gnu gcc g++ gfortran

              Note, while gcc and g++ may be available in the default environment, the PrgEnv-gnu module is needed to provide gfortran.

              "},{"location":"polaris/compiling-and-linking/compiling-and-linking-overview/#additional-compilers-provided-by-alcf","title":"Additional Compilers Provided by ALCF","text":"

              The ALCF additionally provides compilers to enable the OpenMP and SYCL programming models for GPUs via LLVM, as documented here.

              Additional documentation for using compilers is available on the respective programming model pages: OpenMP and SYCL.

              "},{"location":"polaris/compiling-and-linking/compiling-and-linking-overview/#linking","title":"Linking","text":"

              Dynamic linking of libraries is currently the default on Polaris. The Cray MPI wrappers will handle this automatically.

              "},{"location":"polaris/compiling-and-linking/compiling-and-linking-overview/#notes-on-default-modules","title":"Notes on Default Modules","text":"
              • craype-x86-rome: While the Polaris compute nodes currently have Milan CPUs, this module is loaded by default to prevent the craype-x86-milan module from adding a zen3 target not supported by the default nvhpc/21.9 compilers. The craype-x86-milan module is expected to be made the default once a newer nvhpc version (e.g. 22.5) becomes the default.

              • craype-accel-nvidia80: This module adds compiler flags to enable GPU acceleration for the NVHPC compilers along with GPU-enabled MPI libraries, as it is assumed that the majority of applications compiled on Polaris will target the GPUs for acceleration. Users building CPU-only applications may find it useful to unload this module to silence \"gpu code generation\" warnings.

              "},{"location":"polaris/compiling-and-linking/compiling-and-linking-overview/#mixed-cc-fortran-applications","title":"Mixed C/C++ & Fortran Applications","text":"

              For applications consisting of a mix of C/C++ and Fortran that also uses MPI, it is suggested that the programming environment chosen for Fortran be used to build the full application because of mpi.mod (and similar) incompatibilities.
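              For example, under PrgEnv-gnu one might compile the C objects with cc and then compile and link the full application with ftn (a sketch; the source file names are placeholders):

              cc -c utils.c\nftn -c main.f90\nftn utils.o main.o -o mixed_app\n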

              "},{"location":"polaris/compiling-and-linking/compiling-and-linking-overview/#compiling-for-gpus","title":"Compiling for GPUs","text":"

              It is assumed the majority of applications to be built on Polaris will make use of the GPUs. As such, the craype-accel-nvidia80 module is in the default environment. This has the effect of the Cray compiler wrappers adding -gpu to the compiler invocation along with additional include paths and libraries. Additional compilers flags may be needed depending on the compiler and GPU programming model used (e.g. -cuda, -acc, or -mp=gpu).
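              As an illustration only (the source file names are placeholders), an OpenACC code and an OpenMP offload code might be compiled in the default NVHPC environment as follows:

              ftn -acc -gpu=cc80 jacobi_acc.f90 -o jacobi_acc\nCC -mp=gpu -gpu=cc80 stream_omp.cpp -o stream_omp\n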

              This module also adds GPU Transport Layer (GTL) libraries to the link-line to support GPU-aware MPI applications.

              "},{"location":"polaris/compiling-and-linking/compiling-and-linking-overview/#man-pages","title":"Man Pages","text":"

              For additional information on the Cray wrappers, please refer to the man pages.

              man cc\nman CC\nman ftn\n

              "},{"location":"polaris/compiling-and-linking/gnu-compilers-polaris/","title":"GNU Compilers on Polaris","text":"

              The GNU compilers are available on Polaris via the PrgEnv-gnu and gcc-mixed modules. The gcc-mixed module can be useful when, for example, the PrgEnv-nvhpc compilers are used to compile C/C++ MPI-enabled code and gfortran is needed.

              The GNU compilers currently on Polaris do not support GPU code generation and thus can only be used for compiling CPU codes.

              The nvhpc and llvm compilers can be used for compiling GPU-enabled applications.

              "},{"location":"polaris/compiling-and-linking/llvm-compilers-polaris/","title":"LLVM Compilers on Polaris","text":"

              This page is not about LLVM-based Cray Compiling Environment (CCE) compilers from PrgEnv-cray but about open source LLVM compilers. If LLVM compilers are needed without MPI support, simply load the llvm module.

              The Cray Programming Environment does not offer LLVM compiler support; thus, cc/CC/ftn compiler wrappers using LLVM compilers are currently not available. To use Clang with MPI, one can load the mpiwrappers/cray-mpich-llvm module, which loads the following modules.

              • llvm, upstream llvm compilers
              • cray-mpich, MPI compiler wrappers mpicc/mpicxx/mpif90. mpif90 uses gfortran because flang is not ready for production use.
              • cray-pals, MPI launchers mpiexec/aprun/mpirun

              Limitation There is no GPU-aware MPI library linking support by default. If needed, users should manually add the GTL (GPU Transport Layer) library to the application link line.

              "},{"location":"polaris/compiling-and-linking/llvm-compilers-polaris/#openmp-offload","title":"OpenMP offload","text":"

              When targeting the OpenMP or CUDA programming models for GPUs, the cudatoolkit-standalone module should also be loaded.
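              A hedged example of compiling an OpenMP offload code with the open-source LLVM compilers (the source file name is a placeholder, and module versions may differ):

              module load llvm cudatoolkit-standalone\nclang++ -fopenmp -fopenmp-targets=nvptx64-nvidia-cuda saxpy_omp.cpp -o saxpy_omp\n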

              "},{"location":"polaris/compiling-and-linking/nvidia-compiler-polaris/","title":"NVIDIA Compilers on Polaris","text":"

              The NVIDIA compilers (nvc, nvc++, nvcc, and nvfortran) are available on Polaris via the PrgEnv-nvhpc and nvhpc modules. There is currently a PrgEnv-nvidia module available, but it will soon be deprecated in Cray's PE, and thus it is not recommended for use.

              The Cray compiler wrappers map to NVIDIA compilers as follows.

              cc -> nvc\nCC -> nvc++\nftn -> nvfortran\n

              Users are encouraged to look through NVIDIA's documentation for the NVHPC SDK and specific information on the compilers, tools, and libraries.

              "},{"location":"polaris/compiling-and-linking/nvidia-compiler-polaris/#notes-on-nvidia-compilers","title":"Notes on NVIDIA Compilers","text":""},{"location":"polaris/compiling-and-linking/nvidia-compiler-polaris/#pgi-compilers","title":"PGI compilers","text":"

              The NVIDIA programming environment makes available compilers from the NVIDIA HPC SDK. While the PGI compilers are available in this programming environment, it should be noted that they are actually symlinks to the corresponding NVIDIA compilers.

              pgcc -> nvc\npgc++ -> nvc++\npgf90 -> nvfortran\npgfortran -> nvfortran\n
              While nvcc is the traditional CUDA C and CUDA C++ compiler for NVIDIA GPUs, the nvc, nvc++, and nvfortran compilers additionally target CPUs.

              "},{"location":"polaris/compiling-and-linking/nvidia-compiler-polaris/#nvhpc-sdk-directory-structure","title":"NVHPC SDK Directory Structure","text":"

              Users migrating from CUDA toolkits to the NVHPC SDK may find it beneficial to review the directory structure of the hpc-sdk directory to find the location of commonly used libraries (including math libraries for the CPU). With the PrgEnv-nvhpc module loaded, the NVIDIA_PATH environment variable can be used to locate the path to various NVIDIA tools, libraries, and examples.

              • compiler/bin - cuda-gdb, ncu, nsys, ...
              • examples - CUDA-Fortran, OpenMP, ...
              • comm_libs - nccl, nvshmem, ...
              • compiler/libs - blas, lapack, ...
              • cuda/lib64 - cudart, OpenCL, ...
              • math_libs/lib64 - cublas, cufft, ...
              "},{"location":"polaris/compiling-and-linking/nvidia-compiler-polaris/#differences-between-nvcc-and-nvcnvc","title":"Differences between nvcc and nvc/nvc++","text":"

              For users that want to continue using nvcc, it is important to be mindful of differences with the newer nvc and nvc++ compilers. For example, the -cuda flag instructs nvcc to compile .cu input files to .cu.cpp.ii output files, which are to be separately compiled, whereas the same -cuda flag instructs nvc, nvc++, and nvfortran to enable CUDA C/C++ or CUDA Fortran code generation. The resulting output file in each case is different (text vs. object), and one may see an unrecognized format error when -cuda is incorrectly passed to nvcc.
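              To make the distinction concrete, the two invocations below (with placeholder source files) behave differently:

              nvcc -cuda kernel.cu                 # emits kernel.cu.cpp.ii for separate host compilation\nnvc++ -cuda kernel.cpp -o kernel.exe # enables CUDA C++ code generation and links an executable\n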

              "},{"location":"polaris/compiling-and-linking/nvidia-compiler-polaris/#known-issues-and-workarounds","title":"Known Issues and Workarounds","text":"

              If you are using nvcc to invoke nvc++ while compiling C++17 code, and you see the following warning and are unable to compile C++17 constructs:

              polaris-login-01(~)> nvcc --std=c++17 -ccbin nvc++ ~/smalltests/bool_constant.cpp\nnvcc warning : The -std=c++17 flag is not supported with the configured host compiler. Flag will be ignored.\n\"/home/zippy/smalltests/bool_constant.cpp\", line 10: error: namespace \"std\" has no member class \"bool_constant\"\n      : std::bool_constant<(UnaryPred<Ts>::value || ...)> {};\n             ^\n\n\"/home/zippy/smalltests/bool_constant.cpp\", line 10: error: class or struct definition is missing\n      : std::bool_constant<(UnaryPred<Ts>::value || ...)> {};\n                          ^\n\n2 errors detected in the compilation of \"/home/zippy/smalltests/bool_constant.cpp\".\npolaris-login-01(~)>\n

              you will need to work around it by loading the latest cudatoolkit module atop PrgEnv-nvhpc:

              module load cudatoolkit-standalone/11.6.2\n
              "},{"location":"polaris/compiling-and-linking/oneapi-compiler/","title":"oneAPI Compilers and Support","text":"

              The Intel oneAPI compiler and Codeplay plugins for Nvidia GPUs are available on Polaris. The oneAPI compilers are not enabled under the Cray Programming Environment system but can be used separately. Two oneAPI variants are provided, the first being a \"release\" version based on Intel's officially released oneAPI toolkit. Intel Release Notes

              Note

              The 2023.2.1 release of the oneAPI Toolkit does not yet support oneDPL on Nvidia devices, though oneMKL is included from the 2023.2.1 release onwards.

              "},{"location":"polaris/compiling-and-linking/oneapi-compiler/#components","title":"Components","text":"
              • The following components are associated with this module
              User Application Component Compilers DPC++ oneMKL Interfaces oneMKL

              The other variant is built from the open-source repository. This variant will be more up to date, at the risk of bugs and breakages from code that has not undergone a full release cycle. The documentation is located on the SYCL page. The most notable difference is that icx/icpx are the names of the C/C++ compilers when using the release version of the module, whereas clang/clang++ are used for the open-source variant.

              "},{"location":"polaris/compiling-and-linking/oneapi-compiler/#compile-and-link","title":"Compile and Link","text":"

              oneAPI uses clang (or the icx/icpx wrappers) for compiling and linking for the NVIDIA A100 (SM80) architecture.

              module load oneapi/release\nicpx -std=c++17 -fsycl -fsycl-targets=nvptx64-nvidia-cuda -Xsycl-target-backend --cuda-gpu-arch=sm_80 test.cpp\n
              harms@polaris-login-04:~/working/polaris/oneapi> icpx -v\nIntel(R) oneAPI DPC++/C++ Compiler 2023.2.0 (2023.2.0.20230721)\nTarget: x86_64-unknown-linux-gnu\nThread model: posix\nInstalledDir: /soft/compilers/oneapi/release/2023.2/compiler/2023.2.1/linux/bin-llvm\nConfiguration file: /soft/compilers/oneapi/release/2023.2/compiler/2023.2.1/linux/bin-llvm/../bin/icpx.cfg\nFound candidate GCC installation: /usr/lib64/gcc/x86_64-suse-linux/7\nSelected GCC installation: /usr/lib64/gcc/x86_64-suse-linux/7\nCandidate multilib: .;@m64\nSelected multilib: .;@m64\nFound CUDA installation: /opt/nvidia/hpc_sdk/Linux_x86_64/21.9/cuda/11.4, version 11.4\n
              "},{"location":"polaris/compiling-and-linking/oneapi-compiler/#running","title":"Running","text":"

              The library should select the GPU by default, but selection of the GPUs can be forced via the ONEAPI_DEVICE_SELECTOR environment variable:

              $ ONEAPI_DEVICE_SELECTOR=ext_oneapi_cuda:gpu ./a.out\n
              or a specific GPU.
              $ ONEAPI_DEVICE_SELECTOR=ext_oneapi_cuda:gpu:3 ./a.out\n

              "},{"location":"polaris/compiling-and-linking/oneapi-compiler/#sycl-ls","title":"sycl-ls","text":"

              Expected output of sycl-ls, showing which platforms are available:

              harms@x3004c0s7b0n0:~> which sycl-ls\n/soft/compilers/oneapi/release/2023.2/compiler/2023.2.1/linux/bin/sycl-ls\n\nharms@x3004c0s7b0n0:~> sycl-ls\n[opencl:acc:0] Intel(R) FPGA Emulation Platform for OpenCL(TM), Intel(R) FPGA Emulation Device 1.2 [2023.16.7.0.21_160000]\n[opencl:cpu:1] Intel(R) OpenCL, AMD EPYC 7543P 32-Core Processor                3.0 [2023.16.7.0.21_160000]\n[ext_oneapi_cuda:gpu:0] NVIDIA CUDA BACKEND, NVIDIA A100-SXM4-40GB 8.8 [CUDA 11.4]\n[ext_oneapi_cuda:gpu:1] NVIDIA CUDA BACKEND, NVIDIA A100-SXM4-40GB 8.8 [CUDA 11.4]\n[ext_oneapi_cuda:gpu:2] NVIDIA CUDA BACKEND, NVIDIA A100-SXM4-40GB 8.8 [CUDA 11.4]\n[ext_oneapi_cuda:gpu:3] NVIDIA CUDA BACKEND, NVIDIA A100-SXM4-40GB 8.8 [CUDA 11.4]\n
              "},{"location":"polaris/compiling-and-linking/polaris-example-program-makefile/","title":"Example Programs and Makefiles for Polaris","text":"

              Several simple examples of building CPU and GPU-enabled codes on Polaris are available in the ALCF GettingStarted repo for several programming models. If building your application is problematic for some reason (e.g. absence of a GPU), then users are encouraged to build and test applications directly on one of the Polaris compute nodes via an interactive job. The discussion below makes use of the NVHPC compilers in the default environment as illustrative examples. Similar examples for other compilers on Polaris are available in the ALCF GettingStarted repo.

              "},{"location":"polaris/compiling-and-linking/polaris-example-program-makefile/#cpu-mpiopenmp-example","title":"CPU MPI+OpenMP Example","text":"

              One of the first useful tasks with any new machine, scheduler, and job launcher is to ensure one is binding MPI ranks and OpenMP threads to the host cpu as intended. A simple HelloWorld MPI+OpenMP example is available here to get started with.

              The application can be straightforwardly compiled using the Cray compiler wrappers.

              CC -fopenmp main.cpp -o hello_affinity\n

              The executable hello_affinity can then be launched in a job script (or directly in the shell of an interactive job) using mpiexec as discussed here.

              #!/bin/sh\n#PBS -l select=1:system=polaris\n#PBS -l place=scatter\n#PBS -l walltime=0:30:00\n#PBS -l filesystems=home\n\n# MPI example w/ 16 MPI ranks per node spread evenly across cores\nNNODES=`wc -l < $PBS_NODEFILE`\nNRANKS_PER_NODE=16\nNDEPTH=4\nNTHREADS=1\n\nNTOTRANKS=$(( NNODES * NRANKS_PER_NODE ))\necho \"NUM_OF_NODES= ${NNODES} TOTAL_NUM_RANKS= ${NTOTRANKS} RANKS_PER_NODE= ${NRANKS_PER_NODE} THREADS_PER_RANK= ${NTHREADS}\"\n\nmpiexec -n ${NTOTRANKS} --ppn ${NRANKS_PER_NODE} --depth=${NDEPTH} --cpu-bind depth ./hello_affinity\n
              "},{"location":"polaris/compiling-and-linking/polaris-example-program-makefile/#cuda","title":"CUDA","text":"

              Several variants of C/C++ and Fortran CUDA examples are available here that include MPI and multi-gpu examples.

              One can use the Cray compiler wrappers to compile GPU-enabled applications as well. This example of simple vector addition uses the NVIDIA compilers.

              CC -g -O3 -std=c++0x -cuda main.cpp -o vecadd\n

              The craype-accel-nvidia80 module in the default environment will add the -gpu compiler flag for nvhpc compilers along with appropriate include directories and libraries. It is left to the user to provide an additional flag to the nvhpc compilers to select the target GPU programming model. In this case, -cuda is used to indicate compilation of CUDA code. The application can then be launched within a batch job submission script or as follows on one of the compute nodes.

              $ ./vecadd \n# of devices= 4\n  [0] Platform[ Nvidia ] Type[ GPU ] Device[ NVIDIA A100-SXM4-40GB ]\n  [1] Platform[ Nvidia ] Type[ GPU ] Device[ NVIDIA A100-SXM4-40GB ]\n  [2] Platform[ Nvidia ] Type[ GPU ] Device[ NVIDIA A100-SXM4-40GB ]\n  [3] Platform[ Nvidia ] Type[ GPU ] Device[ NVIDIA A100-SXM4-40GB ]\nRunning on GPU 0!\nUsing single-precision\n\n  Name= NVIDIA A100-SXM4-40GB\n  Locally unique identifier= \n  Clock Frequency(KHz)= 1410000\n  Compute Mode= 0\n  Major compute capability= 8\n  Minor compute capability= 0\n  Number of multiprocessors on device= 108\n  Warp size in threads= 32\n  Single precision performance ratio= 2\n\nResult is CORRECT!! :)\n
              "},{"location":"polaris/compiling-and-linking/polaris-example-program-makefile/#gpu-openacc","title":"GPU OpenACC","text":"

              A simple MPI-parallel OpenACC example is available here. Compilation proceeds similarly to the above CUDA example except for the use of the -acc=gpu compiler flag to indicate compilation of OpenACC code for GPUs.

              CC -g -O3 -std=c++0x -acc=gpu -gpu=cc80,cuda11.0 main.cpp -o vecadd\n
              In this example, each MPI rank sees all four GPUs on a Polaris node and GPUs are bound to MPI ranks round-robin within the application.

              $ mpiexec -n 4 ./vecadd\n# of devices= 4\nUsing single-precision\n\nRank 0 running on GPU 0!\nRank 1 running on GPU 1!\nRank 2 running on GPU 2!\nRank 3 running on GPU 3!\n\nResult is CORRECT!! :)\n
              If the application instead relies on the job launcher to bind MPI ranks to available GPUs, then a small helper script can be used to explicitly set CUDA_VISIBLE_DEVICES appropriately for each MPI rank. One example is available here where each MPI rank is similarly bound to a single GPU with round-robin assignment. The binding of MPI ranks to GPUs is discussed in more detail here.

              "},{"location":"polaris/compiling-and-linking/polaris-example-program-makefile/#gpu-opencl","title":"GPU OpenCL","text":"

              A simple OpenCL example is available here. The OpenCL headers and library are available in the NVHPC SDK and cuda toolkits. The environment variable NVIDIA_PATH is defined for the PrgEnv-nvhpc programming environment.

              CC -o vecadd -g -O3 -std=c++0x  -I${NVIDIA_PATH}/cuda/include main.o -L${NVIDIA_PATH}/cuda/lib64 -lOpenCL\n

              This simple example can be run on a Polaris compute node as follows.

              $ ./vecadd\nRunning on GPU!\nUsing single-precision\n\n    CL_DEVICE_NAME: NVIDIA A100-SXM4-40GB\n    CL_DEVICE_VERSION: OpenCL 3.0 CUDA\n    CL_DEVICE_OPENCL_C_VERSION: OpenCL C 1.2 \n    CL_DEVICE_MAX_COMPUTE_UNITS: 108\n    CL_DEVICE_MAX_CLOCK_FREQUENCY: 1410\n    CL_DEVICE_MAX_WORK_GROUP_SIZE: 1024\n\nResult is CORRECT!! :)\n

              "},{"location":"polaris/compiling-and-linking/polaris-example-program-makefile/#gpu-openmp","title":"GPU OpenMP","text":"

              A simple MPI-parallel OpenMP example is available here. Compilation proceeds similarly to the above examples except for the use of the -mp=gpu compiler flag to indicate compilation of OpenMP code for GPUs.

              CC -g -O3 -std=c++0x -mp=gpu -gpu=cc80,cuda11.0 main.cpp -o vecadd\n

              Similar to the OpenACC example above, this code binds MPI ranks to GPUs in a round-robin fashion.

              $ mpiexec -n 4 ./vecadd\n# of devices= 4\nRank 0 running on GPU 0!\nRank 1 running on GPU 1!\nRank 2 running on GPU 2!\nRank 3 running on GPU 3!\n\nResult is CORRECT!! :)\n

              "},{"location":"polaris/compiling-and-linking/polaris-programming-models/","title":"Programming Models on Polaris","text":"

              The software environment on Polaris supports several parallel programming models targeting the CPUs and GPUs.

              "},{"location":"polaris/compiling-and-linking/polaris-programming-models/#cpu-parallel-programming-models","title":"CPU Parallel Programming Models","text":"

              The Cray compiler wrappers cc, CC, and ftn are recommended for MPI applications as they provide the needed include paths and libraries for each programming environment. A summary of available CPU parallel programming models and relevant compiler flags is shown below. Users are encouraged to review the corresponding man pages and documentation.

              | Programming Model | GNU | NVHPC | LLVM |
              | OpenMP | -fopenmp | -mp | -fopenmp |
              | OpenACC | -- | -acc=multicore | -- |

              Higher-level programming models such as Kokkos and Raja may also be used for CPU programming.

              "},{"location":"polaris/compiling-and-linking/polaris-programming-models/#gpu-programming-models","title":"GPU Programming Models","text":"

              A summary of available GPU programming models and relevant compiler flags is shown below for compilers that generate offloadable code. Users are encouraged to review the corresponding man pages and documentation.

              | Programming Model | GNU | NVHPC | LLVM | ONEAPI |
              | CUDA | -- | -cuda [-gpu=cc80,cuda11.0] | -- | -- |
              | HIP* | -- | -- | -- | -- |
              | OpenACC | -- | -acc | -- | -- |
              | OpenCL* | -- | -- | -- | -- |
              | OpenMP | -- | -mp=gpu | -fopenmp-targets=nvptx64 | -- |
              | SYCL | -- | -- | -- | -fsycl -fsycl-targets=nvptx64-nvidia-cuda -Xsycl-target-backend --cuda-gpu-arch=sm_80 |

              Note, the llvm and oneapi modules are provided by ALCF to complement the compilers provided by the Cray PE on Polaris.

              Higher-level programming models such as Kokkos and Raja may also be used for GPU programming.

              OpenCL is supported, but does not require specific compiler flags per-se as the offloaded kernels are just-in-time compiled. Abstraction programming models, such as Kokkos, can be built on top of some of these programming models (see below).

              A HIP compiler supporting the A100 GPUs is still to be installed on Polaris.

              "},{"location":"polaris/compiling-and-linking/polaris-programming-models/#mapping-programming-models-to-polaris-modules","title":"Mapping Programming Models to Polaris Modules","text":"

              The table below offers some suggestions for how to get started setting up your environment on Polaris depending on the programming language and model. Note, mixed C/C++ and Fortran applications should choose the programming environment for the Fortran compiler because of mpi.mod and similar incompatibilities between Fortran-generated files from different compilers. Several simple examples for testing the software environment on Polaris for different programming models are available in the ALCF GettingStarted repo.

              Note, users are encouraged to use PrgEnv-nvhpc instead of PrgEnv-nvidia as the latter will soon be deprecated in Cray's PE. They are otherwise identical, pointing to compilers from the same NVIDIA SDK version.

              | Programming Language | GPU Programming Model | Likely used Modules/Compilers | Notes |
              | C/C++ | CUDA | PrgEnv-nvhpc, PrgEnv-gnu, llvm | NVIDIA (nvcc, nvc, nvc++) and clang compilers do GPU code generation |
              | C/C++ | HIP | N/A | need to install with support for A100 |
              | C/C++ | Kokkos | See CUDA | HIP, OpenMP, and SYCL/DPC++ also candidates |
              | C/C++ | OpenACC | PrgEnv-nvhpc | |
              | C/C++ | OpenCL | PrgEnv-nvhpc, PrgEnv-gnu, llvm | JIT GPU code generation |
              | C/C++ | OpenMP | PrgEnv-nvhpc, llvm | |
              | C/C++ | RAJA | See CUDA | HIP, OpenMP, and SYCL/DPC++ also candidates |
              | C/C++ | SYCL/DPC++ | llvm-sycl | |
              | Fortran | CUDA | PrgEnv-nvhpc | NVIDIA compiler (nvfortran) does GPU code generation; gfortran can be loaded via gcc-mixed |
              | Fortran | HIP | N/A | need to install with support for A100 |
              | Fortran | OpenACC | PrgEnv-nvhpc | |
              | Fortran | OpenCL | PrgEnv-nvhpc, PrgEnv-gnu | JIT GPU code generation |
              | Fortran | OpenMP | PrgEnv-nvhpc | |
              "},{"location":"polaris/data-science-workflows/julia/","title":"Julia","text":"

              Julia is a high-level, high-performance dynamic programming language for technical computing. It has a syntax familiar to users of many other technical computing environments. Designed at MIT to tackle large-scale partial-differential equation simulation and distributed linear algebra, Julia features a robust ecosystem of tools for optimization, statistics, parallel programming, and data visualization. Julia is actively developed by the Julia Labs team at MIT and in industry, along with hundreds of domain-expert scientists and programmers worldwide.

              "},{"location":"polaris/data-science-workflows/julia/#contributing","title":"Contributing","text":"

              This guide is a first draft of the Julia documentation for Polaris. If you have any suggestions or contributions, please open a pull request or contact us by opening a ticket at the ALCF Helpdesk.

              "},{"location":"polaris/data-science-workflows/julia/#julia-installation","title":"Julia Installation","text":"

              Using the official Julia 1.9 binaries from the Julia webpage is recommended. Juliaup provides a convenient way to install Julia and manage the various Julia versions.

              curl -fsSL https://install.julialang.org | sh\n

              You may then list the available Julia versions with juliaup list and install a specific version with juliaup install <version>. You can then activate a specific version with juliaup use <version> and set the default version with juliaup default <version>. juliaup update will update the installed Julia versions. In general, the latest stable release of Julia should be used.

              juliaup add release\n
              "},{"location":"polaris/data-science-workflows/julia/#julia-project-environment","title":"Julia Project Environment","text":"

              The Julia built-in package manager allows you to create a project and enable project-specific dependencies. Julia manages packages in the Julia depot located by default in ~/.julia. However, that NFS filesystem is not meant for high-speed access. Therefore, this Julia depot folder should be located on a fast filesystem of your choice (grand, eagle). The Julia depot directory is set via the environment variable JULIA_DEPOT_PATH. For example, you can set the Julia depot to a directory on Polaris grand filesystem by adding the following line to your ~/.bashrc file:

              export JULIA_DEPOT_PATH=/lus/grand/projects/$PROJECT/$USER/julia_depot\n
              "},{"location":"polaris/data-science-workflows/julia/#programming-julia-on-polaris","title":"Programming Julia on Polaris","text":"

              There are three key components to using Julia for large-scale computations:

              1. MPI support through MPI.jl
              2. GPU support through CUDA.jl
              3. HDF5 support through HDF5.jl

              In addition, we recommend VSCode with the Julia extension for a modern IDE experience, together with the ssh-remote extension for remote interactive development.

              "},{"location":"polaris/data-science-workflows/julia/#mpi-support","title":"MPI Support","text":"

              MPI support is provided through the MPI.jl package.

              julia> ] add MPI\n
              This will install the MPI.jl package and default MPI prebuilt binaries provided by an artifact. For on-node debugging purposes the default artifact is sufficient. However, for large-scale computations, it is recommended to use the system MPI library that is loaded via module. As of MPI.jl v0.20 this is handled through MPIPreferences.jl.
              julia --project -e 'using MPIPreferences; MPIPreferences.use_system_binary()'\n

              Check that the correct MPI library is targeted with Julia.

              julia --project -e 'using MPI; MPI.versioninfo()'\nMPIPreferences:\n  binary:  system\n  abi:     MPICH\n  libmpi:  libmpi_cray\n  mpiexec: mpiexec\n\nPackage versions\n  MPI.jl:             0.20.11\n  MPIPreferences.jl:  0.1.8\n\nLibrary information:\n  libmpi:  libmpi_cray\n  MPI version:  3.1.0\n  Library version:\n    MPI VERSION    : CRAY MPICH version 8.1.16.5 (ANL base 3.4a2)\n    MPI BUILD INFO : Mon Apr 18 12:05 2022 (git hash 4f56723)\n
              When running on the login node, switch back to the default provided MPI binaries in MPI_jll.jl by removing the LocalPreferences.toml file.

              "},{"location":"polaris/data-science-workflows/julia/#gpu-support","title":"GPU Support","text":"

              NVIDIA GPU support is provided through the CUDA.jl package.

              julia> ] add CUDA\n
              In case you want to write portable GPU kernels, we highly recommend the KernelAbstractions.jl package. It provides a high-level abstraction for writing GPU kernels that can be compiled for different GPU backends.
              julia> ] add KernelAbstractions\n
              The GPU backend is selected by loading either oneAPI.jl, AMDGPU.jl, or CUDA.jl (see the quickstart guide below).

              "},{"location":"polaris/data-science-workflows/julia/#hdf5-support","title":"HDF5 Support","text":"

              Parallel HDF5 support is provided by

              module load cray-hdf5-parallel\n
              After setting export JULIA_HDF5_PATH=$HDF5_DIR we can install the HDF5.jl package.
              julia> ] add HDF5\n

              "},{"location":"polaris/data-science-workflows/julia/#quickstart-guide","title":"Quickstart Guide","text":"

              The following example shows how to use MPI.jl, CUDA.jl, and HDF5.jl to write a parallel program that computes the sum of two vectors on the GPU and writes the result to an HDF5 file. A repository with an example code computing an approximation of pi can be found at Polaris.jl. In this repository, you will also find a setup_polaris.sh script that will build the HDF5.jl and MPI.jl packages against the system libraries. The dependencies are installed with the following commands:

              julia --project\n

              julia> ] up\n
              using CUDA\nusing HDF5\nusing MPI\nusing Printf\nusing Random\n\nfunction pi_kernel(x, y, d, n)\n    idx = (blockIdx().x-1) * blockDim().x + threadIdx().x\n    if idx <= n\n        d[idx] = (x[idx] - 0.5)^2 + (y[idx] - 0.5)^2 <= 0.25 ? 1 : 0\n    end\n    return nothing\nend\n\nfunction approximate_pi_gpu(n::Integer)\n    x = CUDA.rand(Float64, n)\n    y = CUDA.rand(Float64, n)\n    d = CUDA.zeros(Float64, n)\n\n    nblocks = ceil(Int64, n/32)\n\n    @cuda threads=32 blocks=nblocks pi_kernel(x,y,d,n)\n\n    return sum(d)\nend\n\nfunction main()\n    n = 100000  # Number of points to generate per rank\n    Random.seed!(1234)  # Set a fixed random seed for reproducibility\n\n    dsum = MPI.Allreduce(approximate_pi_gpu(n), MPI.SUM, MPI.COMM_WORLD)\n\n    pi_approx = (4 * dsum) / (n * MPI.Comm_size(MPI.COMM_WORLD))\n\n    if MPI.Comm_rank(MPI.COMM_WORLD) == 0\n        @printf \"Approximation of \u03c0 using Monte Carlo method: %.10f\\n\" pi_approx\n        @printf \"Error: %.10f\\n\" abs(pi_approx - \u03c0)\n    end\n    return pi_approx\nend\n\nMPI.Init()\nif !isinteractive()\n    pi_approx = main()\n    h5open(\"pi.h5\", \"w\") do file\n        write(file, \"pi\", pi_approx)\n    end\nend\n
              "},{"location":"polaris/data-science-workflows/julia/#job-submission-script","title":"Job submission script","text":"

              This example can be run on Polaris with the following job submission script:

              #!/bin/bash -l\n#PBS -l select=1:system=polaris\n#PBS -l place=scatter\n#PBS -l walltime=0:30:00\n#PBS -l filesystems=home:grand\n#PBS -q debug\n#PBS -A PROJECT\n\ncd ${PBS_O_WORKDIR}\n\n# MPI example w/ 4 MPI ranks per node spread evenly across cores\nNNODES=`wc -l < $PBS_NODEFILE`\nNRANKS_PER_NODE=4\nNDEPTH=8\nNTHREADS=1\nmodule load cray-hdf5-parallel\n# Put in your Julia depot path\nexport JULIA_DEPOT_PATH=MY_JULIA_DEPOT_PATH\n# Path to Julia executable. When using juliaup, it's in your julia_depot folder\nJULIA_PATH=$JULIA_DEPOT_PATH/juliaup/julia-1.9.1+0.x64.linux.gnu/bin/julia\nNTOTRANKS=$(( NNODES * NRANKS_PER_NODE ))\necho \"NUM_OF_NODES= ${NNODES} TOTAL_NUM_RANKS= ${NTOTRANKS} RANKS_PER_NODE=${NRANKS_PER_NODE} THREADS_PER_RANK= ${NTHREADS}\"\n\nmpiexec -n ${NTOTRANKS} --ppn ${NRANKS_PER_NODE} --depth=${NDEPTH} --cpu-bind depth julia --check-bounds=no --project pi.jl\n
              Verify that JULIA_DEPOT_PATH is set to the correct path and JULIA_PATH points to the Julia executable. When using juliaup, the Julia executable is located in the juliaup folder of your JULIA_DEPOT_PATH.

              "},{"location":"polaris/data-science-workflows/julia/#advanced-features","title":"Advanced features","text":""},{"location":"polaris/data-science-workflows/julia/#cuda-aware-mpi","title":"CUDA-aware MPI","text":"

              MPI.jl supports CUDA-aware MPI. This is enabled by setting the following environment variables

              export JULIA_CUDA_MEMORY_POOL=none\nexport MPICH_GPU_SUPPORT_ENABLED=1\nexport JULIA_MPI_PATH=$PATH_TO_CUDA_MPI # /opt/cray/pe/mpich/8.1.16/ofi/nvidia/20.7\nexport JULIA_MPI_HAS_CUDA=1\n

              Note that MPI.jl needs to be rebuilt for the changes to take effect.

              julia --project -e 'using Pkg; Pkg.build(\"MPI\"; verbose=true)'\n
              "},{"location":"polaris/data-science-workflows/julia/#large-scale-parallelism","title":"Large-scale parallelism","text":"

              CUDA.jl uses the nvcc compiler to compile GPU kernels. This will create object files in the TEMP filesystem. By default, the tempdir is a global directory that can lead to name clashes of the compiled kernel object files. To avoid this, we recommend setting the tempdir to a local directory on the compute node.

              export TMPDIR=/local/scratch\n

              "},{"location":"polaris/data-science-workflows/python/","title":"Python","text":""},{"location":"polaris/data-science-workflows/python/#conda","title":"Conda","text":"

              We provide prebuilt conda environments containing GPU-supported builds of torch, tensorflow (both with horovod support for multi-node calculations), jax, and many other commonly-used Python modules.

              Users can activate this environment by first loading the conda module, and then activating the base environment.

              Explicitly (either from an interactive job, or inside a job script):

              $ module load conda\n$ conda activate base\n(base) $ which python3\n/soft/datascience/conda/2022-09-08/mconda3/bin/python3\n
              In one line, module load conda; conda activate. This can be performed on a compute node, as well as a login node.

              As of writing, the latest conda module on Polaris is built on Miniconda3 version 4.14.0 and contains Python 3.8.13. Future modules may contain entirely different major versions of Python, PyTorch, TensorFlow, etc.; however, the existing modules will be maintained as-is as long as feasible.

              While the shared Anaconda environment encapsulated in the module contains many of the most commonly used Python libraries for our users, you may still encounter a scenario in which you need to extend the functionality of the environment (i.e., install additional packages).

              There are two different approaches that are currently recommended.

              "},{"location":"polaris/data-science-workflows/python/#virtual-environments-via-venv","title":"Virtual environments via venv","text":"

              Creating your own (empty) virtual Python environment in a directory that is writable to you is simple:

              python3 -m venv /path/to/new/virtual/environment\n
              This creates a new, fairly lightweight folder (<20 MB) with its own Python interpreter where you can install whatever packages you'd like. First, you must activate the virtual environment to make this Python interpreter the default interpreter in your shell session.

              You activate the new environment whenever you want to start using it via running the activate script in that folder:

              /path/to/new/virtual/environment/bin/activate\n

              In many cases, you do not want an empty virtual environment, but instead want to start from the conda base environment's installed packages, only adding and/or changing a few modules.

              To extend the base Anaconda environment with venv (e.g. my_env in the current directory) and inherit the base environment packages, one can use the --system-site-packages flag:

              module load conda; conda activate\npython -m venv --system-site-packages my_env\nsource my_env/bin/activate\n# Install additional packages here...\n
              You can always retroactively change the --system-site-packages flag state for this virtual environment by editing my_env/pyvenv.cfg and changing the value of the line include-system-site-packages = false.

              To install a different version of a package that is already installed in the base environment, you can use:

              pip install --ignore-installed  ... # or -I\n
              The shared base environment is not writable, so it is impossible to remove or uninstall packages from it. The packages installed with the above pip command should shadow those installed in the base environment.

              "},{"location":"polaris/data-science-workflows/python/#cloning-the-base-anaconda-environment","title":"Cloning the base Anaconda environment","text":"

              If you need more flexibility, you can clone the conda environment into a custom path, which would then allow for root-like installations via conda install <module> or pip install <module>. Unlike the venv approach, using a cloned Anaconda environment requires you to copy the entirety of the base environment, which can use significant storage space.

              This can be performed by:

              $ module load conda\n$ conda activate base\n(base) $ conda create --clone base --prefix /path/to/envs/base-clone\n(base) $ conda activate /path/to/envs/base-clone\n(base-clone) $ which python3\n/path/to/base-clone/bin/python3\n
              The cloning process can be quite slow.

              Warning

              In the above commands, path/to/envs/base-clone should be replaced by a suitably chosen path.

              "},{"location":"polaris/data-science-workflows/python/#using-pip-install-user-not-recommended","title":"Using pip install --user (not recommended)","text":"

              With the conda environment set up, one can install common Python modules using pip install --user <module-name> which will install packages in $PYTHONUSERBASE/lib/pythonX.Y/site-packages. The $PYTHONUSERBASE environment variable is automatically set when you load the base conda module, and is equal to /home/$USER/.local/polaris/conda/YYYY-MM-DD.

              Note, Python modules installed this way that contain command line binaries will not have those binaries automatically added to the shell's $PATH. To manually add the path:

              export PATH=$PYTHONUSERBASE/bin:$PATH\n
              Be sure to remove this location from $PATH if you deactivate the base Anaconda environment or unload the module.

              Cloning the Anaconda environment, or using venv are both more flexible and transparent when compared to --user installs.

              "},{"location":"polaris/data-science-workflows/applications/gpt-neox/","title":"Instructions for gpt-neox:","text":"

              We include below a set of instructions to get EleutherAI/gpt-neox running on Polaris.

              A batch submission script for the following example is available here.

              Warning

              The instructions below should be run directly from a compute node.

              Explicitly, to request an interactive job (from polaris-login):

              $ qsub -I -A <project> -q debug-scaling -l select=2 -l walltime=01:00:00\n

              Refer to job scheduling and execution for additional information.

              1. Load and activate the base conda environment:

                module load conda\nconda activate base\n

              2. We've installed the requirements for running gpt-neox into a virtual environment. To activate this environment,

                source /soft/datascience/venvs/polaris/2022-09-08/bin/activate\n

              3. Clone the EleutherAI/gpt-neox repository if it doesn't already exist:

                git clone https://github.com/EleutherAI/gpt-neox\n

              4. Navigate into the gpt-neox directory:

                cd gpt-neox\n

                Note

                The remaining instructions assume you're inside the gpt-neox directory

              5. Create a DeepSpeed compliant hostfile (each line is formatted as hostname, slots=N):

                cat $PBS_NODEFILE > hostfile\nsed -e 's/$/ slots=4/' -i hostfile\nexport DLTS_HOSTFILE=hostfile \n

              6. Create a .deepspeed_env file to ensure a consistent environment across all workers

                echo \"PATH=${PATH} > .deepspeed_env\"\necho \"LD_LIBRARY_PATH=${LD_LIBRARY_PATH} >> .deepspeed_env\"\necho \"http_proxy=${http_proxy} >> .deepspeed_env\"\necho \"https_proxy=${https_proxy} >> .deepspeed_env\"\n

              7. Prepare data:

                python3 prepare_data.py -d ./data\n

              8. Train:

                python3 ./deepy.py train.py -d configs small.yml local_setup.yml\n

              Danger

              If your training seems to be getting stuck at

              Using /home/user/.cache/torch_extensions as PyTorch extensions root...\n

              there may be a leftover .lock file from an aborted build. Cleaning either the whole .cache or the extensions' sub-directory should force a clean build on the next attempt.

              "},{"location":"polaris/data-science-workflows/applications/megatron-deepspeed/","title":"Megatron-DeepSpeed","text":"

              We describe below the instructions for launching distributed training with Microsoft's Megatron-DeepSpeed and briefly describe some parallelism strategies and various optimizations that are supported.

              Note

              We maintain a forked version at argonne-lcf/Megatron-DeepSpeed that has some helper scripts for launching and setting various training options.

              "},{"location":"polaris/data-science-workflows/applications/megatron-deepspeed/#setup","title":"Setup","text":"
              1. Load conda and activate base environment:

                # load conda + activate base env\nmodule load conda/2023-10-04 ; conda activate base\n
              2. Clone argonne-lcf/Megatron-DeepSpeed and navigate into it:

                # clone + navigate into Megatron-DeepSpeed repo\ngit clone https://github.com/argonne-lcf/Megatron-DeepSpeed\ncd Megatron-DeepSpeed\n
              3. Make virtual environment (on top of base conda):

                # make virtual environment (on top of base conda)\nmkdir -p venvs/polaris/2023-10-04\npython3 -m venv venvs/polaris/2023-10-04 --system-site-packages\nsource venvs/polaris/2023-10-04/bin/activate\n
              4. Install missing dependency:

                # install *missing dependency\npython3 -m pip install \"git+https://github.com/saforem2/ezpz\"\n
              5. Launch training:

                # ---- launch training -----------------------\n# - MODEL_SIZE_KEY: defined in ALCF/model.sh\n# - other args: defined in ALCF/args.sh\n# ---------------------------------------------\nMODEL_SIZE_KEY=\"GPT25B\" \\\n    SEQ_LEN=4096 \\ \n    USE_FLASH_ATTN_V2=1 \\\n    MICRO_BATCH=1 \\\n    GAS=1 \\\n    SP_TYPE=\"megatron\" \\\n    ZERO_STAGE=1 \\\n    ./ALCF/train-gpt3.sh\n
              "},{"location":"polaris/data-science-workflows/applications/megatron-deepspeed/#helper-scripts","title":"Helper Scripts","text":"ALCF/train-gpt3.sh

              Main entry point for training. This script will automatically source the rest of the required ALCF/*.sh scripts below

              ALCF/model.sh

              Contains some example model architectures for GPT3-style models

              ALCF/args.sh

              Logic for parsing / setting up runtime options for Megatron and DeepSpeed.

              ALCF/setup.sh

              Locate and activate virtual environment to be used, ensure MPI variables are set properly

              ALCF/launch.sh

              Identify available resources and build the command to be run, i.e. figure out how many {nodes, GPUs per node, GPUs total} to pass to mpi{run,exec}, then use this to build mpiexec <mpiexec-args> python3 pretrain_gpt.py

              "},{"location":"polaris/data-science-workflows/containers/containers/","title":"Containers on Polaris","text":"

              Since Polaris is using NVIDIA A100 GPUs, there can be portability advantages with other NVIDIA-based systems if your workloads use containers. In this document, we'll outline some information about containers on Polaris including how to build custom containers, how to run containers at scale, and common gotchas.

              Container creation can be achieved in one of two ways: either by using Docker on your local machine, as mentioned in the Docker section of Theta(KNL), and publishing it to DockerHub, or by using a Singularity recipe file and building on a Polaris worker node. If you are not interested in building a container and only want to use the available containers, you can read the section on available containers.

              "},{"location":"polaris/data-science-workflows/containers/containers/#singularity","title":"Singularity","text":"

              The container system on Polaris is singularity. You can set up singularity with a module (this is different than, for example, ThetaGPU!):

              # To see what versions of singularity are available:\nmodule avail singularity\n\n# To load the Default version:\nmodule load singularity\n\n# To load a specific version:\nmodule load singularity/3.8.7 # the default at the time of writing these docs.\n
              "},{"location":"polaris/data-science-workflows/containers/containers/#which-singularity","title":"Which Singularity?","text":"

              There used to be a single singularity tool, which split in 2021 after some turmoil. There are now two Singularity projects: one developed by Sylabs, and the other maintained as part of the Linux Foundation. Both are open source, and the split happened around version 3.10. The version on Polaris is from Sylabs but, for completeness, here is the Linux Foundation's version. Note that the Linux Foundation version has been renamed to apptainer - a different name, but roughly the same thing, though divergence may happen after the 2021 split.

              "},{"location":"polaris/data-science-workflows/containers/containers/#build-from-docker-images-or-argonne-github-container-registry","title":"Build from Docker Images or Argonne Github container registry","text":"

              Docker containers require root privileges, which users do not have on Polaris. That doesn't mean your existing Docker containers aren't useful, though. If you have an existing Docker container, you can convert it to Singularity fairly easily on the login node. To build the latest NVIDIA container for PyTorch, you can run the following:

              module load singularity\nsingularity build pytorch:22.06-py3.sing docker://nvcr.io/nvidia/pytorch:22.06-py3\n

              Note that latest here means when these docs were written, in summer 2022. It may be useful to get a newer container if you need the latest features. You can find the PyTorch container site here. The tensorflow containers are here (though note that LCF doesn't prebuild the TF-1 containers typically). You can search the full container registry here.

              You can also use our custom built containers using Github OCI container registry. Here's a list of containers distributed by ALCF staff tailored for Polaris.

              module load singularity\nsingularity pull IMAGE_NAME oras://ghcr.io/argonne-lcf/IMAGE_NAME:latest\n
              "},{"location":"polaris/data-science-workflows/containers/containers/#build-with-a-recipe","title":"Build with a Recipe","text":"

              You can also build a singularity container using a recipe file. Detailed instructions for recipe construction are available on the Singularity Recipe Page. You can also check our singularity recipe example for building a mpich version 4 container on Polaris.

              Once you have a recipe file, you can build it on Polaris, but only on compute nodes. You can launch an interactive job using the attribute singularity_fakeroot=true to build on a compute node.

              qsub -I -A <project_name> -q <queue> -l select=1 -l walltime=60:00 -l singularity_fakeroot=true -l filesystems=home:eagle:grand\n

              You need to replace <project_name> with the appropriate project to charge, and <queue> with the debug or preemptable queue, since we only request a single node.

              After your interactive job has started, you need to load the singularity module on the compute node and export the proxy variables for internet access. Then you can build the container as shown below.

              module load singularity\nexport HTTP_PROXY=http://proxy.alcf.anl.gov:3128\nexport HTTPS_PROXY=http://proxy.alcf.anl.gov:3128\nexport http_proxy=http://proxy.alcf.anl.gov:3128\nexport https_proxy=http://proxy.alcf.anl.gov:3128\nsingularity build --fakeroot <image_name>.sif <def_filename>.def \n

              Alternatively, you can just pull the mpich 4 image distributed by us and build on top of it

              singularity pull oras://ghcr.io/argonne-lcf/mpich-4:latest\n
              "},{"location":"polaris/data-science-workflows/containers/containers/#running-singularity-container-on-polaris","title":"Running Singularity container on Polaris","text":""},{"location":"polaris/data-science-workflows/containers/containers/#example-submission-script-on-polaris","title":"Example submission script on Polaris","text":"

              To run a container on Polaris you can use the submission script described here. Below we have described the submission script for your understanding.

              First we define our job and our script takes the container name as an input parameter.

              #!/bin/sh\n#PBS -l select=2:system=polaris\n#PBS -q debug\n#PBS -l place=scatter\n#PBS -l walltime=0:30:00\n#PBS -l filesystems=home:grand\n#PBS -A <project_name>\ncd ${PBS_O_WORKDIR}\necho $CONTAINER\n

              We move to the current working directory and enable network access at run time by setting the proxy. We also load singularity.

              # SET proxy for internet access\nmodule load singularity\nexport HTTP_PROXY=http://proxy.alcf.anl.gov:3128\nexport HTTPS_PROXY=http://proxy.alcf.anl.gov:3128\nexport http_proxy=http://proxy.alcf.anl.gov:3128\nexport https_proxy=http://proxy.alcf.anl.gov:3128\n

              This is important for the system MPICH (Cray MPICH on Polaris) to bind to the container's MPICH. Set the following environment variables:

              ADDITIONAL_PATH=/opt/cray/pe/pals/1.1.7/lib/\nmodule load cray-mpich-abi\nexport SINGULARITYENV_LD_LIBRARY_PATH=\"$CRAY_LD_LIBRARY_PATH:$LD_LIBRARY_PATH:$ADDITIONAL_PATH\"\n

              Set the number of ranks per node spread as per your scaling requirements

              # MPI example w/ 16 MPI ranks per node spread evenly across cores\nNODES=`wc -l < $PBS_NODEFILE`\nPPN=16\nPROCS=$((NODES * PPN))\necho \"NUM_OF_NODES= ${NODES} TOTAL_NUM_RANKS= ${PROCS} RANKS_PER_NODE= ${PPN}\"\n

              Finally launch your script

              echo C++ MPI\nmpiexec -hostfile $PBS_NODEFILE -n $PROCS -ppn $PPN singularity exec -B /opt -B /var/run/palsd/ $CONTAINER /usr/source/mpi_hello_world\n\necho Python MPI\nmpiexec -hostfile $PBS_NODEFILE -n $PROCS -ppn $PPN singularity exec -B /opt -B /var/run/palsd/ $CONTAINER python3 /usr/source/mpi_hello_world.py\n

              The job can be submitted using:

              qsub -v CONTAINER=mpich-4_latest.sif job_submission.sh\n
              "},{"location":"polaris/data-science-workflows/containers/containers/#available-containers","title":"Available containers","text":"

              If you just want to know what containers are available, here you go.

              • For running mpich/MPI containers on Polaris, it can be found here
              • For running databases on Polaris. It can be found here
              • For using shpc - that allows for running containers as modules. It can be found here
              • Some containers are found in /soft/containers

              The latest containers are updated periodically. If you have trouble using containers, or request a newer or a different container please contact ALCF support at support@alcf.anl.gov.

              "},{"location":"polaris/data-science-workflows/containers/containers/#troubleshooting","title":"Troubleshooting","text":"
              1. Permission Denied Error: One may get a permission denied error during the build process, due to a nasty permission setting, quota limitations, or simply due to an unresolved symbolic link. You can try one of the solutions below:

                • Check your quota and delete any unnecessary files.
                • Clean-up singularity cache, ~/.singularity/cache, and set the singularity tmp and cache directories as below:
                  export SINGULARITY_TMPDIR=/tmp/singularity-tmpdir\nmkdir $SINGULARITY_TMPDIR\nexport SINGULARITY_CACHEDIR=/tmp/singularity-cachedir/\nmkdir $SINGULARITY_CACHEDIR\n
                • Make sure you are not on a directory accessed with a symlink, i.e. check if pwd and pwd -P returns the same path.
                • If any of the above doesn't work, try running the build in your home directory.
              2. Mapping to rank 0 on all nodes: This is mainly due to the container's MPICH not binding to the system MPICH. It is imperative for the container to have an MPICH that can bind dynamically to the system MPICH at runtime. Ensure your submission script has the following variables and modules loaded (see below). If this does not resolve the issue, ensure the container's MPICH is built with the '--disable-wrapper-rpath' flag. Please refer to this link to find examples of building an MPICH-based container from scratch and running it on Polaris.

              ADDITIONAL_PATH=/opt/cray/pe/pals/1.1.7/lib/\nmodule load cray-mpich-abi\nexport SINGULARITYENV_LD_LIBRARY_PATH=\"$CRAY_LD_LIBRARY_PATH:$LD_LIBRARY_PATH:$ADDITIONAL_PATH\"\nsingularity exec -B /opt -B /var/run/palsd/\n
              3. libmpi.so.40 not found: This may be due to the container's MPICH binding to the wrong system MPICH. Try removing the .conda, .cache, and .local folders from your home directory. Also rebuild your container and try again.

              4. Containers built with OpenMPI may not work correctly. Please ensure your container is built with MPICH and that the base image is Debian-based (e.g. Ubuntu).

              "},{"location":"polaris/data-science-workflows/frameworks/deepspeed/","title":"DeepSpeed","text":"

              The base conda environment on Polaris comes with Microsoft's DeepSpeed pre-installed. Instructions for using / cloning the base environment can be found here.

              A batch submission script for the following example is available here.

              We describe below the steps needed to get started with DeepSpeed on Polaris.

              We focus on the cifar example provided in the DeepSpeedExamples repository, though this approach should be generally applicable for running any model with DeepSpeed support.

              "},{"location":"polaris/data-science-workflows/frameworks/deepspeed/#running-deepspeed-on-polaris","title":"Running DeepSpeed on Polaris","text":"

              Note

              The instructions below should be run directly from a compute node.

              Explicitly, to request an interactive job (from polaris-login):

              qsub -A <project> -q debug-scaling -l select=2 -l walltime=01:00:00 -I\n

              Refer to job scheduling and execution for additional information.

              1. Load conda module and activate base environment:

                module load conda ; conda activate base\n
              2. Clone microsoft/DeepSpeedExamples and navigate into the directory:

                git clone https://github.com/microsoft/DeepSpeedExamples.git\ncd DeepSpeedExamples/cifar\n

              Launching DeepSpeed

              You can launch either with MPICH (via mpiexec) or with the DeepSpeed launcher; both approaches are shown below.
              1. Get total number of available GPUs:

                1. Count number of lines in $PBS_NODEFILE (1 host per line)
                2. Count number of GPUs available on current host
                3. NGPUS=\"$((${NHOSTS}*${NGPU_PER_HOST}))\"
                  NHOSTS=$(wc -l < \"${PBS_NODEFILE}\")\nNGPU_PER_HOST=$(nvidia-smi -L | wc -l)\nNGPUS=\"$((${NHOSTS}*${NGPU_PER_HOST}))\"\n
              2. Launch with mpiexec:

                mpiexec \\\n  --verbose \\\n  --envall \\\n  -n \"${NGPUS}\" \\\n  --ppn \"${NGPU_PER_HOST}\" \\\n  --hostfile=\"${PBS_NODEFILE}\" \\\n  python3 \\\n    cifar10_deepspeed.py \\\n    --deepspeed_config ds_config.json\n

              1. Create a DeepSpeed compliant hostfile, specifying the hostname and number of GPUs (slots) for each of our available workers:

                cat $PBS_NODEFILE > hostfile\nsed -e 's/$/ slots=4/' -i hostfile\n

              2. Create a .deepspeed_env containing the environment variables our workers will need access to:

                echo \"PATH=${PATH}\" >> .deepspeed_env\necho \"LD_LIBRARY_PATH=${LD_LIBRARY_PATH}\" >> .deepspeed_env\necho \"http_proxy=${http_proxy}\" >> .deepspeed_env\necho \"https_proxy=${https_proxy}\" >> .deepspeed_env\n

              Warning

              The .deepspeed_env file expects each line to be of the form KEY=VALUE. Each of these will then be set as environment variables on each available worker specified in our hostfile.

              We can then run the cifar10_deepspeed.py module using DeepSpeed:

              deepspeed --hostfile=hostfile cifar10_deepspeed.py \\\n    --deepspeed \\\n    --deepspeed_config ds_config.json\n

              AssertionError: Micro batch size per gpu: 0 has to be greater than 0

              Depending on the details of your specific job, it may be necessary to modify the provided ds_config.json.

              If you encounter an error:

              x3202c0s31b0n0: AssertionError: Micro batch size per gpu: 0 has to be greater than 0\n
              you can modify the \"train_batch_size\": 16 variable in the provided ds_config.json to the (total) number of available GPUs, and explicitly set \"gradient_accumulation_steps\": 1, as shown below.
              $ export NHOSTS=$(wc -l < \"${PBS_NODEFILE}\")\n$ export NGPU_PER_HOST=$(nvidia-smi -L | wc -l)\n$ export NGPUS=\"$((${NHOSTS}*${NGPU_PER_HOST}))\"\n$ echo $NHOSTS $NGPU_PER_HOST $NGPUS\n24 4 96\n$ # replace \"train_batch_size\" with $NGPUS in ds_config.json\n$ # and write to `ds_config-polaris.json`\n$ sed \\\n    \"s/$(cat ds_config.json| grep batch | cut -d ':' -f 2)/ ${NGPUS},/\" \\\n    ds_config.json \\\n    > ds_config-polaris.json\n$ cat ds_config-polaris.json\n{\n    \"train_batch_size\": 96,\n    \"gradient_accumulation_steps\": 1,\n    ...\n}\n

              "},{"location":"polaris/data-science-workflows/frameworks/jax/","title":"JAX","text":"

              JAX is another popular Python package for accelerated computing. JAX is built on XLA (the same XLA TensorFlow uses) as well as AutoGrad, and additionally has acceleration tools that operate on functions such as vmap, jit, etc. JAX is not as widespread in machine learning as TensorFlow and PyTorch for traditional models (Computer Vision, Language Models) though it is quickly gaining prominence. JAX is very powerful when a program needs non-traditional autodifferentiation or vectorization, such as: forward-mode AD, higher order derivatives, Jacobians, Hessians, or any combination of the above. Users of JAX on Polaris are encouraged to read the user documentation in detail, particularly the details about pure-functional programming, no in-place operations, and the common mistakes in writing functions for the @jit decorator.
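
              As a quick illustration of these capabilities (a minimal sketch only; the toy function f below is made up for this example), forward-mode AD, higher-order derivatives, vectorization, and JIT compilation are each a single call:

              import jax\nimport jax.numpy as jnp\n\ndef f(x):\n    return jnp.sin(x) * x**2   # toy function to differentiate\n\njacobian = jax.jacfwd(f)(jnp.arange(3.0))                          # forward-mode Jacobian\nhessian = jax.hessian(lambda x: jnp.sum(f(x)))(jnp.arange(3.0))    # higher-order derivative\nfast_f = jax.jit(jax.vmap(f))                                      # vectorize, then JIT-compile\nprint(jacobian.shape, hessian.shape, fast_f(jnp.ones((8, 3))).shape)\n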

              "},{"location":"polaris/data-science-workflows/frameworks/jax/#jax-on-polaris","title":"JAX on Polaris","text":"

              JAX is installed on Polaris via the conda module, available with:

              module load conda; conda activate\n

              Then, you can load JAX in python as usual (below showing results from the conda/2022-07-19 module):

              >>> import jax\n>>> jax.__version__\n'0.3.15'\n>>>\n
              "},{"location":"polaris/data-science-workflows/frameworks/jax/#notes-on-jax-0315","title":"Notes on JAX 0.3.15","text":"

              On Polaris, due to a bug, an environment variable must be set to use JAX on GPUs. The following code will crash:

              import jax.numpy as numpy\na = numpy.zeros(1000)\n
              outputting an error that looks like:
              jaxlib.xla_extension.XlaRuntimeError: UNKNOWN: no kernel image is available for execution on the device\n

              You can fix this by setting an environment variable:

              export XLA_FLAGS=\"--xla_gpu_force_compilation_parallelism=1\"\n

              "},{"location":"polaris/data-science-workflows/frameworks/jax/#scaling-jax-to-multiple-gpus-and-multiple-nodes","title":"Scaling JAX to multiple GPUs and multiple Nodes","text":"

              JAX has intrinsic scaling tools to use multiple GPUs on a single node, via the pmap function. If this is sufficient for your needs, excellent. If not, an alternative is to use the newer package mpi4jax.
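
              A minimal sketch of single-node data parallelism with pmap (assuming the four A100 GPUs on a Polaris node are visible; the array and function are placeholders) might look like:

              import jax\nimport jax.numpy as jnp\n\nndev = jax.local_device_count()                  # e.g. 4 GPUs on a Polaris node\nxs = jnp.arange(ndev * 8.0).reshape(ndev, 8)     # leading axis maps one shard per device\n\n# Each device reduces its shard, then psum combines the partial sums across devices\ntotals = jax.pmap(lambda x: jax.lax.psum(jnp.sum(x), axis_name=\"i\"), axis_name=\"i\")(xs)\nprint(totals)                                    # the global sum, replicated on every device\n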

              mpi4jax is a relatively new project and requires setting some environment variables for good performance and usability:
              • Set MPI4JAX_USE_CUDA_MPI=1 to use CUDA-Aware MPI, supported in the conda module, to do operations directly from the GPU.
              • Set MPICH_GPU_SUPPORT_ENABLED=1 to use CUDA-Aware MPI.

              The following code, based off of a test script from the mpi4jax repository, can help you verify you are using mpi4jax properly:

              import os\nfrom mpi4py import MPI\nimport jax\nimport jax.numpy as jnp\nimport mpi4jax\n\ncomm = MPI.COMM_WORLD\nrank = comm.Get_rank()\nlocal_rank = int(os.environ[\"PMI_LOCAL_RANK\"])\n\navailable_devices = jax.devices(\"gpu\")\nif len(available_devices) <= local_rank:\n    raise Exception(\"Could not find enough GPUs\")\n\ntarget_device = available_devices[local_rank]\n\n\n@jax.jit\ndef foo(arr):\n   arr = arr + rank\n   arr_sum, _ = mpi4jax.allreduce(arr, op=MPI.SUM, comm=comm)\n   return arr_sum\n\nwith jax.default_device(target_device):\n    a = jnp.zeros((3, 3))\n    print(f\"Rank {rank}, local rank {local_rank}, a.device is {a.device()}\")\n    result = foo(a)\n    print(f\"Rank {rank}, local rank {local_rank}, result.device is {result.device()}\")\n\n    import time\n    print(\"Sleeping for 5 seconds if you want to look at nvidia-smi ... \")\n    import time\n    time.sleep(5)\n    print(\"Done sleeping\")\n\nif rank == 0:\n   print(result)\n

              JAX and mpi4jax are both still somewhat early in their software lifecycles. Updates are frequent, and if you require assistance please contact support@alcf.anl.gov.

              "},{"location":"polaris/data-science-workflows/frameworks/pytorch/","title":"PyTorch on Polaris","text":"

              PyTorch is a popular, open-source deep learning framework developed and released by Facebook. The PyTorch home page has more information about PyTorch, which you can refer to. For troubleshooting on Polaris, please contact support@alcf.anl.gov.

              "},{"location":"polaris/data-science-workflows/frameworks/pytorch/#installation-on-polaris","title":"Installation on Polaris","text":"

              PyTorch is installed on Polaris already, available in the conda module. To use it from a compute node, please do:

              module load conda\nconda activate\n

              Then, you can load PyTorch in python as usual (below showing results from the conda/2022-07-19 module):

              >>> import torch\n>>> torch.__version__\n'1.12.0a0+git67ece03'\n>>>\n

              This installation of PyTorch was built from source and the cuda libraries it uses are found via the CUDA_HOME environment variable (below showing results from the conda/2022-07-19 module):

              $ echo $CUDA_HOME\n/soft/datascience/cuda/cuda_11.5.2_495.29.05_linux\n

              If you need to build applications that use this version of PyTorch and CUDA, we recommend using these cuda libraries to ensure compatibility. We periodically update the PyTorch release, though updates will come in the form of new versions of the conda module.

              PyTorch is also available through nvidia containers that have been translated to Singularity containers. For more information about containers, please see the containers documentation page.

              "},{"location":"polaris/data-science-workflows/frameworks/pytorch/#pytorch-best-practices-on-polaris","title":"PyTorch Best Practices on Polaris","text":""},{"location":"polaris/data-science-workflows/frameworks/pytorch/#single-node-performance","title":"Single Node Performance","text":"

              When running PyTorch applications, we have found the following practices to be generally, if not universally, useful and encourage you to try some of these techniques to boost performance of your own applications.

              1. Use Reduced Precision. Reduced Precision is available on A100 via tensorcores and is supported with PyTorch operations. In general, the way to do this is via the PyTorch Automatic Mixed Precision package (AMP), as described in the mixed precision documentation. In PyTorch, users generally need to manage casting and loss scaling manually, though context managers and function decorators can provide easy tools to do this; a short sketch follows this list.

              2. PyTorch has a JIT module as well as backends to support op fusion, similar to TensorFlow's tf.function tools. However, PyTorch JIT capabilities are newer and may not yield performance improvements. Please see TorchScript for more information.
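
              As a rough sketch of the first point above (not a full training script; the model, optimizer, and random data are placeholders), automatic mixed precision typically combines the autocast context manager with a gradient scaler:

              import torch\n\nmodel = torch.nn.Linear(1024, 1024).cuda()\noptimizer = torch.optim.SGD(model.parameters(), lr=1e-3)\nscaler = torch.cuda.amp.GradScaler()                    # manages loss scaling\n\ninputs = torch.randn(32, 1024, device=\"cuda\")\ntargets = torch.randn(32, 1024, device=\"cuda\")\n\nfor step in range(10):\n    optimizer.zero_grad()\n    with torch.cuda.amp.autocast():                     # eligible ops run in reduced precision\n        loss = torch.nn.functional.mse_loss(model(inputs), targets)\n    scaler.scale(loss).backward()                       # scale the loss to avoid fp16 underflow\n    scaler.step(optimizer)\n    scaler.update()\n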

              "},{"location":"polaris/data-science-workflows/frameworks/pytorch/#multi-gpu-multi-node-scale-up","title":"Multi-GPU / Multi-Node Scale up","text":"

              PyTorch is compatible with scaling up to multiple GPUs per node, and across multiple nodes. Good scaling performance has been seen up to the entire Polaris system, > 2048 GPUs. Good performance with PyTorch has been seen with both DDP and Horovod. For details, please see the Horovod documentation or the Distributed Data Parallel documentation. Some Polaris-specific details that may be helpful to you:

              1. CPU affinity and NCCL settings can improve scaling performance, particularly at the largest scales. In particular, we encourage users to try their scaling measurements with the following settings:
                • Set the environment variable NCCL_COLLNET_ENABLE=1
                • Set the environment variable NCCL_NET_GDR_LEVEL=PHB
                • Manually set the CPU affinity via mpiexec, such as with --cpu-bind verbose,list:0,8,16,24

              2. Horovod and DDP work best when you limit the visible devices to only one GPU. Note that if you import mpi4py or horovod, and then do something like os.environ[\"CUDA_VISIBLE_DEVICES\"] = hvd.local_rank(), it may not actually work! You must set the CUDA_VISIBLE_DEVICES environment variable prior to doing MPI.COMM_WORLD.init(), which is done in horovod.init() as well as implicitly in from mpi4py import MPI. On Polaris specifically, you can use the environment variable PMI_LOCAL_RANK (as well as PMI_LOCAL_SIZE) to learn information about the node-local MPI ranks.
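
              A minimal sketch of that ordering (a hypothetical script; it only assumes PMI_LOCAL_RANK is set by the Polaris job launcher):

              import os\n\n# Pin this process to a single GPU *before* MPI is initialized,\n# i.e. before importing mpi4py or calling horovod.init().\nos.environ[\"CUDA_VISIBLE_DEVICES\"] = os.environ.get(\"PMI_LOCAL_RANK\", \"0\")\n\nfrom mpi4py import MPI  # MPI_Init happens here, after the environment is set\nimport torch\n\nrank = MPI.COMM_WORLD.Get_rank()\nprint(f\"Rank {rank} sees {torch.cuda.device_count()} visible GPU(s)\")\n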

              "},{"location":"polaris/data-science-workflows/frameworks/pytorch/#deepspeed","title":"DeepSpeed","text":"

              DeepSpeed is also available and usable on Polaris. For more information, please see the DeepSpeed documentation directly.

              "},{"location":"polaris/data-science-workflows/frameworks/pytorch/#pytorch-dataloader-and-multi-node-horovod","title":"PyTorch DataLoader and multi-node Horovod","text":"

Please note there is a bug that causes a hang when using PyTorch's multithreaded data loaders with distributed training across multiple nodes. To work around this, NVIDIA recommends setting num_workers=0 in the dataloader configuration, which serializes data loading (a short sketch follows).
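For illustration, a short sketch of the workaround; the dataset here is a placeholder assumption standing in for your own training data:

import torch
from torch.utils.data import DataLoader, TensorDataset

# Placeholder dataset; in practice this would be your training dataset.
dataset = TensorDataset(torch.randn(256, 16), torch.randint(0, 2, (256,)))

# num_workers=0 keeps data loading in the main process, avoiding the
# multi-node hang described above (at the cost of serialized loading).
loader = DataLoader(dataset, batch_size=32, shuffle=True, num_workers=0)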

              For more details, see Polaris Known Issues.

              "},{"location":"polaris/data-science-workflows/frameworks/tensorflow/","title":"TensorFlow on Polaris","text":"

TensorFlow is a popular, open-source deep learning framework developed and released by Google. The TensorFlow home page has more information about TensorFlow, which you can refer to. For troubleshooting on Polaris, please contact support@alcf.anl.gov.

              "},{"location":"polaris/data-science-workflows/frameworks/tensorflow/#installation-on-polaris","title":"Installation on Polaris","text":"

              TensorFlow is already pre-installed on Polaris, available in the conda module. To use it from a compute node, please do:

              module load conda\nconda activate\n

              Then, you can load TensorFlow in python as usual (below showing results from the conda/2022-07-19 module):

              >>> import tensorflow as tf\n>>> tf.__version__\n'2.9.1'\n>>>\n

              This installation of TensorFlow was built from source and the CUDA libraries it uses are found via the CUDA_HOME environment variable (below showing results from the conda/2022-07-19 module):

              $ echo $CUDA_HOME\n/soft/datascience/cuda/cuda_11.5.2_495.29.05_linux\n

If you need to build applications that use this version of TensorFlow and CUDA, we recommend using these CUDA libraries to ensure compatibility. We periodically update the TensorFlow release, though updates will come in the form of new versions of the conda module.

              TensorFlow is also available through NVIDIA containers that have been translated to Singularity containers. For more information about containers, please see the Containers documentation page.

              "},{"location":"polaris/data-science-workflows/frameworks/tensorflow/#tensorflow-best-practices-on-polaris","title":"TensorFlow Best Practices on Polaris","text":""},{"location":"polaris/data-science-workflows/frameworks/tensorflow/#single-node-performance","title":"Single Node Performance","text":"

              When running TensorFlow applications, we have found the following practices to be generally, if not universally, useful and encourage you to try some of these techniques to boost performance of your own applications.

1. Use Reduced Precision. Reduced Precision is available on A100 via tensorcores and is supported with TensorFlow operations. In general, the way to do this is via the tf.keras.mixed_precision Policy, as described in the mixed precision documentation. If you use a custom training loop (and not keras.Model.fit), you will also need to apply loss scaling (see the sketch after this list).

2. Use TensorFlow's graph API to improve efficiency of operations. TensorFlow is, in general, an imperative language, but with function decorators like @tf.function you can trace functions in your code. Tracing replaces your Python function with a lower-level, semi-compiled TensorFlow Graph. More information about the tf.function interface is available here. When possible, use jit_compile, but be aware of sharp bits when using tf.function: Python expressions that aren't tensors are often captured as constants in the graph, which may or may not be your intention.

3. Use XLA compilation on your code. XLA is the Accelerated Linear Algebra library that is available in TensorFlow and critical in software like JAX. XLA will compile a tf.Graph object, generated with tf.function or similar, and perform optimizations like operation fusion. XLA can give impressive performance boosts with almost no user changes except setting the environment variable TF_XLA_FLAGS=--tf_xla_auto_jit=2. If your code is complex, or has dynamically sized tensors (tensors where the shape changes every iteration), XLA can be detrimental: the overhead for compiling functions can be large enough to outweigh the performance improvements. XLA is particularly powerful when combined with reduced precision, yielding speedups > 100% in some models.
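As a rough illustration of items 1-3, here is a minimal sketch combining the keras mixed-precision policy with a tf.function-traced custom training step; the toy model and random data are placeholder assumptions, not an ALCF-provided example:

import tensorflow as tf

# Item 1: enable mixed precision globally so matmuls use the A100 tensor cores.
tf.keras.mixed_precision.set_global_policy("mixed_float16")

model = tf.keras.Sequential([
    tf.keras.layers.Dense(1024, activation="relu"),
    # Keep the output layer in float32 for numerical stability.
    tf.keras.layers.Dense(10, dtype="float32"),
])
model.build((None, 784))
loss_fn = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)
# With a custom loop (no keras.Model.fit), wrap the optimizer for loss scaling.
optimizer = tf.keras.mixed_precision.LossScaleOptimizer(
    tf.keras.optimizers.SGD(1e-3))

# Item 2: trace the training step into a TensorFlow graph.
# Item 3: XLA can additionally be enabled per function with
# @tf.function(jit_compile=True) or globally via TF_XLA_FLAGS=--tf_xla_auto_jit=2.
@tf.function
def train_step(x, y):
    with tf.GradientTape() as tape:
        loss = loss_fn(y, model(x, training=True))
        scaled_loss = optimizer.get_scaled_loss(loss)
    scaled_grads = tape.gradient(scaled_loss, model.trainable_variables)
    grads = optimizer.get_unscaled_gradients(scaled_grads)
    optimizer.apply_gradients(zip(grads, model.trainable_variables))
    return loss

x = tf.random.normal((64, 784))
y = tf.random.uniform((64,), maxval=10, dtype=tf.int32)
print(train_step(x, y).numpy())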

              "},{"location":"polaris/data-science-workflows/frameworks/tensorflow/#multi-gpu-multi-node-scale-up","title":"Multi-GPU / Multi-Node Scale up","text":"

TensorFlow is compatible with scaling up to multiple GPUs per node, and across multiple nodes. Good scaling performance has been seen up to the entire Polaris system, > 2048 GPUs. Good performance with TensorFlow has been seen with Horovod in particular. For details, please see the Horovod documentation. Some Polaris-specific details that may be helpful to you:

1. CPU affinity and NCCL settings can improve scaling performance, particularly at the largest scales. In particular, we encourage users to try their scaling measurements with the following settings:
   - Set the environment variable NCCL_COLLNET_ENABLE=1
   - Set the environment variable NCCL_NET_GDR_LEVEL=PHB
   - Manually set the CPU affinity via mpiexec, such as with --cpu-bind verbose,list:0,8,16,24

2. Horovod works best when you limit the visible devices to only one GPU. Note that if you import mpi4py or horovod and then do something like os.environ[\"CUDA_VISIBLE_DEVICES\"] = hvd.local_rank(), it may not actually work! You must set the CUDA_VISIBLE_DEVICES environment variable prior to MPI initialization, which happens in horovod.init() as well as implicitly in from mpi4py import MPI. On Polaris specifically, you can use the environment variable PMI_LOCAL_RANK (as well as PMI_LOCAL_SIZE) to learn information about the node-local MPI ranks; the same approach sketched in the PyTorch section above applies here.

              "},{"location":"polaris/data-science-workflows/frameworks/tensorflow/#tensorflow-dataloaders","title":"TensorFlow Dataloaders","text":"

              Additional information to be provided.

              "},{"location":"polaris/debugging-tools/CUDA-GDB/","title":"CUDA-GDB","text":""},{"location":"polaris/debugging-tools/CUDA-GDB/#references","title":"References","text":"

              NVIDIA CUDA-GDB Documentation

              "},{"location":"polaris/debugging-tools/CUDA-GDB/#introduction","title":"Introduction","text":"

              CUDA-GDB is the NVIDIA tool for debugging CUDA applications running on Polaris. CUDA-GDB is an extension to GDB, the GNU Project debugger. The tool provides developers with a mechanism for debugging CUDA applications running on actual hardware. This enables developers to debug applications without the potential variations introduced by simulation and emulation environments.

              "},{"location":"polaris/debugging-tools/CUDA-GDB/#step-by-step-guide","title":"Step-by-step guide","text":""},{"location":"polaris/debugging-tools/CUDA-GDB/#debug-compilation","title":"Debug Compilation","text":"

              NVCC, the NVIDIA CUDA compiler driver, provides a mechanism for generating the debugging information necessary for CUDA-GDB to work properly. The -g -G option pair must be passed to NVCC when an application is compiled for ease of debugging with CUDA-GDB; for example,

              nvcc -g -G foo.cu -o foo\n
Using this line to compile the CUDA application foo.cu:
• forces -O0 compilation, with the exception of very limited dead-code eliminations and register-spilling optimizations.
• makes the compiler include debug information in the executable.

              "},{"location":"polaris/debugging-tools/CUDA-GDB/#running-cuda-gdb-on-polaris-compute-nodes","title":"Running CUDA-gdb on Polaris compute nodes","text":"

              Start an interactive job mode on Polaris as follows:

              $ qsub -I -l select=1 -l walltime=1:00:00\n\n$ cuda-gdb --version\nNVIDIA (R) CUDA Debugger\n11.4 release\nPortions Copyright (C) 2007-2021 NVIDIA Corporation\nGNU gdb (GDB) 10.1\nCopyright (C) 2020 Free Software Foundation, Inc.\nLicense GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>\nThis is free software: you are free to change and redistribute it.\nThere is NO WARRANTY, to the extent permitted by law.\n\n$ cuda-gdb foo\n

              "},{"location":"polaris/debugging-tools/CUDA-GDB/#a-quick-example-with-a-stream-benchmark-on-a-polaris-compute-node","title":"A quick example with a stream benchmark on a Polaris compute node","text":"
              jkwack@polaris-login-02:~> qsub -I -l select=1 -l walltime=1:00:00\nqsub: waiting for job 308834.polaris-pbs-01.hsn.cm.polaris.alcf.anl.gov to start\nqsub: job 308834.polaris-pbs-01.hsn.cm.polaris.alcf.anl.gov ready\n\n\nCurrently Loaded Modules:\n  1) craype-x86-rome          4) perftools-base/22.05.0   7) cray-dsmml/0.2.2   10) cray-pmi-lib/6.0.17  13) PrgEnv-nvhpc/8.3.3\n  2) libfabric/1.11.0.4.125   5) nvhpc/21.9               8) cray-mpich/8.1.16  11) cray-pals/1.1.7      14) craype-accel-nvidia80\n  3) craype-network-ofi       6) craype/2.7.15            9) cray-pmi/6.1.2     12) cray-libpals/1.1.7\n\njkwack@x3008c0s13b1n0:~/BabelStream/build_polaris_debug> nvcc -g -G -c ../src/cuda/CUDAStream.cu  -I ../src/\n\njkwack@x3008c0s13b1n0:~/BabelStream/build_polaris_debug> nvcc -g -G -c ../src/main.cpp -DCUDA -I ../src/cuda/ -I ../src/\n\njkwack@x3008c0s13b1n0:~/BabelStream/build_polaris_debug> nvcc -g -G main.o CUDAStream.o -o cuda-stream-debug\n\njkwack@x3008c0s13b1n0:~/BabelStream/build_polaris_debug> ./cuda-stream-debug \nBabelStream\nVersion: 4.0\nImplementation: CUDA\nRunning kernels 100 times\nPrecision: double\nArray size: 268.4 MB (=0.3 GB)\nTotal size: 805.3 MB (=0.8 GB)\nUsing CUDA device NVIDIA A100-SXM4-40GB\nDriver: 11040\nFunction    MBytes/sec  Min (sec)   Max         Average     \nCopy        1313940.694 0.00041     0.00047     0.00047     \nMul         1302000.791 0.00041     0.00048     0.00047     \nAdd         1296217.720 0.00062     0.00070     0.00069     \nTriad       1296027.887 0.00062     0.00070     0.00069     \nDot         823405.227  0.00065     0.00076     0.00075     \n\njkwack@x3008c0s13b1n0:~/BabelStream/build_polaris_debug> cuda-gdb ./cuda-stream-debug \nNVIDIA (R) CUDA Debugger\n11.4 release\nPortions Copyright (C) 2007-2021 NVIDIA Corporation\nGNU gdb (GDB) 10.1\nCopyright (C) 2020 Free Software Foundation, Inc.\nLicense GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>\nThis is free software: you are free to change and redistribute it.\nThere is NO WARRANTY, to the extent permitted by law.\nType \"show copying\" and \"show warranty\" for details.\nThis GDB was configured as \"x86_64-pc-linux-gnu\".\nType \"show configuration\" for configuration details.\nFor bug reporting instructions, please see:\n<https://www.gnu.org/software/gdb/bugs/>.\nFind the GDB manual and other documentation resources online at:\n    <http://www.gnu.org/software/gdb/documentation/>.\n\nFor help, type \"help\".\nType \"apropos word\" to search for commands related to \"word\"...\nReading symbols from ./cuda-stream-debug...\n(cuda-gdb) b CUDAStream.cu:203\nBreakpoint 1 at 0x412598: CUDAStream.cu:203. 
(2 locations)\n(cuda-gdb) r      \nStarting program: /home/jkwack/BabelStream/build_polaris_debug/cuda-stream-debug \n[Thread debugging using libthread_db enabled]\nUsing host libthread_db library \"/lib64/libthread_db.so.1\".\nBabelStream\nVersion: 4.0\nImplementation: CUDA\nRunning kernels 100 times\nPrecision: double\nArray size: 268.4 MB (=0.3 GB)\nTotal size: 805.3 MB (=0.8 GB)\n[Detaching after fork from child process 58459]\n[New Thread 0x15554c6bb000 (LWP 58475)]\nUsing CUDA device NVIDIA A100-SXM4-40GB\nDriver: 11040\n[New Thread 0x15554c4ba000 (LWP 58476)]\n[Switching focus to CUDA kernel 0, grid 5, block (0,0,0), thread (0,0,0), device 0, sm 0, warp 3, lane 0]\n\nThread 1 \"cuda-stream-deb\" hit Breakpoint 1, triad_kernel<double><<<(32768,1,1),(1024,1,1)>>> (a=0x155506000000, b=0x1554f6000000, c=0x1554e6000000)\n    at ../src/cuda/CUDAStream.cu:203\n203   a[i] = b[i] + scalar * c[i];\n(cuda-gdb) c\nContinuing.\n[Switching focus to CUDA kernel 0, grid 5, block (1,0,0), thread (0,0,0), device 0, sm 0, warp 32, lane 0]\n\nThread 1 \"cuda-stream-deb\" hit Breakpoint 1, triad_kernel<double><<<(32768,1,1),(1024,1,1)>>> (a=0x155506000000, b=0x1554f6000000, c=0x1554e6000000)\n    at ../src/cuda/CUDAStream.cu:203\n203   a[i] = b[i] + scalar * c[i];\n(cuda-gdb) info locals\ni = 1024\n(cuda-gdb) p b[i]\n$1 = 0.040000000000000008\n(cuda-gdb) p scalar\n$2 = 0.40000000000000002\n(cuda-gdb) p c[i]\n$3 = 0.14000000000000001\n(cuda-gdb) d 1\n(cuda-gdb) c\nContinuing.\nFunction    MBytes/sec  Min (sec)   Max         Average     \nCopy        1314941.553 0.00041     0.00041     0.00041     \nMul         1301022.680 0.00041     0.00042     0.00041     \nAdd         1293858.147 0.00062     0.00063     0.00063     \nTriad       1297681.929 0.00062     0.00063     0.00062     \nDot         828446.963  0.00065     0.00066     0.00065     \n[Thread 0x15554c4ba000 (LWP 58476) exited]\n[Thread 0x15554c6bb000 (LWP 58475) exited]\n[Inferior 1 (process 58454) exited normally]\n(cuda-gdb) q\n\njkwack@x3008c0s13b1n0:~/BabelStream/build_polaris_debug> \n
              "},{"location":"polaris/hardware-overview/machine-overview/","title":"Polaris","text":"

Polaris is a 560-node HPE Apollo 6500 Gen 10+ based system. Each node has a single 2.8 GHz AMD EPYC Milan 7543P 32-core CPU with 512 GB of DDR4 RAM, four NVIDIA A100 GPUs connected via NVLink, a pair of local 1.6 TB SSDs in RAID0 for users' use, and a pair of Slingshot network adapters. They are currently Slingshot 10, but are scheduled to be upgraded to Slingshot 11 in 2023. There are two nodes per chassis, seven chassis per rack, and 40 racks for a total of 560 nodes. More detailed specifications are as follows:

              "},{"location":"polaris/hardware-overview/machine-overview/#polaris-compute-nodes","title":"Polaris Compute Nodes","text":"POLARIS COMPUTE DESCRIPTION PER NODE AGGREGATE Processor (Note 1) 2.8 GHz 7543P 1 560 Cores/Threads AMD Zen 3 (Milan) 32/64 17,920/35,840 RAM (Note 2) DDR4 512 GiB 280 TiB GPUS NVIDIA A100 4 2240 Local SSD 1.6 TB 2/3.2 TB 1120/1.8PB

Note 1: 256 MB shared L3 cache, 512 KB L2 cache per core, 32 KB L1 cache per core
Note 2: 8 memory channels rated at 204.8 GiB/s

              "},{"location":"polaris/hardware-overview/machine-overview/#polaris-a100-gpu-information","title":"Polaris A100 GPU Information","text":"DESCRIPTION A100 PCIe A100 HGX (Polaris) GPU Memory 40 GiB HBM2 160 GiB HBM2 GPU Memory BW 1.6 TB/s 6.4 TB/s Interconnect PCIe Gen4 64 GB/s NVLink 600 GB/s FP 64 9.7 TF 38.8 TF FP64 Tensor Core 19.5 TF 78 TF FP 32 19.5 TF 78 TF BF16 Tensor Core 312 TF 1.3 PF FP16 Tensor Core 312 TF 1.3 PF INT8 Tensor Core 624 TOPS 2496 TOPS Max TDP Power 250 W 400 W"},{"location":"polaris/hardware-overview/machine-overview/#polaris-device-affinity-information","title":"Polaris Device Affinity Information","text":"CPU Affinity NUMA Affinity GPU0 GPU1 GPU2 GPU3 mlx5_0 mlx5_1 24-31,56-63 3 GPU0 X NV4 NV4 NV4 SYS SYS 16-23,48-55 2 GPU1 NV4 X NV4 NV4 SYS PHB 8-15,40-47 1 GPU2 NV4 NV4 X NV4 SYS SYS 0-7,32-39 0 GPU3 NV4 NV4 NV4 X PHB SYS mlx5_0 SYS SYS SYS PHB X SYS mlx5_1 SYS PHB SYS SYS SYS X"},{"location":"polaris/hardware-overview/machine-overview/#legend","title":"Legend:","text":"

X = Self
SYS = Connection traversing PCIe as well as the SMP interconnect between NUMA nodes (e.g., QPI/UPI)
NODE = Connection traversing PCIe as well as the interconnect between PCIe Host Bridges within a NUMA node
PHB = Connection traversing PCIe as well as a PCIe Host Bridge (typically the CPU)
PXB = Connection traversing multiple PCIe bridges (without traversing the PCIe Host Bridge)
PIX = Connection traversing at most a single PCIe bridge
NV# = Connection traversing a bonded set of # NVLinks

Links to detailed NVIDIA A100 documentation:
• NVIDIA A100 Tensor Core GPU Architecture
• NVIDIA Ampere Architecture In-Depth

              "},{"location":"polaris/hardware-overview/machine-overview/#login-nodes","title":"Login nodes","text":"

There are four login nodes available to users for editing code, building code, submitting/monitoring jobs, checking usage (sbank), etc. Their full hostnames are polaris-login-N.hsn.cm.polaris.alcf.anl.gov for N equal to 01 through 04; there are an additional two login nodes that are not user-accessible and are used for running services such as JupyterHub. The various compilers and libraries are present on the login nodes, so most users should be able to build their code. However, if your build requires the physical presence of a GPU, you will need to build on a compute node.

All users share the same login nodes, so please be courteous and respectful of your fellow users. For example, please do not run computationally or I/O intensive pre- or post-processing on the logins, and keep the parallelism of your builds to a reasonable level.

| POLARIS LOGIN | DESCRIPTION | PER NODE | AGGREGATE |
| --- | --- | --- | --- |
| Processor (Note 1) | 2.0 GHz 7713 | 2 | 12 |
| Cores/Threads | AMD Zen 3 (Milan) | 128/256 | 768/1536 |
| RAM (Note 2) | DDR4 | 512 GiB | 3 TiB |
| GPUs (Note 3) | No GPUs | 0 | 0 |
| Local SSD | None | 0 | 0 |

Note 1: 256 MB shared L3 cache, 512 KB L2 cache per core, 32 KB L1 cache per core
Note 2: 8 memory channels rated at 204.8 GiB/s per socket
Note 3: If your build requires the physical presence of a GPU, you will need to build on a compute node.

              "},{"location":"polaris/hardware-overview/machine-overview/#gateway-nodes","title":"Gateway nodes","text":"

There are 50 gateway nodes. These nodes are not user-accessible, but are used transparently for access to the storage systems. Each node has a single 200 Gbps HDR IB card for access to the storage area network. This gives a theoretical peak bandwidth of 1250 GB/s (50 nodes x 200 Gbps = 10,000 Gbps = 1250 GB/s), which is approximately the aggregate bandwidth of the global file systems (1300 GB/s).

              "},{"location":"polaris/hardware-overview/machine-overview/#storage","title":"Storage","text":"

              Polaris has access to the ALCF global file systems. Details on storage can be found here.

              "},{"location":"polaris/performance-tools/NVIDIA-Nsight/","title":"NVIDIA Nsight tools","text":""},{"location":"polaris/performance-tools/NVIDIA-Nsight/#references","title":"References","text":"

• NVIDIA Nsight Systems Documentation
• NVIDIA Nsight Compute Documentation

              "},{"location":"polaris/performance-tools/NVIDIA-Nsight/#introduction","title":"Introduction","text":"

NVIDIA® Nsight™ Systems provides developers a system-wide visualization of an application's performance. Developers can optimize bottlenecks to scale efficiently across any number or size of CPUs and GPUs on Polaris. For further optimization of compute kernels, developers should use Nsight Compute.

NVIDIA Nsight Compute is an interactive kernel profiler for CUDA applications. It provides detailed performance metrics and API debugging via a user interface and a command-line tool.

              In addition, the baseline feature of this tool allows users to compare results within the tool. NVIDIA Nsight Compute provides a customizable and data-driven user interface, metric collection, and can be extended with analysis scripts for post-processing results.

              "},{"location":"polaris/performance-tools/NVIDIA-Nsight/#step-by-step-guide","title":"Step-by-step guide","text":""},{"location":"polaris/performance-tools/NVIDIA-Nsight/#common-part-on-polaris","title":"Common part on Polaris","text":"

              Build your application for Polaris, and then submit your job script to Polaris or start an interactive job mode on Polaris as follows:

              $ qsub -I -l select=1 -l walltime=1:00:00 -l filesystems=home:grand -q debug -A <project-name>\n\n$ module load cudatoolkit-standalone/11.8.0 \n$ module li\n\nCurrently Loaded Modules:\n  1) craype-x86-rome          6) craype/2.7.15        11) cray-pals/1.1.7\n  2) libfabric/1.11.0.4.125   7) cray-dsmml/0.2.2     12) cray-libpals/1.1.7\n  3) craype-network-ofi       8) cray-mpich/8.1.16    13) PrgEnv-nvhpc/8.3.3\n  4) perftools-base/22.05.0   9) cray-pmi/6.1.2       14) craype-accel-nvidia80\n  5) nvhpc/21.9              10) cray-pmi-lib/6.0.17  15) cudatoolkit-standalone/11.8.0\n\n$ nsys --version\nNVIDIA Nsight Systems version 2022.4.2.1-df9881f\n\n$ ncu --version\nNVIDIA (R) Nsight Compute Command Line Profiler\nCopyright (c) 2018-2022 NVIDIA Corporation\nVersion 2022.3.0.0 (build 31729285) (public-release)\n

              "},{"location":"polaris/performance-tools/NVIDIA-Nsight/#nsight-systems","title":"Nsight Systems","text":"

              Run your application with Nsight Systems as follows:

              $ nsys profile -o {output_filename} --stats=true ./{your_application}\n

              "},{"location":"polaris/performance-tools/NVIDIA-Nsight/#nsight-compute","title":"Nsight Compute","text":"

              Run your application with Nsight Compute.

              $ ncu --set detailed -k {kernel_name} -o {output_filename} ./{your_application}\n

Remark: Without the -o option, Nsight Compute prints the performance data to standard output.

              "},{"location":"polaris/performance-tools/NVIDIA-Nsight/#post-processing-the-profiled-data","title":"Post-processing the profiled data","text":""},{"location":"polaris/performance-tools/NVIDIA-Nsight/#post-processing-via-cli","title":"Post-processing via CLI","text":"
              $ nsys stats {output_filename}.qdrep\n$ ncu -i {output_filename}.ncu-rep  \n
              "},{"location":"polaris/performance-tools/NVIDIA-Nsight/#post-processing-on-your-local-system-via-gui","title":"Post-processing on your local system via GUI","text":"
• Install NVIDIA Nsight Systems and NVIDIA Nsight Compute after downloading both of them from the NVIDIA Developer Zone. Remark: The local client version should be the same as or newer than the NVIDIA Nsight tools on Polaris.
• Download nsys output files (i.e., ending with .qdrep and .sqlite) to your local system, and then open them with NVIDIA Nsight Systems on your local system.
              • Download ncu output files (i.e., ending with .ncu-rep) to your local system, and then open them with NVIDIA Nsight Compute on your local system.
              "},{"location":"polaris/performance-tools/NVIDIA-Nsight/#more-options-for-performance-analysis-with-nsight-systems-and-nsight-compute","title":"More options for performance analysis with Nsight Systems and Nsight Compute","text":"
              $ nsys --help\n$ ncu --help\n
              "},{"location":"polaris/performance-tools/NVIDIA-Nsight/#a-quick-example","title":"A quick example","text":""},{"location":"polaris/performance-tools/NVIDIA-Nsight/#nsight-systems_1","title":"Nsight Systems","text":""},{"location":"polaris/performance-tools/NVIDIA-Nsight/#running-a-stream-benchmark-with-nsight-systems","title":"Running a stream benchmark with Nsight Systems","text":"
              jkwack@x3008c0s13b1n0:~/BabelStream/build_polaris> nsys profile -o JKreport-nsys-BableStream --stats=true ./cuda-stream\nWarning: LBR backtrace method is not supported on this platform. DWARF backtrace method will be used.\nCollecting data...\nBabelStream\nVersion: 4.0\nImplementation: CUDA\nRunning kernels 100 times\nPrecision: double\nArray size: 268.4 MB (=0.3 GB)\nTotal size: 805.3 MB (=0.8 GB)\nUsing CUDA device NVIDIA A100-SXM4-40GB\nDriver: 11040\nFunction    MBytes/sec  Min (sec)   Max         Average     \nCopy        1368294.603 0.00039     0.00044     0.00039     \nMul         1334324.779 0.00040     0.00051     0.00041     \nAdd         1358476.737 0.00059     0.00060     0.00059     \nTriad       1366095.332 0.00059     0.00059     0.00059     \nDot         1190200.569 0.00045     0.00047     0.00046     \nProcessing events...\nSaving temporary \"/var/tmp/pbs.308834.polaris-pbs-01.hsn.cm.polaris.alcf.anl.gov/nsys-report-f594-c524-6b4c-300a.qdstrm\" file to disk...\n\nCreating final output files...\nProcessing [===============================================================100%]\nSaved report file to \"/var/tmp/pbs.308834.polaris-pbs-01.hsn.cm.polaris.alcf.anl.gov/nsys-report-f594-c524-6b4c-300a.qdrep\"\nExporting 7675 events: [===================================================100%]\n\nExported successfully to\n/var/tmp/pbs.308834.polaris-pbs-01.hsn.cm.polaris.alcf.anl.gov/nsys-report-f594-c524-6b4c-300a.sqlite\n\n\nCUDA API Statistics:\n\n Time(%)  Total Time (ns)  Num Calls  Average (ns)  Minimum (ns)  Maximum (ns)  StdDev (ns)           Name         \n -------  ---------------  ---------  ------------  ------------  ------------  ------------  ---------------------\n    41.5      197,225,738        401     491,834.8       386,695       592,751      96,647.5  cudaDeviceSynchronize\n    35.4      168,294,004          4  42,073,501.0       144,211   167,547,885  83,649,622.0  cudaMalloc           \n    22.5      106,822,589        103   1,037,112.5       446,617    20,588,840   3,380,727.4  cudaMemcpy           \n     0.4        1,823,597        501       3,639.9         3,166        24,125       1,228.9  cudaLaunchKernel     \n     0.2        1,166,186          4     291,546.5       130,595       431,599     123,479.8  cudaFree             \n\n\n\nCUDA Kernel Statistics:\n\n Time(%)  Total Time (ns)  Instances  Average (ns)  Minimum (ns)  Maximum (ns)  StdDev (ns)                             Name                           \n -------  ---------------  ---------  ------------  ------------  ------------  -----------  ----------------------------------------------------------\n    24.5       58,415,138        100     584,151.4       582,522       585,817        543.0  void add_kernel<double>(const T1 *, const T1 *, T1 *)     \n    24.4       58,080,329        100     580,803.3       579,802       582,586        520.5  void triad_kernel<double>(T1 *, const T1 *, const T1 *)   \n    18.3       43,602,345        100     436,023.5       430,555       445,979      2,619.5  void dot_kernel<double>(const T1 *, const T1 *, T1 *, int)\n    16.5       39,402,677        100     394,026.8       392,444       395,708        611.5  void mul_kernel<double>(T1 *, const T1 *)                 \n    16.1       38,393,119        100     383,931.2       382,556       396,892      1,434.1  void copy_kernel<double>(const T1 *, T1 *)                \n     0.2          523,355          1     523,355.0       523,355       523,355          0.0  void init_kernel<double>(T1 *, T1 *, T1 *, T1, 
T1, T1)    \n\n\n\nCUDA Memory Operation Statistics (by time):\n\n Time(%)  Total Time (ns)  Count  Average (ns)  Minimum (ns)  Maximum (ns)  StdDev (ns)      Operation     \n -------  ---------------  -----  ------------  ------------  ------------  -----------  ------------------\n   100.0       61,323,171    103     595,370.6         2,399    20,470,146  3,439,982.0  [CUDA memcpy DtoH]\n\n\n\nCUDA Memory Operation Statistics (by size):\n\n Total (MB)  Count  Average (MB)  Minimum (MB)  Maximum (MB)  StdDev (MB)      Operation     \n ----------  -----  ------------  ------------  ------------  -----------  ------------------\n    805.511    103         7.820         0.002       268.435       45.361  [CUDA memcpy DtoH]\n\n\n\nOperating System Runtime API Statistics:\n\n Time(%)  Total Time (ns)  Num Calls  Average (ns)  Minimum (ns)  Maximum (ns)  StdDev (ns)        Name     \n -------  ---------------  ---------  ------------  ------------  ------------  ------------  --------------\n    85.9      600,896,697         20  30,044,834.9         3,477   100,141,768  42,475,064.1  poll          \n    13.5       94,610,402      1,201      78,776.4         1,002    11,348,375     402,562.6  ioctl         \n     0.2        1,374,312         79      17,396.4         3,486       434,715      48,015.2  mmap64        \n     0.1          877,705         51      17,209.9         1,031       748,723     104,491.6  fopen         \n     0.1          741,969         12      61,830.8        17,272       256,852      64,706.5  sem_timedwait \n     0.1          529,563        120       4,413.0         1,292        20,579       2,134.3  open64        \n     0.0          251,602          4      62,900.5        57,337        72,126       6,412.6  pthread_create\n     0.0           93,461         18       5,192.3         1,011        19,386       4,401.0  mmap          \n     0.0           37,621         11       3,420.1         1,302        11,672       2,867.6  munmap        \n     0.0           35,735          9       3,970.6         1,723         6,251       1,477.2  fgetc         \n     0.0           33,533          1      33,533.0        33,533        33,533           0.0  fgets         \n     0.0           26,832         13       2,064.0         1,452         3,366         542.6  write         \n     0.0           21,341          5       4,268.2         1,213         9,738       3,378.3  putc          \n     0.0           20,838          6       3,473.0         1,763         6,853       1,801.1  open          \n     0.0           17,016         10       1,701.6         1,523         1,834          96.9  read          \n     0.0           11,430          8       1,428.8         1,082         1,583         151.9  fclose        \n     0.0            6,202          1       6,202.0         6,202         6,202           0.0  pipe2         \n     0.0            5,961          2       2,980.5         2,254         3,707       1,027.4  socket        \n     0.0            5,670          2       2,835.0         2,795         2,875          56.6  fwrite        \n     0.0            5,481          1       5,481.0         5,481         5,481           0.0  connect       \n     0.0            5,279          2       2,639.5         1,743         3,536       1,267.8  fread         \n     0.0            1,082          1       1,082.0         1,082         1,082           0.0  bind          \n\nReport file moved to \"/home/jkwack/BabelStream/build_polaris/JKreport-nsys-BableStream.qdrep\"\nReport file moved to 
\"/home/jkwack/BabelStream/build_polaris/JKreport-nsys-BableStream.sqlite\"\n
              "},{"location":"polaris/performance-tools/NVIDIA-Nsight/#reviewing-the-nsight-systems-data-via-gui","title":"Reviewing the Nsight Systems data via GUI","text":""},{"location":"polaris/performance-tools/NVIDIA-Nsight/#nsight-compute_1","title":"Nsight Compute","text":""},{"location":"polaris/performance-tools/NVIDIA-Nsight/#running-a-stream-benchmark-with-nsight-compute-for-triad_kernel","title":"Running a stream benchmark with Nsight Compute for triad_kernel","text":"
              jkwack@x3008c0s13b1n0:~/BabelStream/build_polaris> ncu --set detailed -k triad_kernel -o JKreport-ncu_detailed-triad_kernel-BableStream ./cuda-stream\nBabelStream\nVersion: 4.0\nImplementation: CUDA\nRunning kernels 100 times\nPrecision: double\nArray size: 268.4 MB (=0.3 GB)\nTotal size: 805.3 MB (=0.8 GB)\n==PROF== Connected to process 56600 (/home/jkwack/BabelStream/build_polaris/cuda-stream)\nUsing CUDA device NVIDIA A100-SXM4-40GB\nDriver: 11040\n==PROF== Profiling \"triad_kernel\": 0%....50%....100% - 18 passes\n==PROF== Profiling \"triad_kernel\": 0%....50%....100% - 18 passes\n==PROF== Profiling \"triad_kernel\": 0%....50%....100% - 18 passes\n==PROF== Profiling \"triad_kernel\": 0%....50%....100% - 18 passes\n==PROF== Profiling \"triad_kernel\": 0%....50%....100% - 18 passes\n==PROF== Profiling \"triad_kernel\": 0%....50%....100% - 18 passes\n==PROF== Profiling \"triad_kernel\": 0%....50%....100% - 18 passes\n==PROF== Profiling \"triad_kernel\": 0%....50%....100% - 18 passes\n==PROF== Profiling \"triad_kernel\": 0%....50%....100% - 18 passes\n==PROF== Profiling \"triad_kernel\": 0%....50%....100% - 18 passes\n==PROF== Profiling \"triad_kernel\": 0%....50%....100% - 18 passes\n==PROF== Profiling \"triad_kernel\": 0%....50%....100% - 18 passes\n==PROF== Profiling \"triad_kernel\": 0%....50%....100% - 18 passes\n==PROF== Profiling \"triad_kernel\": 0%....50%....100% - 18 passes\n==PROF== Profiling \"triad_kernel\": 0%....50%....100% - 18 passes\n==PROF== Profiling \"triad_kernel\": 0%....50%....100% - 18 passes\n==PROF== Profiling \"triad_kernel\": 0%....50%....100% - 18 passes\n==PROF== Profiling \"triad_kernel\": 0%....50%....100% - 18 passes\n==PROF== Profiling \"triad_kernel\": 0%....50%....100% - 18 passes\n==PROF== Profiling \"triad_kernel\": 0%....50%....100% - 18 passes\n==PROF== Profiling \"triad_kernel\": 0%....50%....100% - 18 passes\n==PROF== Profiling \"triad_kernel\": 0%....50%....100% - 18 passes\n==PROF== Profiling \"triad_kernel\": 0%....50%....100% - 18 passes\n==PROF== Profiling \"triad_kernel\": 0%....50%....100% - 18 passes\n==PROF== Profiling \"triad_kernel\": 0%....50%....100% - 18 passes\n==PROF== Profiling \"triad_kernel\": 0%....50%....100% - 18 passes\n==PROF== Profiling \"triad_kernel\": 0%....50%....100% - 18 passes\n==PROF== Profiling \"triad_kernel\": 0%....50%....100% - 18 passes\n==PROF== Profiling \"triad_kernel\": 0%....50%....100% - 18 passes\n==PROF== Profiling \"triad_kernel\": 0%....50%....100% - 18 passes\n==PROF== Profiling \"triad_kernel\": 0%....50%....100% - 18 passes\n==PROF== Profiling \"triad_kernel\": 0%....50%....100% - 18 passes\n==PROF== Profiling \"triad_kernel\": 0%....50%....100% - 18 passes\n==PROF== Profiling \"triad_kernel\": 0%....50%....100% - 18 passes\n==PROF== Profiling \"triad_kernel\": 0%....50%....100% - 18 passes\n==PROF== Profiling \"triad_kernel\": 0%....50%....100% - 18 passes\n==PROF== Profiling \"triad_kernel\": 0%....50%....100% - 18 passes\n==PROF== Profiling \"triad_kernel\": 0%....50%....100% - 18 passes\n==PROF== Profiling \"triad_kernel\": 0%....50%....100% - 18 passes\n==PROF== Profiling \"triad_kernel\": 0%....50%....100% - 18 passes\n==PROF== Profiling \"triad_kernel\": 0%....50%....100% - 18 passes\n==PROF== Profiling \"triad_kernel\": 0%....50%....100% - 18 passes\n==PROF== Profiling \"triad_kernel\": 0%....50%....100% - 18 passes\n==PROF== Profiling \"triad_kernel\": 0%....50%....100% - 18 passes\n==PROF== Profiling \"triad_kernel\": 0%....50%....100% - 18 passes\n==PROF== Profiling 
\"triad_kernel\": 0%....50%....100% - 18 passes\n==PROF== Profiling \"triad_kernel\": 0%....50%....100% - 18 passes\n==PROF== Profiling \"triad_kernel\": 0%....50%....100% - 18 passes\n==PROF== Profiling \"triad_kernel\": 0%....50%....100% - 18 passes\n==PROF== Profiling \"triad_kernel\": 0%....50%....100% - 18 passes\n==PROF== Profiling \"triad_kernel\": 0%....50%....100% - 18 passes\n==PROF== Profiling \"triad_kernel\": 0%....50%....100% - 18 passes\n==PROF== Profiling \"triad_kernel\": 0%....50%....100% - 18 passes\n==PROF== Profiling \"triad_kernel\": 0%....50%....100% - 18 passes\n==PROF== Profiling \"triad_kernel\": 0%....50%....100% - 18 passes\n==PROF== Profiling \"triad_kernel\": 0%....50%....100% - 18 passes\n==PROF== Profiling \"triad_kernel\": 0%....50%....100% - 18 passes\n==PROF== Profiling \"triad_kernel\": 0%....50%....100% - 18 passes\n==PROF== Profiling \"triad_kernel\": 0%....50%....100% - 18 passes\n==PROF== Profiling \"triad_kernel\": 0%....50%....100% - 18 passes\n==PROF== Profiling \"triad_kernel\": 0%....50%....100% - 18 passes\n==PROF== Profiling \"triad_kernel\": 0%....50%....100% - 18 passes\n==PROF== Profiling \"triad_kernel\": 0%....50%....100% - 18 passes\n==PROF== Profiling \"triad_kernel\": 0%....50%....100% - 18 passes\n==PROF== Profiling \"triad_kernel\": 0%....50%....100% - 18 passes\n==PROF== Profiling \"triad_kernel\": 0%....50%....100% - 18 passes\n==PROF== Profiling \"triad_kernel\": 0%....50%....100% - 18 passes\n==PROF== Profiling \"triad_kernel\": 0%....50%....100% - 18 passes\n==PROF== Profiling \"triad_kernel\": 0%....50%....100% - 18 passes\n==PROF== Profiling \"triad_kernel\": 0%....50%....100% - 18 passes\n==PROF== Profiling \"triad_kernel\": 0%....50%....100% - 18 passes\n==PROF== Profiling \"triad_kernel\": 0%....50%....100% - 18 passes\n==PROF== Profiling \"triad_kernel\": 0%....50%....100% - 18 passes\n==PROF== Profiling \"triad_kernel\": 0%....50%....100% - 18 passes\n==PROF== Profiling \"triad_kernel\": 0%....50%....100% - 18 passes\n==PROF== Profiling \"triad_kernel\": 0%....50%....100% - 18 passes\n==PROF== Profiling \"triad_kernel\": 0%....50%....100% - 18 passes\n==PROF== Profiling \"triad_kernel\": 0%....50%....100% - 18 passes\n==PROF== Profiling \"triad_kernel\": 0%....50%....100% - 18 passes\n==PROF== Profiling \"triad_kernel\": 0%....50%....100% - 18 passes\n==PROF== Profiling \"triad_kernel\": 0%....50%....100% - 18 passes\n==PROF== Profiling \"triad_kernel\": 0%....50%....100% - 18 passes\n==PROF== Profiling \"triad_kernel\": 0%....50%....100% - 18 passes\n==PROF== Profiling \"triad_kernel\": 0%....50%....100% - 18 passes\n==PROF== Profiling \"triad_kernel\": 0%....50%....100% - 18 passes\n==PROF== Profiling \"triad_kernel\": 0%....50%....100% - 18 passes\n==PROF== Profiling \"triad_kernel\": 0%....50%....100% - 18 passes\n==PROF== Profiling \"triad_kernel\": 0%....50%....100% - 18 passes\n==PROF== Profiling \"triad_kernel\": 0%....50%....100% - 18 passes\n==PROF== Profiling \"triad_kernel\": 0%....50%....100% - 18 passes\n==PROF== Profiling \"triad_kernel\": 0%....50%....100% - 18 passes\n==PROF== Profiling \"triad_kernel\": 0%....50%....100% - 18 passes\n==PROF== Profiling \"triad_kernel\": 0%....50%....100% - 18 passes\n==PROF== Profiling \"triad_kernel\": 0%....50%....100% - 18 passes\n==PROF== Profiling \"triad_kernel\": 0%....50%....100% - 18 passes\n==PROF== Profiling \"triad_kernel\": 0%....50%....100% - 18 passes\n==PROF== Profiling \"triad_kernel\": 0%....50%....100% - 18 passes\n==PROF== Profiling \"triad_kernel\": 
0%....50%....100% - 18 passes\n==PROF== Profiling \"triad_kernel\": 0%....50%....100% - 18 passes\n==PROF== Profiling \"triad_kernel\": 0%....50%....100% - 18 passes\nFunction    MBytes/sec  Min (sec)   Max         Average     \nCopy        1331076.105 0.00040     0.00042     0.00041     \nMul         1304696.608 0.00041     0.00043     0.00042     \nAdd         1322600.587 0.00061     0.00062     0.00061     \nTriad       1327.700    0.60654     0.62352     0.61106     \nDot         850376.762  0.00063     0.00070     0.00065     \n==PROF== Disconnected from process 56600\n==PROF== Report: /home/jkwack/BabelStream/build_polaris/JKreport-ncu_detailed-triad_kernel-BableStream.ncu-rep\n
              "},{"location":"polaris/performance-tools/NVIDIA-Nsight/#reviewing-the-nsight-compute-data-via-gui","title":"Reviewing the Nsight Compute data via GUI","text":""},{"location":"polaris/programming-models/kokkos-polaris/","title":"Kokkos","text":""},{"location":"polaris/programming-models/kokkos-polaris/#kokkos_1","title":"Kokkos","text":"

              Kokkos Core implements a programming model in C++ for writing performance portable applications targeting all major HPC platforms. For that purpose it provides abstractions for both parallel execution of code and data management. Kokkos is designed to target complex node architectures with N-level memory hierarchies and multiple types of execution resources. It currently can use Serial and OpenMP (threads) for CPU execution spaces (\"backends\") and CUDA, HIP, SYCL, and OpenMPTarget for GPU execution spaces. By convention, Kokkos only allows one GPU backend at a time.

              "},{"location":"polaris/programming-models/kokkos-polaris/#kokkos-documentation","title":"Kokkos Documentation","text":"
              • Kokkos-core Wiki
              • Kokkos github
              "},{"location":"polaris/programming-models/kokkos-polaris/#kokkos-on-polaris","title":"Kokkos on Polaris","text":"

The prebuilt Kokkos on Polaris includes three backends: Serial and OpenMP for CPU execution and CUDA for GPU execution. To use it, run

              module use /soft/modulefiles\nmodule swap PrgEnv-nvhpc PrgEnv-gnu\nmodule swap gcc/12.2.0 gcc/11.2.0\nmodule load cudatoolkit-standalone/11.8.0\nmodule load kokkos\n

(Since the Slingshot 11 upgrade, you must use PrgEnv-gnu and the gcc and cudatoolkit version changes indicated, at least until some subsequent Polaris system updates have been completed.)

              This sets the following environment variables, some of which are used by cmake:

              • KOKKOS_HOME - path to the lib64/, include/ files installed
              • LIBRARY_PATH - prepends $KOKKOS_HOME/lib64 to this variable used by cmake
              • CPATH - prepends $KOKKOS_HOME/include to this variable used by cmake
              • LD_LIBRARY_PATH - prepends $KOKKOS_HOME/lib64 to this variable
              "},{"location":"polaris/programming-models/kokkos-polaris/#building-a-kokkos-application-using-cmake","title":"Building a Kokkos Application Using cmake","text":"

              Add these lines to CMakeLists.txt:

              find_package(Kokkos REQUIRED)\ntarget_link_libraries(myTarget Kokkos::kokkoscore)\n

              Here is a simple example CMakeLists.txt to compile an example program:

              cmake_minimum_required(VERSION 3.22)\nproject(buildExample)\nfind_package(Kokkos REQUIRED)\n\nset(buildExample_SOURCE_DIR \".\")\n\nset(top_SRCS\n  ${buildExample_SOURCE_DIR}/example1.cpp)\n\nset(SOURCE_FILES ${top_SRCS})\n\nadd_executable(example1_sycl_aot ${SOURCE_FILES})\ntarget_link_libraries(example1_sycl_aot Kokkos::kokkoscore)\ntarget_include_directories(example1_sycl_aot PUBLIC ${buildExample_SOURCE_DIR})\n

              Configure and build it like this:

              mkdir build\ncd build\ncmake -DCMAKE_CXX_COMPILER=CC -DCMAKE_C_COMPILER=cc ..\nmake\n
              "},{"location":"polaris/programming-models/kokkos-polaris/#building-a-kokkos-application-using-make","title":"Building a Kokkos Application Using make","text":"

              Here's an example Makefile:

# KOKKOS_HOME set via:\n#   module load kokkos\n\n# You can look at the first lines of $KOKKOS_HOME/KokkosConfigCommon.cmake to\n# see the flags used in cmake configuration of the kokkos library build. The\n# default Kokkos module on Polaris was built with PrgEnv-nvhpc and includes\n# Serial, OpenMP (threads) and CUDA backends. So you should have that\n# environment module loaded and include compiler flags for cuda and openmp:\n\n# Cray MPI wrapper for C++ and C compilers:\nCXX=CC\nCC=cc\n\nCPPFLAGS=-cuda -fopenmp\nLDFLAGS=\n\n# Use := here so LDFLAGS can reference its previous value without the\n# self-referencing recursion error that plain = would cause in GNU make.\nLDFLAGS:=$(CPPFLAGS) $(LDFLAGS)\nLDLIBS=-L$(KOKKOS_HOME)/lib64 -lkokkoscore -lkokkossimd -lpthread\n\nSRCS=example1.cpp\nOBJS=$(subst .cpp,.o,$(SRCS))\n\nall: example1_polaris\n\nexample1_polaris: $(OBJS)\n        $(CXX) $(LDFLAGS) -o example1_polaris $(OBJS) $(LDLIBS)\n\nexample1.o: example1.cpp\n\nclean:\n        rm -f $(OBJS)\n\ndistclean: clean\n        rm -f example1_polaris\n
              "},{"location":"polaris/programming-models/kokkos-polaris/#configuring-your-own-kokkos-build-on-polaris","title":"Configuring Your Own Kokkos Build on Polaris","text":"

              Here are recommended environment settings and configuration to build your own kokkos libraries on Polaris:

              "},{"location":"polaris/programming-models/kokkos-polaris/#environment","title":"Environment","text":"

To match what was done in the centrally-built Kokkos associated with the modules discussed above, use the programming environment PrgEnv-gnu, and use the Cray wrapper CC as the C++ compiler. You'll also need to back up from the default gcc compiler version and make a few other module adjustments to work correctly on Polaris following the Slingshot 11 upgrade (and prior to some planned system upgrades that will make some of this environment tweaking unnecessary):

              module load cmake\nmodule swap PrgEnv-nvhpc PrgEnv-gnu\nmodule swap gcc/12.2.0 gcc/11.2.0\nmodule load cudatoolkit-standalone/11.8.0\n
              "},{"location":"polaris/programming-models/kokkos-polaris/#cmake-configuration","title":"CMake Configuration","text":"

This example builds three backends: OpenMP, Serial, and CUDA.

              git clone git@github.com:kokkos/kokkos.git\ncd kokkos\nmkdir build\ncd build\n\ncmake\\\n -DCMAKE_BUILD_TYPE=RelWithDebInfo\\\n -DCMAKE_INSTALL_PREFIX=\"./install\"\\\n -DCMAKE_CXX_COMPILER=CC\\\n -DKokkos_ENABLE_OPENMP=ON\\\n -DKokkos_ENABLE_SERIAL=ON\\\n -DKokkos_ARCH_ZEN3=ON\\\n -DKokkos_ARCH_AMPERE80=ON\\\n -DKokkos_ENABLE_CUDA=ON\\\n -DKokkos_ENABLE_AGGRESSIVE_VECTORIZATION=ON\\\n -DKokkos_ENABLE_TESTS=OFF\\\n -DBUILD_TESTING=OFF\\\n -DKokkos_ENABLE_CUDA_LAMBDA=ON\\\n -DKokkos_ENABLE_IMPL_DESUL_ATOMICS=OFF\\\n -DCMAKE_CXX_STANDARD=17\\\n -DCMAKE_EXE_LINKER_FLAGS=-no-gcc-rpath\\\n ..\n\nmake -j16 -l16 install\n

(The -no-gcc-rpath linker flag is to work around a bug in the post-Slingshot 11 compiler environment on Polaris.)

              "},{"location":"polaris/programming-models/openmp-polaris/","title":"OpenMP","text":""},{"location":"polaris/programming-models/openmp-polaris/#overview","title":"Overview","text":"

              The OpenMP API is an open standard for parallel programming. The specification document can be found here: https://www.openmp.org. The specification describes directives, runtime routines, and environment variables that allow an application developer to express parallelism (e.g. shared memory multiprocessing and device offloading). Many compiler vendors provide implementations of the OpenMP specification (https://www.openmp.org/specifications).

              "},{"location":"polaris/programming-models/openmp-polaris/#setting-the-environment-to-use-openmp-on-polaris","title":"Setting the environment to use OpenMP on Polaris","text":"

              Many of the programming environments available on Polaris have OpenMP support.

| module | OpenMP CPU support? | OpenMP GPU support? |
| --- | --- | --- |
| PrgEnv-nvhpc | yes | yes |
| llvm | yes | yes |
| PrgEnv-gnu | yes | no |
| PrgEnv-cray | yes | yes* |

              *Currently PrgEnv-cray is not recommended for OpenMP offload.

              By default, the PrgEnv-nvhpc module is loaded. To switch to other modules, you can use module switch.

              "},{"location":"polaris/programming-models/openmp-polaris/#using-prgenv-nvhpc","title":"Using PrgEnv-nvhpc","text":"

              This is loaded by default, so there's no need to load additional modules. You can confirm that it is loaded by running module list to check that PrgEnv-nvhpc is in the list.

              "},{"location":"polaris/programming-models/openmp-polaris/#using-llvm","title":"Using LLVM","text":"

              To use the LLVM module, load the following.

              module load mpiwrappers/cray-mpich-llvm\nmodule load cudatoolkit-standalone\n

See the LLVM compiling page here for more information.

              "},{"location":"polaris/programming-models/openmp-polaris/#using-prgenv-gnu","title":"Using PrgEnv-gnu","text":"

              To switch from PrgEnv-nvhpc to PrgEnv-gnu you can run:

              module switch PrgEnv-nvhpc PrgEnv-gnu\n

              The gcc/gfortran on Polaris was not built with GPU support. To use OpenMP on the CPU, you need to unload craype-accel-nvidia80:

              module unload craype-accel-nvidia80\n
              "},{"location":"polaris/programming-models/openmp-polaris/#using-prgenv-cray","title":"Using PrgEnv-cray","text":"

              To switch from PrgEnv-nvhpc to PrgEnv-cray you can run:

              module switch PrgEnv-nvhpc PrgEnv-cray\n

              To use OpenMP on the CPU only, also unload craype-accel-nvidia80:

              module unload craype-accel-nvidia80\n

              To use OpenMP on the GPU, load cudatoolkit-standalone, although this is not recommended at the moment.

              module load cudatoolkit-standalone\n

              "},{"location":"polaris/programming-models/openmp-polaris/#building-on-polaris","title":"Building on Polaris","text":"

              The following table shows what compiler and flags to use with which PrgEnv:

| module | compiler | flags |
| --- | --- | --- |
| PrgEnv-nvhpc | cc/CC/ftn (nvc/nvc++/nvfortran) | -mp=gpu -gpu=cc80 |
| llvm | mpicc/mpicxx (clang/clang++) | -fopenmp -fopenmp-targets=nvptx64-nvidia-cuda |
| PrgEnv-gnu | cc/CC/ftn (gcc/g++/gfortran) | -fopenmp |
| PrgEnv-cray | cc/CC/ftn | -fopenmp |

              For example to compile a simple code hello.cpp:

              "},{"location":"polaris/programming-models/openmp-polaris/#for-prgenv-nvhpc-after-loading-the-modules-as-discussed-above-we-would-use","title":"For PrgEnv-nvhpc, after loading the modules as discussed above we would use:","text":"
              CC -mp=gpu -gpu=cc80 hello.cpp\nftn -mp=gpu -gpu=cc80 hello.F90\n
              "},{"location":"polaris/programming-models/openmp-polaris/#for-llvm-after-loading-the-modules-as-discussed-above","title":"For LLVM, after loading the modules as discussed above:","text":"
              mpicxx -fopenmp -fopenmp-targets=nvptx64-nvidia-cuda hello.cpp \n
              "},{"location":"polaris/programming-models/openmp-polaris/#for-prgenv-gnu-after-loading-the-modules-as-discussed-above-we-would-use","title":"For PrgEnv-gnu, after loading the modules as discussed above we would use:","text":"
              CC -fopenmp hello.cpp\nftn -fopenmp hello.F90\n
              "},{"location":"polaris/programming-models/openmp-polaris/#for-prgenv-cray-after-loading-the-modules-as-discussed-above-we-would-use","title":"For PrgEnv-cray, after loading the modules as discussed above we would use:","text":"
              CC -fopenmp hello.cpp\nftn -fopenmp hello.F90\n
              "},{"location":"polaris/programming-models/openmp-polaris/#running-on-polaris","title":"Running on Polaris","text":"

To run, you can launch the produced executable directly or with mpiexec in a job script, and then submit the script to the Polaris queue, like:

              $ cat submit.sh\n#!/bin/sh\n#PBS -l select=1:system=polaris\n#PBS -l walltime=0:30:00\n#PBS -q debug \n#PBS -A Catalyst\n#PBS -l filesystems=home:eagle\n\ncd ${PBS_O_WORKDIR}\n mpiexec -n 1 ./executable\n$ # submit to the queue:\n$ qsub -l select=1:system=polaris -l walltime=0:30:00 -l filesystems=home:eagle -q debug -A Catalyst ./submit.sh\n

              In the above, having the PBS options in the script and on the command line is redundant, but we put it there to show both ways of launching. This submits the script to one node in the debug queue on Polaris, requesting 30 min and the eagle and home filesystems. It will charge project Catalyst for the time.

              More details for setting up the job script are in Job Scheduling and Execution section.

              "},{"location":"polaris/programming-models/openmp-polaris/#example","title":"Example","text":"
              $ cat hello.cpp\n#include <stdio.h>\n#include <omp.h>\n\nint main( int argv, char** argc ) {\n\n  printf( \"Number of devices: %d\\n\", omp_get_num_devices() );\n\n  #pragma omp target\n  {\n    if( !omp_is_initial_device() )\n      printf( \"Hello world from accelerator.\\n\" );\n    else\n      printf( \"Hello world from host.\\n\" );\n  }\n  return 0;\n}\n\n$ cat hello.F90\nprogram  main\n  use omp_lib\n  implicit none\n  integer flag\n\n  write(*,*) \"Number of devices:\", omp_get_num_devices()\n\n  !$omp target map(from:flag)\n    if( .not. omp_is_initial_device() ) then\n      flag = 1\n    else\n      flag = 0\n   endif\n  !$omp end target\n\n   if( flag == 1 ) then\n      print *, \"Hello world from accelerator\"\n   else\n      print *, \"Hello world from host\"\n   endif\n\n end program main\n\n$ # To compile\n$ CC -mp=gpu -gpu=cc80 hello.cpp -o c_test\n$ ftn -mp=gpu -gpu=cc80 hello.F90 -o f_test\n\n$ # To run \n$ mpiexec -n 1 ./c_test\nNumber of devices: 4\nHello world from accelerator.\n$ mpiexec -n 1 ./f_test\n Number of devices:            4\n Hello world from accelerator\n
              "},{"location":"polaris/programming-models/sycl-polaris/","title":"SYCL","text":"

              SYCL (pronounced \u2018sickle\u2019) is a royalty-free, cross-platform abstraction layer that enables code for heterogeneous processors to be written using standard ISO C++ with the host and kernel code for an application contained in the same source file.

              • Specification: https://www.khronos.org/sycl/
              • Source code of the compiler: https://github.com/intel/llvm
              • ALCF Tutorial: https://github.com/argonne-lcf/sycltrain
              module load oneapi/upstream\n

              Note

This module (compilers, libraries) gets built periodically from the latest open source rather than from releases. For more details on the release version of the compiler, please find the details here. As such, these compilers will get new features and updates quickly that may break on occasion. Please submit any issues at the respective GitHub repositories for the compilers and libraries.

              "},{"location":"polaris/programming-models/sycl-polaris/#components","title":"Components","text":"
• This is the list of components associated with this module
| User Application | Component |
| --- | --- |
| Compilers | DPC++ |
| oneMKL Interfaces | oneMKL |
| oneDPL | oneDPL |
| SYCLomatic/DPCT | dpct |
"},{"location":"polaris/programming-models/sycl-polaris/#dependencies","title":"Dependencies","text":"
• The SYCL programming model is supported through oneAPI compilers that were built from source code
• Loading this module switches the default programming environment to GNU, with the following dependencies:
              • PrgEnv-gnu
              • cudatoolkit-standalone
• The following environment variable is set when the module is loaded: ONEAPI_DEVICE_SELECTOR=ext_oneapi_cuda:gpu
              "},{"location":"polaris/programming-models/sycl-polaris/#example-memory-intilization","title":"Example (memory intilization)","text":"
              #include <sycl/sycl.hpp>\n\nint main(){\n    const int N= 100;\n    sycl::queue Q;\n    float *A = sycl::malloc_shared<float>(N, Q);\n\n    std::cout << \"Running on \"\n              << Q.get_device().get_info<sycl::info::device::name>()\n              << \"\\n\";\n\n    // Create a command_group to issue command to the group\n    Q.parallel_for(N, [=](sycl::item<1> id) { A[id] = 0.1 * id; }).wait();\n\n    for (size_t i = 0; i < N; i++)\n        std::cout << \"A[ \" << i << \" ] = \" << A[i] << std::endl;\n    return 0;\n}\n

              Compile and Run

              $ clang++ -std=c++17 -sycl-std=2020 -fsycl -fsycl-targets=nvptx64-nvidia-cuda -Xsycl-target-backend --cuda-gpu-arch=sm_80 main.cpp\n$ ./a.out\n

              "},{"location":"polaris/programming-models/sycl-polaris/#example-using-gpu-aware-mpi","title":"Example (using GPU-aware MPI)","text":"
              #include <stdlib.h>\n#include <stdio.h>\n#include <mpi.h>\n\n#include <sycl/sycl.hpp>\n\n// Modified from NERSC website:\n// https://docs.nersc.gov/development/programming-models/mpi\nint main(int argc, char *argv[]) {\n\n    int myrank, num_ranks;\n    double *val_device;\n    double *val_host;\n    char machine_name[MPI_MAX_PROCESSOR_NAME];\n    int name_len=0;\n\n    MPI_Init(&argc, &argv);\n    MPI_Comm_rank(MPI_COMM_WORLD, &myrank);\n    MPI_Comm_size(MPI_COMM_WORLD, &num_ranks);\n    MPI_Get_processor_name(machine_name, &name_len);\n\n    sycl::queue q{sycl::gpu_selector_v};\n\n    std::cout << \"Rank #\" << myrank << \" runs on: \" << machine_name\n              << \", uses device: \"\n              << q.get_device().get_info<sycl::info::device::name>() << \"\\n\";\n\n    MPI_Barrier(MPI_COMM_WORLD);\n    int one=1;\n    val_host = (double *)malloc(one*sizeof(double));\n    val_device = sycl::malloc_device<double>(one,q);\n\n    const size_t size_of_double = sizeof(double);\n    *val_host = -1.0;\n    if (myrank != 0) {\n        std::cout << \"I am rank \" << myrank\n                  << \" and my initial value is: \" << *val_host << \"\\n\";\n    }\n\n    if (myrank == 0) {\n        *val_host = 42.0;\n        q.memcpy(val_device,val_host,size_of_double).wait();\n        std::cout << \"I am rank \" << myrank\n                  << \" and will broadcast value: \" << *val_host << \"\\n\";\n    }\n\n    MPI_Bcast(val_device, 1, MPI_DOUBLE, 0, MPI_COMM_WORLD);\n\n    double check = 42.0;\n    if (myrank != 0) {\n        //Device to Host\n        q.memcpy(val_host,val_device,size_of_double).wait();\n        assert(*val_host == check);\n        std::cout << \"I am rank \" << myrank\n                  << \" and received broadcast value: \" << *val_host << \"\\n\";\n    }\n\n    sycl::free(val_device,q);\n    free(val_host);\n\n    MPI_Finalize();\n\n    return 0;\n}\n

              Load Modules

              module load oneapi\nmodule load mpiwrappers/cray-mpich-oneapi\nexport MPICH_GPU_SUPPORT_ENABLED=1\n

              Compile and Run

              $ mpicxx -L/opt/cray/pe/mpich/8.1.16/gtl/lib -lmpi_gtl_cuda -std=c++17 -fsycl -fsycl-targets=nvptx64-nvidia-cuda -Xsycl-target-backend --cuda-gpu-arch=sm_80 main.cpp\n$ mpiexec -n 2 --ppn 2 --depth=1 --cpu-bind depth ./set_affinity_gpu_polaris.sh ./a.out\n
              For further details regarding the arguments passed to the mpiexec command shown above, please visit the Job Scheduling and Execution section. A simple example describing the details and execution of the set_affinity_gpu_polaris.sh file can be found here.

              Note: By default, there is no GPU-aware MPI library linking support. The example above shows how the user can enable the linking by specifying the path to the GTL (GPU Transport Layer) library (libmpi_gtl_cuda) to the link line.

              "},{"location":"polaris/programming-models/sycl-polaris/#oneapi-math-kernel-library-onemkl-interfaces","title":"oneAPI Math Kernel Library (oneMKL) Interfaces","text":"

              oneMKL Interfaces is an open-source implementation of the oneMKL Data Parallel C++ (DPC++) interface according to the oneMKL specification. It works with multiple devices (backends) using device-specific libraries underneath.

              oneMKL is part of oneAPI. The various backends supported are shown below. More information is available here.

              User Application → oneMKL interface → Third-Party Library (cuBLAS, cuSOLVER, cuRAND)"},{"location":"polaris/programming-models/sycl-polaris/#example-using-onemklgemm","title":"Example (using onemkl::gemm)","text":"

              The following snippet shows how to compile and run a SYCL code with the oneMKL library. A GPU-based GEMM is performed using the mkl::gemm API, and the results are compared to a CPU-based GEMM performed using a traditional BLAS (e.g., AOCL-BLIS) library.

              #include <limits>\n#include <random>\n\n#include <sycl/sycl.hpp>\n\n#include <oneapi/mkl.hpp>  // ONEMKL GPU header\n#include <cblas.h>         // BLIS   CPU header\n\n// Matrix size constants\n#define SIZE 4800 // Must be a multiple of 8.\n#define M SIZE / 8\n#define N SIZE / 4\n#define P SIZE / 2\n\n//////////////////////////////////////////////////////////////////////////////////////////\n\nbool ValueSame(double a, double b) { return std::fabs(a - b) < 1.0e-08; }\nint VerifyResult(double *c_A, double *c_B) {\n  bool MismatchFound = false;\n\n  for (size_t i = 0; i < M; i++) {\n    for (size_t j = 0; j < P; j++) {\n      if (!ValueSame(c_A[i * P + j], c_B[i * P + j])) {\n        std::cout << \"fail - The result is incorrect for element: [\" << i << \", \" << j\n                  << \"], expected: \" << c_A[i * P + j] << \" , but got: \" << c_B[i * P + j]\n                  << std::endl;\n        MismatchFound = true;\n      }\n    }\n  }\n\n  if (!MismatchFound) {\n    std::cout << \"SUCCESS - The results are correct!\" << std::endl;\n    return 0;\n  } else {\n    std::cout << \"FAIL - The results mis-match!\" << std::endl;\n    return -1;\n  }\n}\n\n//////////////////////////////////////////////////////////////////////////////////////////\n\nint main() {\n  std::random_device rd;  // Will be used to obtain a seed for the random number engine\n  std::mt19937 gen(rd()); // Standard mersenne_twister_engine seeded with rd()\n  std::uniform_real_distribution<> dis(1.0, 2.0);\n\n  // C = alpha * op(A) * op(B)  + beta * C\n  oneapi::mkl::transpose transA = oneapi::mkl::transpose::nontrans;\n  oneapi::mkl::transpose transB = oneapi::mkl::transpose::nontrans;\n\n  // matrix data sizes\n  int m = M;\n  int n = P;\n  int k = N;\n\n  // leading dimensions of data\n  int ldA = k;\n  int ldB = n;\n  int ldC = n;\n\n  // set scalar fp values\n  double alpha = 1.0;\n  double beta = 0.0;\n\n  // 1D arrays on host side\n  double *A;\n  double *B;\n  double *C_host_onemkl, *C_cblas;\n\n  A = new double[M * N]{};\n  B = new double[N * P]{};\n  C_cblas = new double[M * P]{};\n  C_host_onemkl = new double[M * P]{};\n\n  // prepare matrix data with ROW-major style\n  // A(M, N)\n  for (size_t i = 0; i < M; i++)\n    for (size_t j = 0; j < N; j++)\n      A[i * N + j] = dis(gen);\n  // B(N, P)\n  for (size_t i = 0; i < N; i++)\n    for (size_t j = 0; j < P; j++)\n      B[i * P + j] = dis(gen);\n\n  std::cout << \"Problem size: c(\" << M << \",\" << P << \") = a(\" << M << \",\" << N << \") * b(\" << N\n            << \",\" << P << \")\" << std::endl;\n\n  // Resultant matrix: C_cblas\n  cblas_dgemm(CblasRowMajor, CblasNoTrans, CblasNoTrans, m, n, k, alpha, A, ldA, B, ldB, beta,\n              C_cblas, ldC);\n\n  // Resultant matrix: C_onemkl\n  sycl::queue q(sycl::property_list{sycl::property::queue::in_order{}});\n  std::cout << \"Device: \" << q.get_device().get_info<sycl::info::device::name>() << std::endl << std::endl;\n\n  double* A_dev        = sycl::malloc_device<double>(M*N, q);\n  double* B_dev        = sycl::malloc_device<double>(N*P, q);\n  double* C_dev_onemkl = sycl::malloc_device<double>(M*P, q);\n\n  q.memcpy(A_dev, A, (M*N) * sizeof(double));\n  q.memcpy(B_dev, B, (N*P) * sizeof(double));\n\n  auto gemm_event = oneapi::mkl::blas::column_major::gemm(q, transB, transA, n, m, k, alpha, B_dev, ldB, A_dev, ldA, beta, C_dev_onemkl, ldC);\n\n  q.memcpy(C_host_onemkl, C_dev_onemkl, (M*P) * sizeof(double));\n\n  q.wait();\n  std::cout << \"Verify results between OneMKL & CBLAS: \";\n  int 
result_cblas = VerifyResult(C_cblas, C_host_onemkl);\n\n  delete[] A;\n  delete[] B;\n  delete[] C_cblas;\n  delete[] C_host_onemkl;\n  sycl::free(A_dev, q);\n  sycl::free(B_dev, q);\n  sycl::free(C_dev_onemkl, q);\n  return result_cblas;\n}\n

              Compile and Run

              The user needs to provide paths to the math libraries as shown below. The AOCL library for the CPU GEMM is provided by module load aocl. The environment variable MKLROOT is defined by the oneapi module and AOCL_ROOT is defined by the aocl module. Note: Please pay attention to the linker options for the AOCL and oneMKL libraries.
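              Before compiling, the referenced modules can be loaded, for example (a sketch; module names follow the text above, and exact versions may differ):
              module load oneapi   # SYCL compilers; defines MKLROOT
              module load aocl     # CPU BLAS (BLIS); defines AOCL_ROOT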

              $ clang++ -std=c++17 -sycl-std=2020 -O3 -fsycl -fsycl-targets=nvptx64-nvidia-cuda -Xsycl-target-backend --cuda-gpu-arch=sm_80 -L$AOCL_ROOT/lib -lblis -L$MKLROOT/lib -lonemkl sycl_onemkl_gemm.cpp -o sycl_onemkl_gemm.out\n

              "},{"location":"polaris/visualization/ffmpeg/","title":"FFmpeg on Polaris","text":"

              To use FFmpeg on Polaris, first load the corresponding module:

              module load ffmpeg\n

              This is a typical command line to create a movie from a series of snapshots in PNG format:

              ffmpeg -r 15 -i frames.%03d.png -r 25 -pix_fmt yuv420p movie.mp4\n

              where:

              -r 15 is the input frame rate. Experiment with values smaller than the output frame rate for longer movies.\n-r 25 is the output frame rate (use this value for standard 25 frames per second)\n-i frames.%03d.png reads the input frames in sequence\n-pix_fmt yuv420p is needed for movies to play in browsers\nmovie.mp4 is the resulting movie\n
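              If the input frames have odd pixel dimensions, encoding with -pix_fmt yuv420p can fail; one common workaround (a sketch, not part of the original example) is to round the frame size down to even dimensions with a scale filter:
              ffmpeg -r 15 -i frames.%03d.png -vf "scale=trunc(iw/2)*2:trunc(ih/2)*2" -r 25 -pix_fmt yuv420p movie.mp4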
              "},{"location":"polaris/visualization/imagemagick/","title":"ImageMagick on Polaris","text":"

              To use ImageMagick on Polaris, first load the corresponding module:

              module load imagemagick\n
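              Once the module is loaded, the standard ImageMagick command-line tools are available. For example (a sketch; the file names are hypothetical):
              convert frame.png -resize 50% frame.jpg                            # resize and convert PNG to JPEG
              montage frame1.png frame2.png -tile 2x1 -geometry +2+2 tiled.png   # tile two images side by side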
              "},{"location":"polaris/visualization/paraview/","title":"Paraview on Polaris","text":"

              The recommended way of running ParaView on Polaris is in client/server mode. This consists of running the ParaView client on your local resource and the ParaView server on the Polaris compute nodes. The ParaView client must first be installed on your local resource and must match the version that you run on Polaris.

              There are multiple versions of ParaView installed on Polaris. To find the versions of ParaView currently available on Polaris, run the following command on a login node:

              module avail paraview\n

              Binary and source packages of the ParaView client for Linux, macOS, and Windows are available from the ParaView Download Page.

              "},{"location":"polaris/visualization/paraview/#connecting-to-the-paraview-server-on-polaris","title":"Connecting to the Paraview server on Polaris","text":"

              This section describes how to launch the ParaView server on Polaris from a local ParaView client.

              "},{"location":"polaris/visualization/paraview/#start-paraview-client","title":"Start ParaView Client","text":"

              First, launch the ParaView client on your local resource. You will need to configure some server settings in the client. This initial setup only needs to be done once and can be reused each time you want to run ParaView on Polaris.

              "},{"location":"polaris/visualization/paraview/#server-configuration","title":"Server Configuration","text":""},{"location":"polaris/visualization/paraview/#1-select-connect","title":"1. Select Connect","text":"

              From the ParaView client choose to connect to a server by either clicking on the \"Connect\" icon in the menu bar

              or selecting File->Connect from the main menu

              "},{"location":"polaris/visualization/paraview/#2-set-up-servers-first-time-only","title":"2. Set Up Servers (first time only)","text":"

              The first time you want to run a server on Polaris and have it connect to your local ParaView client, you will need to set up a server. Once this server is set up, you can reuse it each time you run the ParaView client with the ParaView server on Polaris.

              Kitware, the developers of ParaView, maintain a database of server configurations which you can retrieve through the ParaView client. In the File->Connect menu press the button named \"Fetch Servers\" and select POLARIS@ANL. Windows users should select \"windows to POLARIS@ANL\". Press \"Import Selected\"

              "},{"location":"polaris/visualization/paraview/#3-use-paraview","title":"3. Use Paraview","text":"

              After the previous step, you can now select POLARIS@ANL in the File->Connect menu and press Connect

              At this point a new window will pop up

              There are a number of parameters that you must enter manually here:

              Xterm executable: the path of a terminal on your system. The figure shows the case of a Mac with XQuartz. You may need to change these values for Windows or Linux.

              SSH executable: the name of your ssh command. It may be different on Windows depending on the ssh client installed (e.g., PuTTY)

              Remote machine: leave this value at polaris.alcf.anl.gov

              Username: your ALCF user name

              ParaView version: the version of ParaView that you want to use. Verify first that this version is installed on the system (as described at the top of this document). You will also need to add a -mesa suffix.

              Example:

              5.11.2-mesa\n

              Client port: it is safe to use the default value

              Server port: it is safe to use the default value

              Number of nodes to reserve: enter the number of Polaris compute nodes you want to use for your job

              Number of ranks per node: enter the number of ranks per node

              Number of minutes to reserve: the duration of your job in minutes

              Account: enter here the name of your ALCF allocation

              Queue: the name of the Polaris queue you would like to use (e.g., debug for small, quick jobs, prod, preemptable)

              File Systems: enter here the file systems you need for your job, separated with colons, no spaces. Keep in mind that your job may not run if one of these file systems is not available at that time, so enter these values carefully

              Job name: it is safe to use the default value. The PBS scheduler will assign this name to your job

              Now you can press OK to establish the connection with a ParaView server on Polaris.

              An ssh connection will be established with a Polaris login node and a password will be requested in a terminal, similar to the process you normally use to connect and work on the system.

              After you enter your password, a job will be queued and you will see a window like this:

              When the job is launched on the compute nodes, the previous window will go away and ParaView will show it is connected to Polaris in its Pipeline Browser:

              At this point you can open datasets stored on the ALCF file systems and use ParaView normally.

              "},{"location":"polaris/visualization/visit/","title":"Visit on Polaris","text":""},{"location":"polaris/visualization/visit/#getting-started","title":"Getting Started","text":"

              The latest VisIt versions installed on Polaris are 3.3.3 and 3.4.0.

              Please note that, at the time of this writing, VisIt version 3.4.0 does not yet have a client available for Mac.

              Follow these steps to install VisIt on your local machine:

              • Download and install VisIt for your local platform (macOS, Windows, Linux) from this page. The version you download must match the server version installed on Polaris.
              • Download the Polaris host profile for VisIt (you may need to right-click and choose \"Save link as...\" or \"Save target as...\")
              • Copy this file to a file called ~/.visit/hosts/host_anl_polaris.xml on Mac or Linux. [We need to also specify this path for Windows]

              Note: VisIt allows the user to download host profiles for ANL, but all of these settings are outdated. We are working with the VisIt developers to update the ANL host list.

              Additional information for using VisIt in client/server mode is available here

              "},{"location":"polaris/visualization/visit/#running-visit","title":"Running VisIt","text":"
              • Start up VisIt on your local machine
              • Click File -> Open File and choose \"ANL Polaris\" from the \"Host\" dropdown
              • You'll be prompted for your password; enter your ALCF authenticator app response
              • When you open a selected file, it will launch a job on Polaris
                • You will need to specify the \"Bank\" (Project) to use when VisIt submits jobs to the queue on Polaris. Specify a project in the Options box.
                • If your environment doesn't get sourced correctly with non-interactive SSH, you can set the default project to use under Options -> Host profiles
                • Note: Don't change the contents of the \"Machine file\" field (it should be $PBS_NODEFILE)
                • Note: The default Launch Profile is set to serial. We recommend leaving this default in place, but using the parallel launch method when launching jobs on Polaris.
                • Note: Don't change the contents of \"launchMethod\". It must be qsub/aprun even though Polaris does not use aprun.
                • If you'd like to change other job parameters (like the number of processes, nodes, and walltime), you can do so. Please enter time in the format required by the PBS scheduler (e.g., 1:00:00 for one hour)
                • If you'd like these changes to be used as your default, be sure to save them using Save Settings under the Options menu.
              "},{"location":"polaris/visualization/visit/#additional-information","title":"Additional Information","text":"
              • VisIt user manual
              • VisIt wiki
              "},{"location":"polaris/visualization/visualization/","title":"Visualization on Polaris","text":"

              Starting in January 2024, Polaris will serve as the primary production resource for visualization and analysis.

              Below is a list of the available visualization tools along with links to their corresponding documentation.

              ParaView: ParaView is an open-source visualization engine that seamlessly integrates with your existing tools and workflows. It allows you to construct visualization pipelines for quick data analysis. Whether interactively exploring large datasets in 3D or performing batch processing programmatically, ParaView provides versatile capabilities. For additional information, visit the Kitware website.

              VisIt: VisIt is an open-source, interactive, and scalable visualization, animation, and analysis tool. Users can rapidly generate visualizations, animate them over time, apply various operators and mathematical expressions, and save resulting images and animations for presentations. VisIt supports a diverse range of visualization features, enabling users to view data, including scalar and vector fields, on 2D and 3D structured, adaptive, and unstructured meshes. Thanks to its customizable plugin design, VisIt can visualize data from over 120 different scientific data formats. For more information, check the VisIt project GitHub page.

              FFmpeg: FFmpeg is a complete solution to record, convert, and stream audio and video. For more information, visit the FFmpeg webpage

              ImageMagick: ImageMagick is a free, open-source software suite used for editing and manipulating digital images. It can be used to create, edit, compose, or convert bitmap images, and it supports a wide range of file formats, including JPEG, PNG, GIF, TIFF, and PDF. More information is available on the ImageMagick webpage.

              "},{"location":"polaris/workflows/balsam/","title":"Balsam","text":"

              Balsam is a Python-based workflow manager that helps users execute large numbers of jobs, potentially with interjob dependencies, track job outcomes, and manage postprocessing analysis. A Balsam Site runs on a node with access to the job scheduler, where it can submit and monitor jobs. Overall job state is aggregated on the Balsam Server, making job data from all Sites accessible from any individual site (or the user's laptop), via the command-line interface or the Python API. To get information on how to use the command line tool, you can type balsam --help in your shell.

              Full documentation for Balsam is available online.

              Balsam requires Python 3.7+. To install Balsam on Polaris, first set up a virtual Python environment:

              module load conda\nconda activate base\npython -m venv env\nsource env/bin/activate\npip install --upgrade pip\npip install --pre balsam\n

              To use Balsam, users need an account on the Balsam server. Users can get an account by contacting the ALCF Help Desk. Once a user has an account, they can log in and create a new site. A Balsam site is a project space for your workflow. You will be prompted to select which machine (Polaris) you are working on when creating a new site:

              balsam login\nbalsam site init -n new-site new-site\ncd new-site\nbalsam site start\n
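              Once the site is running, the same command-line tool can be used to inspect and manage it, for example (a sketch; subcommands beyond those shown above may vary with the Balsam version):
              balsam --help        # list available subcommands
              balsam site --help   # options for managing sites such as the one created above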

              See the Balsam documentation for full details.

              "},{"location":"polaris/workflows/libensemble/","title":"libEnsemble","text":"

              libEnsemble is a Python toolkit for running dynamic ensembles of calculations.

              Users provide generator and simulator functions to express their ensembles, where the generator can steer the ensemble based on previous results. These functions can portably submit external executables at any scale.

              System details are detected, and dynamic resource management is provided. This includes automatically detecting, assigning, and reassigning GPUs for ensemble members.

              libEnsemble can be used in a consistent manner on laptops, clusters, and supercomputers with minimal required dependencies.

              "},{"location":"polaris/workflows/libensemble/#getting-libensemble-on-polaris","title":"Getting libEnsemble on Polaris","text":"

              libEnsemble is provided on Polaris in the conda module:

              module load conda\nconda activate base\n

              See the docs for more details on using Python on Polaris.

              Example: creating a virtual environment and updating libEnsemble. To create a virtual environment that allows installation of further packages with pip:
              python -m venv /path/to-venv --system-site-packages\n. /path/to-venv/bin/activate\n
              where /path/to-venv can be anywhere you have write access. For future uses, just load the conda module and run the activate line. You can also ensure you are using the latest version of libEnsemble:
              pip install libensemble\n
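              To confirm which libEnsemble version ends up on your path (a sketch; assumes the virtual environment above is active):
              python -c "import libensemble; print(libensemble.__version__)"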
              "},{"location":"polaris/workflows/libensemble/#libensemble-examples","title":"libEnsemble examples","text":"

              For a very simple example of using libEnsemble see the Simple Introduction tutorial

              For an example that runs a small ensemble using a C application (offloading work to the GPU), see the GPU app tutorial. The required files for this tutorial can be found in this directory. A video demo is also available.

              "},{"location":"polaris/workflows/libensemble/#job-submission","title":"Job Submission","text":"

              libEnsemble runs on the compute nodes on Polaris using either Python's multiprocessing or mpi4py. The user can set the number of workers for maximum concurrency. libEnsemble will detect the nodes available from the PBS environment and use these for running simulations. Polaris supports running multiple concurrent simulations on each node if desired.

              A simple example batch script for a libEnsemble use case that runs five workers on one node:

                  #!/bin/bash -l\n    #PBS -l select=1:system=polaris\n    #PBS -l walltime=00:15:00\n    #PBS -l filesystems=home:grand\n    #PBS -q debug\n    #PBS -A <myproject>\n\n    export MPICH_GPU_SUPPORT_ENABLED=1\n    cd $PBS_O_WORKDIR\n    python run_libe_forces.py --comms local --nworkers 5\n

              The script can be run with:

              qsub submit_libe.sh\n

              Or you can run an interactive session with:

              qsub -A <myproject> -l select=1 -l walltime=15:00 -lfilesystems=home:grand -qdebug -I\n
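              Inside the interactive session, the same environment setup and run command from the batch script above apply; a minimal sketch:
              module load conda
              conda activate base
              export MPICH_GPU_SUPPORT_ENABLED=1
              cd $PBS_O_WORKDIR
              python run_libe_forces.py --comms local --nworkers 5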
              "},{"location":"polaris/workflows/libensemble/#further-links","title":"Further links","text":"

              • Docs: https://libensemble.readthedocs.io
              • GitHub: https://github.com/Libensemble/libensemble

              "},{"location":"polaris/workflows/mig-compute/","title":"Multi-Instance GPU (MIG) mode","text":"

              MIG mode can be enabled and configured on Polaris by passing a valid configuration file to qsub:

              qsub ... -l mig_config=/home/ME/path/to/mig_config.json ...

              You can find a concise explanation of MIG concepts and terms at https://docs.nvidia.com/datacenter/tesla/mig-user-guide/index.html#concepts

              "},{"location":"polaris/workflows/mig-compute/#configuration","title":"Configuration","text":"

              Please study the following example of a valid configuration file:

              {\n  \"group1\": {\n    \"gpus\": [0,1],\n    \"mig_enabled\": true,\n    \"instances\": {\"7g.40gb\": [\"4c.7g.40gb\", \"3c.7g.40gb\"] }\n  },\n  \"group2\": {\n    \"gpus\": [2,3],\n    \"mig_enabled\": true,\n    \"instances\": {\"3g.20gb\": [\"2c.3g.20gb\", \"1c.3g.20gb\"], \"2g.10gb\": [\"2g.10gb\"], \"1g.5gb\": [\"1g.5gb\"], \"1g.5gb\": [\"1g.5gb\"]}\n  }\n}\n
              "},{"location":"polaris/workflows/mig-compute/#notes","title":"Notes","text":"
              • Group names are arbitrary, but must be unique
              • \"gpus\" must be an array of integers. if only one physical gpu is being configured in a group, it must still be contained within an array(ex. \"gpus\": [0],)
              • Only groups with mig_enabled set to true will be configured
              • instances denotes the MIG GPU instances and the nested compute instances you wish to be configured
              • syntax is {\"gpu instance 1\": [\"cpu instance 1\", \"cpu instance 2\"], ...}
              • Valid GPU instances are 1g.5gb, 1g.10gb, 2g.10gb, 3g.20gb, 4g.20gb, and 7g.40gb. The first number denotes the number of slots used out of 7 total, and the second number denotes memory in GB
              • The default compute instance for any GPU instance has the same identifier as the GPU instance (in which case it will be the only one configurable)
              • Other compute instances can be configured with the identifier syntax Xc.Y, where X is the number of slots available in that GPU instance, and Y is the GPU instance identifier string
              • Some GPU instances cannot be configured adjacently, despite there being sufficient slots/memory remaining (e.g., 3g.20gb and 4g.20gb). Please see the NVIDIA MIG documentation for further details
              • Currently, MIG configuration is only available in the debug, debug-scaling, and preemptable queues. Submissions to other queues will result in any MIG config files passed being silently ignored
              • Files which do not match the above syntax will be silently rejected, and any invalid configurations in properly formatted files will be silently ignored. Please test any changes to your configuration in an interactive job session before use
              • A basic validator script is available at /soft/pbs/mig_conf_validate.sh. It will check for simple errors in your config, and print the expected configuration. For example:
              ascovel@polaris-login-02:~> /soft/pbs/mig_conf_validate.sh -h\nusage: mig_conf_validate.sh -c CONFIG_FILE\nascovel@polaris-login-02:~> /soft/pbs/mig_conf_validate.sh -c ./polaris-mig/mig_config.json\nexpected MIG configuration:\nGPU     GPU_INST   COMPUTE_INST\n-------------------------------\n0       7g.40gb    4c.7g.40gb\n0       7g.40gb    3c.7g.40gb\n1       7g.40gb    4c.7g.40gb\n1       7g.40gb    3c.7g.40gb\n2       2g.10gb    2g.10gb\n2       4g.20gb    2c.4g.20gb\n2       4g.20gb    2c.4g.20gb\n3       2g.10gb    2g.10gb\n3       4g.20gb    2c.4g.20gb\n3       4g.20gb    2c.4g.20gb\nascovel@polaris-login-02:~>\n
              "},{"location":"polaris/workflows/mig-compute/#example-use-of-mig-compute-instances","title":"Example use of MIG compute instances","text":"

              The following example demonstrates the use of MIG compute instances via the CUDA_VISIBLE_DEVICES environment variable:

              ascovel@polaris-login-02:~/polaris-mig> qsub -l mig_config=/home/ascovel/polaris-mig/mig_config.json -l select=1 -l walltime=60:00 -l filesystems=home:grand:swift -A Operations -q R639752 -k doe -I\nqsub: waiting for job 640002.polaris-pbs-01.hsn.cm.polaris.alcf.anl.gov to start\nqsub: job 640002.polaris-pbs-01.hsn.cm.polaris.alcf.anl.gov ready\n\nascovel@x3209c0s19b0n0:~> cat ./polaris-mig/mig_config.json\n{\n  \"group1\": {\n    \"gpus\": [0,1],\n    \"mig_enabled\": true,\n    \"instances\": {\"7g.40gb\": [\"4c.7g.40gb\", \"3c.7g.40gb\"] }\n  },\n  \"group2\": {\n    \"gpus\": [2,3],\n    \"mig_enabled\": true,\n    \"instances\": {\"4g.20gb\": [\"2c.4g.20gb\", \"2c.4g.20gb\"], \"2g.10gb\": [\"2g.10gb\"] }\n  }\n}\nascovel@x3209c0s19b0n0:~> nvidia-smi -L | grep -Po -e \"MIG[0-9a-f\\-]+\"\nMIG-63aa1884-acb8-5880-a586-173f6506966c\nMIG-b86283ae-9953-514f-81df-99be7e0553a5\nMIG-79065f64-bdbb-53ff-89e3-9d35f270b208\nMIG-6dd56a9d-e362-567e-95b1-108afbcfc674\nMIG-76459138-79df-5d00-a11f-b0a2a747bd9e\nMIG-4d5c9fb3-b0e3-50e8-a60c-233104222611\nMIG-bdfeeb2d-7a50-5e39-b3c5-767838a0b7a3\nMIG-87a2c2f3-d008-51be-b64b-6adb56deb679\nMIG-3d4cdd8c-fc36-5ce9-9676-a6e46d4a6c86\nMIG-773e8e18-f62a-5250-af1e-9343c9286ce1\nascovel@x3209c0s19b0n0:~> for mig in $( nvidia-smi -L | grep -Po -e \"MIG[0-9a-f\\-]+\" ) ; do CUDA_VISIBLE_DEVICES=${mig} ./saxpy & done 2>/dev/null\nascovel@x3209c0s19b0n0:~> nvidia-smi | tail -n 16\n+-----------------------------------------------------------------------------+\n| Processes:                                                                  |\n|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |\n|        ID   ID                                                   Usage      |\n|=============================================================================|\n|    0    0    0      17480      C   ./saxpy                          8413MiB |\n|    0    0    1      17481      C   ./saxpy                          8363MiB |\n|    1    0    0      17482      C   ./saxpy                          8413MiB |\n|    1    0    1      17483      C   ./saxpy                          8363MiB |\n|    2    1    0      17484      C   ./saxpy                          8313MiB |\n|    2    1    1      17485      C   ./saxpy                          8313MiB |\n|    2    5    0      17486      C   ./saxpy                          8313MiB |\n|    3    1    0      17487      C   ./saxpy                          8313MiB |\n|    3    1    1      17488      C   ./saxpy                          8313MiB |\n|    3    5    0      17489      C   ./saxpy                          8313MiB |\n+-----------------------------------------------------------------------------+\nascovel@x3209c0s19b0n0:~>\n
              "},{"location":"polaris/workflows/parsl/","title":"Parsl on Polaris","text":"

              Parsl is a flexible and scalable parallel programming library for Python.

              -- Parsl Documentation

              For many applications, managing an ensemble of jobs as a workflow is a critical step that can easily become a performance bottleneck. Many tools exist to address this, of which parsl is just one. On this page, we highlight some of the key pieces of information about parsl that are relevant to Polaris. Parsl is also extensively documented, has a dedicated Slack channel, and has a large community of users and developers beyond ALCF. We encourage you to engage with the parsl community for support with parsl-specific questions; for Polaris-specific questions or problems, please contact support@alcf.anl.gov.

              "},{"location":"polaris/workflows/parsl/#getting-parsl-on-polaris","title":"Getting Parsl on Polaris","text":"

              You can install parsl by building off of the conda modules. You have some flexibility in how you want to extend the conda module to include parsl, but here is an example way to do it:

              # Load the Conda Module (needed everytime you use parsl)\nmodule load conda\nconda activate\n\n# Create a virtual env that uses the conda env as the system packages.\n# Only do the next line on initial set up:\npython -m venv --system-site-packages /path/to/your/virtualenv\n\n# Load the virtual env (every time):\nsource /path/to/your/virtualenv/bin/activate\n\n# Install parsl (only once)\npip install parsl\n
              "},{"location":"polaris/workflows/parsl/#using-parsl-on-polaris","title":"Using Parsl on Polaris","text":"

              Parsl has a variety of possible configuration settings. As an example, we provide the configuration below that will run one task per GPU:

              from parsl.config import Config\n\n# PBSPro is the right provider for Polaris:\nfrom parsl.providers import PBSProProvider\n# The high throughput executor is for scaling to HPC systems:\nfrom parsl.executors import HighThroughputExecutor\n# You can use the MPI launcher, but may want the Gnu Parallel launcher, see below\nfrom parsl.launchers import MpiExecLauncher, GnuParallelLauncher\n# address_by_interface is needed for the HighThroughputExecutor:\nfrom parsl.addresses import address_by_interface\n# For checkpointing:\nfrom parsl.utils import get_all_checkpoints\n\n# Adjust your user-specific options here:\nrun_dir=\"/lus/grand/projects/yourproject/yourrundir/\"\n\nuser_opts = {\n    \"worker_init\":      f\"source /path/to/your/virtualenv/bin/activate; cd {run_dir}\", # load the environment where parsl is installed\n    \"scheduler_options\":\"#PBS -l filesystems=home:eagle:grand\" , # specify any PBS options here, like filesystems\n    \"account\":          \"YOURPROJECT\",\n    \"queue\":            \"debug-scaling\",\n    \"walltime\":         \"1:00:00\",\n    \"nodes_per_block\":  3, # think of a block as one job on polaris, so to run on the main queues, set this >= 10\n    \"cpus_per_node\":    32, # Up to 64 with multithreading\n    \"available_accelerators\": 4, # Each Polaris node has 4 GPUs, setting this ensures one worker per GPU\n    \"cores_per_worker\": 8, # this will set the number of cpu hardware threads per worker.  \n}\n\ncheckpoints = get_all_checkpoints(run_dir)\nprint(\"Found the following checkpoints: \", checkpoints)\n\nconfig = Config(\n        executors=[\n            HighThroughputExecutor(\n                label=\"htex\",\n                heartbeat_period=15,\n                heartbeat_threshold=120,\n                worker_debug=True,\n                available_accelerators=user_opts[\"available_accelerators\"], # if this is set, it will override other settings for max_workers if set\n                cores_per_worker=user_opts[\"cores_per_worker\"],\n                address=address_by_interface(\"bond0\"),\n                cpu_affinity=\"block-reverse\",\n                prefetch_capacity=0,\n                start_method=\"spawn\",  # Needed to avoid interactions between MPI and os.fork\n                provider=PBSProProvider(\n                    launcher=MpiExecLauncher(bind_cmd=\"--cpu-bind\", overrides=\"--depth=64 --ppn 1\"),\n                    # Which launcher to use?  Check out the note below for some details.  
Try MPI first!\n                    # launcher=GnuParallelLauncher(),\n                    account=user_opts[\"account\"],\n                    queue=user_opts[\"queue\"],\n                    select_options=\"ngpus=4\",\n                    # PBS directives (header lines): for array jobs pass '-J' option\n                    scheduler_options=user_opts[\"scheduler_options\"],\n                    # Command to be run before starting a worker, such as:\n                    worker_init=user_opts[\"worker_init\"],\n                    # number of compute nodes allocated for each block\n                    nodes_per_block=user_opts[\"nodes_per_block\"],\n                    init_blocks=1,\n                    min_blocks=0,\n                    max_blocks=1, # Can increase more to have more parallel jobs\n                    cpus_per_node=user_opts[\"cpus_per_node\"],\n                    walltime=user_opts[\"walltime\"]\n                ),\n            ),\n        ],\n        checkpoint_files = checkpoints,\n        run_dir=run_dir,\n        checkpoint_mode = 'task_exit',\n        retries=2,\n        app_cache=True,\n)\n
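              With a configuration like the one above, the workflow script is typically started from a login node; parsl then submits the PBS jobs (blocks) on your behalf. A minimal sketch (the driver script name is hypothetical):
              source /path/to/your/virtualenv/bin/activate
              python my_parsl_workflow.py   # hypothetical driver script that builds and loads the Config shown above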
              "},{"location":"polaris/workflows/parsl/#special-notes-for-polaris","title":"Special notes for Polaris","text":"

              On Polaris, there is a known bug where Python applications launched with MPI that use fork to spawn processes can sometimes hang unexpectedly. For this reason, it is recommended to use start_method=\"spawn\" on Polaris when using the MpiExecLauncher, as shown in the example config above. Alternatively, another solution is to use the GnuParallelLauncher, which uses GNU Parallel to spawn processes. GNU Parallel can be loaded in your environment with the command module load gnu-parallel. Both of these approaches circumvent the hang issue caused by using fork.

              "},{"location":"polaris/workflows/parsl/#updates","title":"Updates","text":"

              For parsl versions after July 2023, the address passed in the HighThroughputExecutor needs to be set to address = address_by_interface(\"bond0\"). With parsl versions prior to July 2023, it was recommended to use address = address_by_hostname() on Polaris, but with later versions this will not work on Polaris (or any other machine).

              "},{"location":"polaris/workflows/smartsim/","title":"SmartSim and SmartRedis","text":"

              SmartSim is an open-source tool developed by Hewlett Packard Enterprise (HPE) designed to facilitate the integration of traditional HPC simulation applications with machine learning workflows. There are two core components to SmartSim:

              • Infrastructure library (IL)
                • Provides API to start, stop and monitor HPC applications from Python
                • Interfaces with the scheduler to launch jobs (PBSPro on Polaris and Cobalt on Theta/ThetaGPU)
                • Deploys a distributed in-memory database called the Orchestrator
              • SmartRedis client library
                • Provides clients that connect to the Orchestrator from Fortran, C, C++, Python code
                • The client API library enables data transfer to/from the database and the ability to load and run JIT-traced Python scripts and ML models acting on stored data

              For more resources on SmartSim, follow the links below:

              • Source code
              • Documentation
              • Zoo of examples
              • Fall 2023 ALCF User Hands-On Workshop
              • NekRS-ML
              "},{"location":"polaris/workflows/smartsim/#installation","title":"Installation","text":"

              SmartSim on Polaris can be installed by creating a virtual environment based on the ML conda module:

              module load conda/2023-10-04\nconda activate\nmodule load cmake\nmodule load gcc/11.2.0\nmodule load cudatoolkit-standalone/11.8.0\npython -m venv --clear /path/to/_ssim_env --system-site-packages\nsource /path/to/_ssim_env/bin/activate\npip install --upgrade pip\n
              Note that /path/to/ can either be a user's home or project directory.

              To use SmartSim in the future, simply load the same modules and source the virtual environment, as in the sketch below.
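              A sketch of that setup, repeating the modules used at install time:
              module load conda/2023-10-04
              conda activate
              module load cmake
              module load gcc/11.2.0
              module load cudatoolkit-standalone/11.8.0
              source /path/to/_ssim_env/bin/activate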

              Then set up the environment variables

              export SMARTSIM_REDISAI=1.2.7\nexport CC=cc\nexport CXX=CC\nexport CUDA_DEPS_BASE=/soft/libraries\nexport CUDA_VERSION_MAJOR=11\nexport CUDNN_VERSION_MAJOR=8\nexport CUDNN_VERSION_MINOR=6\nexport CUDNN_VERSION_EXTRA=0.163\nexport CUDNN_VERSION=$CUDNN_VERSION_MAJOR.$CUDNN_VERSION_MINOR.$CUDNN_VERSION_EXTRA\nexport CUDNN_BASE=$CUDA_DEPS_BASE/cudnn/cudnn-$CUDA_VERSION_MAJOR-linux-x64-v$CUDNN_VERSION\nexport CUDNN_LIBRARY=$CUDNN_BASE/lib/\nexport CUDNN_INCLUDE_DIR=$CUDNN_BASE/include/\nexport LD_LIBRARY_PATH=$CUDNN_LIBRARY:$LD_LIBRARY_PATH\n

              Now, install SmartSim and the GPU backend

              git clone https://github.com/CrayLabs/SmartSim.git\ncd SmartSim\npip install -e .\nexport TORCH_PATH=$( python -c 'import torch;print(torch.utils.cmake_prefix_path)' )\nexport TF_PATH=$( python -c 'import tensorflow;print(\"/\".join(tensorflow.__file__.split(\"/\")[:-1]))' )\nsmart build -v --device gpu --torch_dir $TORCH_PATH --libtensorflow_dir $TF_PATH\ncd ..\n

              Finally, install the SmartRedis library

              export LDFLAGS=-L/opt/cray/pe/gcc/11.2.0/snos/lib64/libstdc++.so.6\ngit clone https://github.com/CrayLabs/SmartRedis.git\ncd SmartRedis\npip install -e .\nmake lib\ncd ..\n

              "},{"location":"polaris/workflows/smartsim/#examples","title":"Examples","text":"

              You can find examples of in situ training and inference of ML models from an ongoing CFD simulation at the NekRS-ML repository. The smartredis and onlineGNN branches have instructions on how to build and run the examples on Polaris.

              The Fall 2023 ALCF User Hands-On Workshop repository also contains information on how to use SmartSim and NekRS-ML on Polaris, but note that the instructions are specific to the fall of 2023.

              "},{"location":"polaris/workflows/smartsim/#notes","title":"Notes","text":"
              • SmartSim workflows, such as online training, often require launching multiple MPI applications on the same set of nodes. On Polaris, the MPICH_OFI_CXI_PID_BASE=0 must be exported before the first call to mpiexec, and then incremented by 1 and re-exported before each successive call. This is done with the SmartSim API by adding env_vars={'MPICH_OFI_CXI_PID_BASE':str(0)} to the PalsMpiexecSettings() API.
              "},{"location":"policies/alcf-acknowledgement-policy/","title":"ALCF Acknowledgement Policy","text":"

              As a U.S. Department of Energy user facility dedicated to the advancement of scientific discoveries, the Argonne Leadership Computing Facility (ALCF) provides unique computing resources and expertise to a user community that is bound by certain policies designed to acknowledge and promote the work of others as well as the resources used to accomplish this work.

              The ALCF requests your continued compliance with the terms of your program or discretionary award, specifically with regard to acknowledgments in publications and presentations based on work done with ALCF resources. Also, please forward your accepted publication citations to pubs@alcf.anl.gov.

              "},{"location":"policies/alcf-acknowledgement-policy/#ai-testbeds-publication-guidance","title":"AI Testbeds Publication Guidance","text":"

              To publish technical reports and research papers using the ALCF AI testbeds, we request you to provide us with a draft of your paper prior to submission by emailing a copy to us at support@alcf.anl.gov. We will work closely with the AI testbed vendors to provide feedback in a timely manner. We strongly recommend you engage us and the vendors early and often in this process to help us facilitate your research objectives.

              For guidance on acknowledgements, please see the following sample policies:

              "},{"location":"policies/alcf-acknowledgement-policy/#alcf-only-acknowledgement","title":"ALCF Only Acknowledgement","text":"

              Users, and ALCF staff scientists without direct project funding, should acknowledge the ALCF in all publications and presentations that speak to work performed on ALCF resources.

              This research used resources of the Argonne Leadership Computing Facility, a U.S. Department of Energy (DOE) Office of Science user facility at Argonne National Laboratory and is based on research supported by the U.S. DOE Office of Science-Advanced Scientific Computing Research Program, under Contract No. DE-AC02-06CH11357.

              "},{"location":"policies/alcf-acknowledgement-policy/#incitealcf-acknowledgement","title":"INCITE/ALCF Acknowledgement","text":"

              Users should acknowledge the ALCF in all publications and presentations that speak to INCITE work performed on ALCF resources.

              An award for computer time was provided by the U.S. Department of Energy\u2019s (DOE) Innovative and Novel Computational Impact on Theory and Experiment (INCITE) Program. This research used resources from the Argonne Leadership Computing Facility, a U.S. DOE Office of Science user facility at Argonne National Laboratory, which is supported by the Office of Science of the U.S. DOE under Contract No. DE-AC02-06CH11357.

              "},{"location":"policies/alcf-acknowledgement-policy/#incitealcfolcf-acknowledgement","title":"INCITE/ALCF/OLCF Acknowledgement","text":"

              Users should acknowledge the ALCF and OLCF in all publications and presentations that speak to INCITE work performed on ALCF and OLCF resources.

              An award for computer time was provided by the U.S. Department of Energy\u2019s (DOE) Innovative and Novel Computational Impact on Theory and Experiment (INCITE) Program. This research used supporting resources at the Argonne and the Oak Ridge Leadership Computing Facilities. The Argonne Leadership Computing Facility at Argonne National Laboratory is supported by the Office of Science of the U.S. DOE under Contract No. DE-AC02-06CH11357. The Oak Ridge Leadership Computing Facility at the Oak Ridge National Laboratory is supported by the Office of Science of the U.S. DOE under Contract No. DE-AC05-00OR22725.

              "},{"location":"policies/facility-policies/","title":"ALCF Facility Policies","text":"

              Be sure to familiarize yourself with the various policies and procedures for ALCF users, categorized below.

              "},{"location":"policies/facility-policies/#accounts","title":"Accounts","text":"

              All holders of user accounts must comply with ALCF and Argonne National Laboratory computing usage policies, including meeting certain security requirements and executing specific science- or engineering-related computing jobs.

              • Accounts Policy
              • Account Sponsorship and Retention Policy
              • User Authentication Policy
              "},{"location":"policies/facility-policies/#alcf-acknowledgement-policy","title":"ALCF Acknowledgement Policy","text":"

              As a U.S. Department of Energy Office of Science User Facility dedicated to the advancement of scientific discovery, the ALCF requests that its users acknowledge and promote the work of others and the resources with which this work was accomplished.

              • ALCF Acknowledgement Policy
              "},{"location":"policies/facility-policies/#data-and-allocation","title":"Data and Allocation","text":"

              These policies detail data and software usage, as well as pullback and refunds of computing hours.

              • Data Policy
              • Pullback Policy
              • Refund Policy
              • Software Policy
              "},{"location":"policies/facility-policies/#quarterly-reports","title":"Quarterly Reports","text":"

              The ALCF is required to report the progress and accomplishments of its allocation projects. Policies are detailed by award type.

              • Quarterly Report Policy
              "},{"location":"policies/facility-policies/#queue-and-scheduling-policies","title":"Queue and Scheduling Policies","text":"
              • General Policies
              "},{"location":"policies/accounts/account-sponsorship-retention-policy/","title":"Account Sponsorship & Retention Policy","text":"

              This page is designed to help you understand the different types of accounts that you will encounter at the ALCF. The policy outlined reviews the responsibilities of an account holder, an account sponsor, and those of a foreign national.

              "},{"location":"policies/accounts/account-sponsorship-retention-policy/#alcf-account-types","title":"ALCF Account Types","text":"

              Annual: This account applies to users who are not ALCF Regular Employees. The default renewal date (account deactivation date) for the account is a year from the day the account was requested. These accounts are renewed annually and must be approved by an ALCF Staff member or a Project PI (also known as the \u201capprover\u201d). Users are required to update their account information and agree to the Terms of Use each year. Users need to be a part of an active project for their account to be renewed.

              Permanent: This account applies to individuals who are Regular Employees within the ALCF and CPS Divisions. If you hold this type of account, periodic renewal is not necessary.

              Note: Foreign Nationals have a second date (apart from their account deactivation date) that controls their account access. Accounts held by foreign nationals require paperwork referred to as an ANL-593 (or just 593 for shorthand). This paperwork is also required for any on-site access, and also applies to computer accounts. DOE requirements state that the ALCF is to disable any account with expired 593 paperwork.

              A notification system has been established that issues a warning notice to users when expiration approaches and requests action to ensure that accounts are not needlessly turned off. An approval from the project PI is required to renew ANL 593 for project members that are foreign nationals.

              "},{"location":"policies/accounts/account-sponsorship-retention-policy/#your-responsibilities-as-an-account-approver","title":"Your responsibilities as an account approver","text":"

              If you approve any accounts, please take note of the following roles and responsibilities:

              By approving someone for an account at the ALCF, you are accepting responsibility for the account applicant and confirming that this individual is who they claim to be and is thus entitled to work on our computers. Do not simply \"rubber stamp\" any account application that claims you as an account approver/project PI.

              You are also responsible for approving account renewal requests. When an account is about to expire, we send a warning notification to the account holder. Among other things, the account holder is asked to contact the approver (the PI of any of the active projects the account holder is associated with) if they wish to renew their account. We cannot and will not extend someone's account without an approval. An important aspect of this process to note is that inaction will result in the account becoming deactivated on the expiration date.

              You are also responsible for approving ANL 593 renewals requests. When an account\u2019s 593 is about to expire, we send a warning notification to the account holder. Among other things, the account holder is asked to contact the approver (the PI of any of the active projects the account holder is associated with) if they wish to renew their 593. We cannot and will not extend someone's 593 without an approval. An important aspect of this process to note is that inaction will result in the account becoming deactivated at the expiration date.

              "},{"location":"policies/accounts/account-sponsorship-retention-policy/#account-retention-policy","title":"Account Retention Policy","text":"

              Accounts can exist in one of three states:

              • Active: The active state is normal for an account.
              • Inactive: The inactive state occurs when an account expires, and the ability to use ALCF resources is removed by changing the active status of the account to inactive. All files continue to exist in the user's home directory. An account will remain in the inactive state for at least 90 days before moving to the next state.
              • Deleted: After 90 days, an inactive account will be deleted. This removes all references to the account from the system (except the accounts database), including any files and home directories.

              Users with inactive or deleted accounts can request reactivation by visiting https://accounts.alcf.anl.gov and clicking on the \u201cReactivate An Account\u201d link.

              "},{"location":"policies/accounts/accounts-policy/","title":"Accounts Policy","text":"

              All holders of user accounts must abide by all appropriate Argonne Leadership Computing Facility and Argonne National Laboratory computing usage policies. The policy details are outlined in the following documents:

              • ANL's Information Technology Access Agreement
              • Addendum to ANL's Information Technology Access Agreement

              These are described at the time of the account request and include requirements such as using a sufficiently strong password, appropriate use of the system, and so on. Any user not following these requirements will have their account disabled.

              Furthermore, ALCF resources are intended to be used as a computing resource for specific computational science or engineering work, not as a general-purpose computing system.

              If someone is using the system extensively but not carrying out any computational activities, their account could be disabled.

              "},{"location":"policies/accounts/user-authentication-policy/","title":"User Authentication Policy","text":"

              Users of the ALCF systems are required to authenticate using a SafeNet token (physical or mobile), a one-time-password, multifactor authentication system.

              This document explains the policies users must follow regarding SafeNet tokens for accessing the ALCF systems.

              "},{"location":"policies/accounts/user-authentication-policy/#multifactor-authentication","title":"MultiFactor Authentication","text":"

              \"Authentication systems are frequently described by the authentication factors that they incorporate. The three factors often considered as the cornerstone of authentication are: Something you know (for example, a password); Something you have (for example, an ID badge or a cryptographic key); and Something you are (for example, a voice print or other biometric measurement).\" -- NIST iTL Bulletin, Aug 2004

              Per the NIST guidelines for identification and authentication (NIST 800-53, Revision 3, Control IA-2), ALCF aims for a Moderate level of security controls. All production systems in ALCF require multifactor authentication using SafeNet tokens for users with network and local access (privileged and non-privileged accounts).

              "},{"location":"policies/accounts/user-authentication-policy/#mobile-and-physical-tokens","title":"Mobile and Physical Tokens","text":"

              ALCF provides every user of the production resources a physical or mobile token called a SafeNet Token. This is named after the company that developed the key fob and mobile software (the organization is now called SafeNet). \"Both tokens use AES-256 bit encryption to generate OTPs [One Time Passwords] comprised of digits, digits and letters or digits, letters and special characters...\"

              When you receive your physical token, it will be initialized, but it will have no access privileges until you have contacted us to verify your identity.

              At the end of your account or project lifecycle, please return the token to the ALCF help desk:

              ALCF Service Desk
              Argonne National Laboratory
              9700 South Cass Avenue
              Building 240
              Argonne, IL 60439

              "},{"location":"policies/accounts/user-authentication-policy/#protect-your-passcode-token","title":"Protect Your Passcode token","text":"

              Your passcode token should be protected by you as carefully as your credit cards or house keys. If your token is lost, stolen, or damaged, please contact us immediately so that we can deactivate the token and prevent unauthorized access. Sharing of tokens is strictly forbidden. Please do not mark on the token or alter it in any way.

              "},{"location":"policies/accounts/user-authentication-policy/#more-information","title":"More information","text":"

              [New User Guide](http://www.alcf.anl.gov/user-guides/new-user-guide)

              Using Passcode Tokens

              "},{"location":"policies/accounts/user-authentication-policy/#references","title":"References","text":"
              • http://www.itl.nist.gov/lab/bulletns/bltnaug04.htm
              • http://csrc.nist.gov/publications/nistpubs/800-53-Rev3/sp800-53-rev3-final_updated-errata_05-01-2010.pdf
              • http://csrc.nist.gov/publications/nistpubs/800-63-1/SP-800-63-1.pdf
              • https://safenet.gemalto.com/multi-factor-authentication/authenticators/one-time-password-otp/
              "},{"location":"policies/data-and-software-policies/data-policy/","title":"Data Policy","text":""},{"location":"policies/data-and-software-policies/data-policy/#alcf-data-confidentiality","title":"ALCF Data Confidentiality","text":"

              The Argonne Leadership Computing Facility (ALCF) network is an open-research network. Because our resources and networks are open to many users and cannot be protected at a partitioned level, we cannot guarantee complete security for any data that resides here. It is up to users to provide the security they need.

              Data is not encrypted at rest. Data transferred via SSH (i.e., scp) is encrypted in transmission using SSH\u2019s mechanisms (e.g., AES256). Data transferred via Globus (GridFTP) is not normally fully encrypted. The GridFTP control channel is encrypted, but the data channel by default is not (though the authentication processes for both channels are encrypted). If you need full encryption of the data stream, you need to explicitly select \"encrypt transfer\" in the \"Transfer & Timer Options\" in the Globus UI, or use the equivalent options in the CLI or transfer API if you are using those. More information is available at https://docs.globus.org/faq/security

              The basic level of protection provided is UNIX file level permissions; it is the user's responsibility to ensure that file permissions and umasks are set to match their needs.

              NOTE: The default permissions and umasks are group and world readable. For help determining or setting file permissions or umasks, or creating a UNIX group, contact support@alcf.anl.gov.
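              For example, world access can be removed from a project directory and the default permissions for new files tightened (a sketch; the path is hypothetical):
              chmod -R o-rwx /lus/grand/projects/MyProject/shared_data   # remove world access, keep group access
              umask 027                                                  # new files are not world-readable by default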

              "},{"location":"policies/data-and-software-policies/data-policy/#alcf-staff-with-root-privileges","title":"ALCF Staff with Root Privileges","text":"

              ALCF resource administrators with root privileges are not constrained by the file permissions, and they have the capability to open and/or copy all files on the system. They can also assume a user\u2019s identity on the system. There is no audit trail for access, touching, or moving data; however, ALCF staff does not view or modify project data unless directed by a PI or project member to help debug a problem. Data may be touched or accessed by the filesystem itself if data needs to be repaired or verified for integrity after a filesystem event (e.g., a fsck).

              The ALCF resources are Federal resources and are the property of the United States Government. Any or all uses of this system and all files on this system may be intercepted, monitored, recorded, copied, audited, inspected, and disclosed to authorized site, Department of Energy, and law enforcement personnel, as well as authorized officials of other agencies, both domestic and foreign.

Administrators use elevated privileges for maintenance and system management. Following are instances where ALCF staff might look at your files:

• We maintain copies of all .error, .output, and Cobalt log files and may review them to determine if a job failure was due to user error or a system failure.
• If you request our assistance via any mechanism (for example, support ticket, direct personal email, in-person, etc.), be aware we may need to view your files using elevated privileges to aid us in resolving your issue.

              "},{"location":"policies/data-and-software-policies/data-policy/#use-of-proprietarylicensed-software","title":"Use of Proprietary/Licensed Software","text":"

              All software used on ALCF computers must be appropriately acquired and used according to the appropriate licensing. Possession or use of illegally copied software is prohibited. Likewise, users shall not copy copyrighted software, except as permitted by the owner of the copyright. Currently, the use of export-controlled codes is prohibited.

              "},{"location":"policies/data-and-software-policies/data-policy/#prohibited-data","title":"Prohibited Data","text":"

The ALCF computer systems are operated as research systems and contain only data related to scientific research. Use of ALCF resources to store, manipulate, or remotely access any sensitive or national security information is prohibited unless documented and approved by the PI and ALCF leadership.

This includes, but is not limited to, personally identifiable information (data that falls under the Privacy Act of 1974, 5 U.S.C. 552a), controlled unclassified information (CUI) to include unclassified controlled nuclear information (UCNI), naval nuclear propulsion information (NNPI), International Traffic in Arms Regulations (ITAR) data, and the design or development of nuclear, biological, or chemical weapons, or any weapons of mass destruction. The use of ALCF resources for personal or non-work-related activities is also prohibited.

              "},{"location":"policies/data-and-software-policies/data-policy/#export-control","title":"Export Control","text":"

              All principal investigators using ALCF resources and ALCF staff members working with project teams are responsible for knowing whether their project generates any of these prohibited data types or information that falls under Export Control. For questions, contact ALCF Support at support@alcf.anl.gov.

              "},{"location":"policies/data-and-software-policies/data-policy/#data-storage-systems","title":"Data Storage Systems","text":"

              Data stored for any length of time on ALCF resources should only be data directly related to work done on any of the ALCF leadership computing systems. Specific policies apply to the three types of data storage systems maintained at ALCF. Read these policies carefully and plan accordingly in terms of space, usage, and data protection.

              "},{"location":"policies/data-and-software-policies/data-policy/#home-file-system-space","title":"Home File System Space","text":"

              swift-home

              The home file system (/home) is intended to hold your executable files, configuration files, etc. It is NOT meant to hold the output from your application runs (use the data/parallel file system for that purpose). The home file system space is generally moderate in size and is the best protected. Because of its size, backups are practical to accomplish. The system performs tape backups, enabling the recovery of files more than seven days old or recovery from a catastrophic disk failure. Users should email support@alcf.anl.gov if they need assistance. The table below indicates the capabilities and characteristics of each file system.

              AI Testbed home

/home, shared across the ALCF AI Testbed systems, including the AI Testbed's login and compute nodes, is different from mira-home. The default user quota on the AI Testbed's home is 1 TB of storage and 1,000,000 files. This space is backed up.

              "},{"location":"policies/data-and-software-policies/data-policy/#team-project-or-campaign-file-system","title":"Team Project or Campaign File System","text":"

              theta-fs0 and Grand

              The team project/campaign file system is intended primarily for results output from your computational runs on the ALCF computing systems. This space is accessible to the team members of your project that have an ALCF account. Default storage quota is 1 TB. Consider this space intermediate-term storage. Once any active production and/or analysis is complete and you no longer need regular access to the data, archive it within the ALCF (explained below) or transfer it to your home institution or move it to Eagle to share it with the broader community (explained below).

This space has redundancy in the servers and storage but is so large that replication, snapshots, and backups are not practical. Theta-fs0 and Grand are Lustre global parallel file systems. All new projects will be given storage allocations on either Grand or Eagle. Continuing projects (renewals) will have access to theta-fs0. More information is available in Lustre File Striping Basics.
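Because theta-fs0, Grand, and Eagle are Lustre file systems, one way to check usage against a storage allocation is the standard lfs quota command. This is a sketch only; the group name is a placeholder, and the facility may also track usage through project quotas or its own reporting tools:

# Usage and limits for your user on Grand
lfs quota -h -u $USER /lus/grand

# Usage and limits for a project's UNIX group (group name is hypothetical)
lfs quota -h -g MyProjectGroup /lus/grand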

              Pullback Policy: Projects that do not use a minimum of 50% of their allocated space after 6 months will be subject to a quota limit reduction.

              AI Testbed projects file system

              The team project/campaign file system /projects mounted on AI Testbed's login and compute nodes is intended to facilitate project collaboration and is accessible to the team members of your project that have an ALCF account. /projects on the AI Testbed is different from /projects on Theta, ThetaGPU, and Cooley. Default group storage quota is 2 TB and 2,000,000 files. Please note that this space isn't backed up. Our policy is that data will be purged from disk 6 months after project completion.

              "},{"location":"policies/data-and-software-policies/data-policy/#shared-community-project-or-campaign-file-system-eagle","title":"Shared Community Project or Campaign File System (Eagle)","text":"

The file system Eagle, a Lustre global parallel file system, has community sharing capabilities and is useful for sharing project/campaign data with the broader research community via Globus. This space does not have redundancy in the servers or storage and is so large that replication, snapshots, and backups are not practical. The table below indicates the capabilities and characteristics of each file system. The default storage quota on Eagle is 1 TB and the default period is 1 year. More information is available in Lustre File Striping Basics.

              Eagle Data Pullback Policy: Projects that do not use a minimum of 50% of their allocated space after 6 months will be subject to a quota limit reduction.

Eagle Access Termination Policy: Project endpoints that have exhibited no activity* for a period of 6 months will be disabled and the storage space will be reclaimed. Notification will be sent to the PI and project members 30 days prior to, and on the day of, the action.

              Activity is defined as, but not limited to:

              • Creation of the Globus endpoint
              • Globus transfers to and from the endpoint
              • atime audits of data files indicating access
              • Other factors may include DOIs and citations referring to the project
              "},{"location":"policies/data-and-software-policies/data-policy/#archive-space","title":"Archive Space","text":"

              The archive space is intended for offline storage of results you wish to retain but either have no immediate need to access or no room in your parallel file system space. Archiving capabilities are available via HPSS. The primary HPSS access is via HSI. HTAR is available, but its path length and file size limitations often cause it to fail. Globus Online and GridFTP are clients that can also be used with HPSS. Due to the possibility of data corruption or loss due to a bad tape, users can request dual writes for particularly critical data. Such requests will be handled on a case-by-case basis.
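For illustration, a hedged sketch of archiving data with HSI and HTAR follows; the directory and file names are placeholders, and exact behavior should be confirmed against the HPSS documentation for the system:

# Store a file in HPSS with HSI and list it afterwards (paths are placeholders)
hsi \"mkdir run42; cd run42; put results.tar\"
hsi ls -l run42

# Bundle a local directory into a tar file stored directly in HPSS with HTAR
# (subject to the path-length and file-size limitations noted above)
htar -cvf run42/output.tar ./output_dir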

              "},{"location":"policies/data-and-software-policies/data-policy/#data-storage-policies","title":"Data Storage Policies","text":""},{"location":"policies/data-and-software-policies/data-policy/#disk-capacity-and-retention-policies","title":"Disk Capacity and Retention Policies","text":"---- /home /lus/theta-fs0 or /projects * /lus/grand/projects or /grand lus/eagle/projects or /eagle Default Quota 1 50 GB 1 TB / 1 million files 1 TB / 1 million files 1 TB / 1 million files Quota Enforcement 2 hard/soft hard/soft hard/soft hard/soft Disk Redundancy 3 dual parity dual parity dual parity dual parity File Server Snapshots 6 (frequency/retained) none none none none File Server Metadata Redundancy yes yes yes yes File Server Metadata Replication 4 yes yes yes yes File Server Data Replication 5 yes yes no no Data Purged from Disk n/a 6 months after project completion 8 6 months after project completion 8 After 6 months of inactivity (see Eagle Access termination policy listed in the Eagle section above) 8

              * /lus/theta-fs0 does not apply to Polaris

              "},{"location":"policies/data-and-software-policies/data-policy/#tape-capacity-and-retention-policies","title":"Tape Capacity and Retention Policies","text":"---- /home /lus/theta-fs0 or /projects * /lus/grand/projects or /grand lus/eagle/projects or /eagle Automatic Backup to Tape? 7 yes yes no no Archived to Tape Before Deleted from Disk? 9 yes yes no no
              1. While quotas are subject to negotiation on a case-by-case basis, disk space is a finite resource and projects must exercise good data management practices for their own sake and the sake of other users of the facility. With Lustre, it has become necessary to enforce file quotas as well, which are also negotiable.
              2. \u201cHard quota enforcement\u201d means a job will fail when writing output if you exceed the hard quota limit. \"Soft quota enforcement\" means you may exceed the soft quota limit (but never the higher hard quota value) for up to seven days. If you do not drop back below the soft quota limit within seven days, writes will begin to fail.
              3. Hard drives are in redundancy groups of 10 disks (8 data + 2 parity). In other words, three out of 10 drives would have to fail before data loss occurred.
              4. Metadata (i.e., information listing which blocks are part of which files) is written twice to two different storage arrays. Thus, even if an entire array were lost, the metadata would be preserved.
              5. Refers to the fact that data (user output) is written twice with each block on two different storage arrays, so that even if an entire array were lost, the data would be preserved.
              6. Snapshots are stored in your home directory (see Home File System Space for more info). If you accidentally delete the directory or need a previous version, use the cp command to copy the file back to your home directory.
7. \u201cYes\u201d denotes that ALCF does regular backups without intervention from the user. In the case of project data, data is backed up to tape after a stipulated period (see point 8 below) and is retained for 2 years (subject to change). In all other cases, the user is responsible for archiving the data to HPSS or copying it to another facility as desired.
              8. The project directory is available on disk for the stipulated period but project quotas are reduced immediately following project end date (except Eagle). Access to the directory will be removed after 90 days. Requests to restore/extend access or reset the quota are reviewed on a case-by-case basis.
              9. Users who wish to retain data must archive or transfer their data elsewhere at the end of the project. Users need an active ALCF account to access archived data on HPSS. See Account Retention Policy for more information.
              "},{"location":"policies/data-and-software-policies/software-policy/","title":"ALCF Resource Software Use","text":"

              All software used on ALCF computers must be appropriately acquired and used according to the appropriate licensing. Possession or use of illegally copied software is prohibited. Likewise, users shall not copy copyrighted software, except as permitted by the owner of the copyright. Currently, the use of export-controlled codes is prohibited.

              "},{"location":"policies/data-and-software-policies/software-policy/#community-software-policy","title":"Community Software Policy","text":"

              ALCF supports the deployment of community software from active projects on production systems. A project may provide and support a code on ALCF systems for the ALCF user community as described in the [Community Software Service].

              User deployments are system-specific, and their maintenance is the sole responsibility of the project deploying it. There shall be no expectation of additional support from ALCF, other than for the provisioning of space and integration with the module system. Projects will be provided with an initial module file from a template, with the expectation that they will update and maintain the module, providing paths and instructions so that user communities can access the software.

              "},{"location":"policies/queue-scheduling/pullback-policy/","title":"Pullback Policy","text":"

              In an effort to ensure that valuable ALCF computing resources are used judiciously, a pullback policy has been instituted. Projects granted allocations under the INCITE and ALCC programs that have not used a significant amount of their allocation will be evaluated and adjusted during the year following the policies outlined on this page.

The figures outlined below represent the maximum amount that will be pulled back from projects after specific dates during the allocation period. The decision to reduce allocations will be made on a case-by-case basis in discussion with the project's principal investigators (PIs).

              "},{"location":"policies/queue-scheduling/pullback-policy/#incite-pullback-policy","title":"INCITE Pullback Policy","text":"

On May 1 of the current INCITE calendar year:

• if usage is less than 15%, remove up to 15% of the unused balance
• if usage is less than 10%, remove up to 30% of the unused balance

On September 1 of the current INCITE calendar year:

• if usage is less than 50%, remove up to 33% of the unused balance
• if usage is less than 33%, remove up to 50% of the unused balance
• if usage is less than 10%, remove up to 75% of the unused balance

              "},{"location":"policies/queue-scheduling/pullback-policy/#alcc-pullback-policy","title":"ALCC Pullback Policy","text":"

ALCC projects must use 50% of their allocation within the first seven months of the allocation cycle. Any unused time in excess of 50% will be deducted from the project allocation at the end of the seven-month period.

              "},{"location":"policies/queue-scheduling/queue-and-scheduling-policy/","title":"Queue and Scheduling Policy","text":""},{"location":"policies/queue-scheduling/queue-and-scheduling-policy/#general-policy","title":"General Policy","text":"

              We ask that all users follow good etiquette and be excellent to one another.

              "},{"location":"policies/queue-scheduling/queue-and-scheduling-policy/#priority","title":"Priority","text":"

As with all Argonne Leadership Computing Facility production systems, job priority in the queue is based on several criteria:

• positive balance of your project
• size (in nodes) of the job: larger jobs receive higher priority
• the type of project (e.g. INCITE, ALCC, or discretionary)
• job duration: shorter duration jobs accumulate priority more quickly, so it is best to specify the job run time as accurately as possible

              "},{"location":"policies/queue-scheduling/queue-and-scheduling-policy/#reservations-and-scheduling-policy","title":"Reservations and Scheduling Policy","text":"

Some work requires the use of Theta in ways that deviate from regular scheduling policy. On such occasions, the normal reservation policy applies: please send the standard reservation request form no fewer than five (5) business days in advance.

              "},{"location":"policies/queue-scheduling/queue-and-scheduling-policy/#big-run-mondays","title":"Big Run Mondays","text":"

              As part of our regular maintenance procedures on Mondays, we will promote to the highest priority any jobs in the queued state requesting 802 nodes or more. Promotion is subject to operational discretion.

              We may also, at our discretion, take the opportunity to promote the priority of capability jobs if the system has been drained of jobs for any other reason.

              "},{"location":"policies/queue-scheduling/queue-and-scheduling-policy/#monday-maintenance","title":"Monday Maintenance","text":"

              On Mondays where the ALCF is on a regular business schedule the system may be expected to undergo maintenance from 9:00 am until 5:00 pm US Central Time. The showres command may be used to view pending and active maintenance reservations.

              "},{"location":"policies/queue-scheduling/queue-and-scheduling-policy/#incitealcc-overburn-policy","title":"INCITE/ALCC Overburn Policy","text":"

              If an INCITE or ALCC project has exhausted its allocation in the first 11 months of its allocation year, it is eligible for overburn running. At this point, capability jobs submitted by INCITE and ALCC projects will run in the default queue (instead of backfill) for the first 11 months of the allocation year until 125% of the project allocation has been consumed.

              INCITE and ALCC projects needing additional overburn hours should e-mail support@alcf.anl.gov with a short description of what they plan to do with the additional hours, highlighting specific goals or milestones and the time expected to accomplish them. This will be reviewed by the scheduling committee, allocations committee, and ALCF management. Requests should be submitted 15 days before the start of the next quarter of the allocation year for full consideration. Non-capability jobs from projects that have exhausted their allocation will continue to run in backfill.

              To be clear, this policy does not constitute a guarantee of extra time, and we reserve the right to prioritize the scheduling of jobs submitted by projects that have not yet used 100% of their allocations, so the earlier that an INCITE or ALCC project exhausts its allocation, the more likely it is to be able to take full advantage of this policy.

              "},{"location":"policies/queue-scheduling/refund-policy/","title":"Refund Policy","text":"

              If a system problem affects your run, ALCF will consider a refund of node hours. The ALCF expects all applications to regularly checkpoint, so refunds are typically capped at four hours of runtime for the affected job, unless the problem in question prevented checkpoints.

ALCF strongly advises against symlinking between filesystems or hard-coding paths to a different filesystem.

To request a refund, send the following information to support@alcf.anl.gov:

• Job ID
• Machine
• Reason for the refund request

              For more information, contact support@alcf.anl.gov.

              "},{"location":"running-jobs/example-job-scripts/","title":"Example Job Scripts","text":"

              This page contains a small collection of example job scripts users may find useful for submitting their jobs on Polaris. Additional information on PBS and how to submit these job scripts is available here.

              A simple example using a similar script on Polaris is available in the Getting Started Repo.

              "},{"location":"running-jobs/example-job-scripts/#cpu-mpi-openmp-examples","title":"CPU MPI-OpenMP Examples","text":"

              The following submit.sh example submits a 1-node job to Polaris with 16 MPI ranks per node and 2 OpenMP threads per rank. See Queues for details on practical limits to node counts and job times for different sizes of jobs.

              The hello_affinity program is a compiled C++ code, which is built via make -f Makefile.nvhpc in the linked directory after cloning the Getting Started repository.

              #!/bin/bash -l\n#PBS -l select=1:system=polaris\n#PBS -l place=scatter\n#PBS -l walltime=0:30:00\n#PBS -l filesystems=home:grand\n#PBS -q debug\n#PBS -A Catalyst\n\n# Change to working directory\ncd ${PBS_O_WORKDIR}\n\n# MPI and OpenMP settings\nNNODES=`wc -l < $PBS_NODEFILE`\nNRANKS_PER_NODE=16\nNDEPTH=2\nNTHREADS=2\n\nNTOTRANKS=$(( NNODES * NRANKS_PER_NODE ))\necho \"NUM_OF_NODES= ${NNODES} TOTAL_NUM_RANKS= ${NTOTRANKS} RANKS_PER_NODE= ${NRANKS_PER_NODE} THREADS_PER_RANK= ${NTHREADS}\"\n\nmpiexec -n ${NTOTRANKS} --ppn ${NRANKS_PER_NODE} --depth=${NDEPTH} --cpu-bind depth --env OMP_NUM_THREADS=${NTHREADS} -env OMP_PLACES=threads ./hello_affinity\n

              The following function in the hello_affinity source code is essential for uniquely identifying the CUDA device even when Multi-Instance GPU (MIG) is enabled, as each physical device will be partitioned into multiple virtual devices, each with unique UUIDs differentiated by the last few characters:

              Identifying physical or virtual GPU by UUID
              //https://stackoverflow.com/questions/68823023/set-cuda-device-by-uuid\nvoid uuid_print(cudaUUID_t a){\n  std::cout << \"GPU\";\n  std::vector<std::tuple<int, int> > r = {{0,4}, {4,6}, {6,8}, {8,10}, {10,16}};\n  for (auto t : r){\n    std::cout << \"-\";\n    for (int i = std::get<0>(t); i < std::get<1>(t); i++)\n      std::cout << std::hex << std::setfill('0') << std::setw(2) << (unsigned)(unsigned char)a.bytes[i];\n  }\n  std::cout << std::endl;\n}\n

              NOTE: If you are a zsh user, you will need to ensure ALL submission and shell scripts include the -l flag following #!/bin/bash as seen in the example above to ensure your environment is being instantiated properly. zsh is NOT supported by HPE and support from ALCF will be best effort only.

              Each Polaris compute node has 1 Milan CPU with a total of 32 physical cores, with each core supporting 2 hardware threads (for a total of 64 logical cores).

              The process affinity in this example is setup to map each MPI rank to 2 physical cores. Each MPI rank spawns 2 OpenMP threads, so 1 thread per physical core. The OpenMP settings bind each OpenMP thread to a single hardware thread within a core, such that all 32 physical cores are utilized. CPU core IDs 32 to 63 are not mapped to any MPI rank, since they correspond to simultaneous multithreaded (SMT) sibling hardware threads that share the execution resources of the core ids 0 to 31, respectively.

              • cd ${PBS_O_WORKDIR} : change into the working directory from where qsub was executed.
• NNODES=`wc -l < $PBS_NODEFILE`: one method for determining the total number of nodes allocated to a job.
              • NRANKS_PER_NODE=16 : This is a helper variable to set the number of MPI ranks for each node to 16.
              • NDEPTH=2 : This is a helper variable to space MPI ranks 2 \"slots\" from each other. In this example, individual threads correspond to a slot. This will be used together with the --cpu-bind option from mpiexec and additional binding options are available (e.g. numa, socket, core, etc.).
              • NTHREADS=2 : This is a helper variable to set the number of OpenMP threads per MPI rank.
              • NTOTRANKS=$(( NNODES * NRANKS_PER_NODE)) : This is a helper variable calculating the total number of MPI ranks spanning all nodes in the job.

              Information on the use of mpiexec is available via man mpiexec. Some notes on the specific options used in the above example follow.

              • -n ${NTOTRANKS} : This is specifying the total number of MPI ranks to start.
              • --ppn ${NRANKS_PER_NODE} : This is specifying the number of MPI ranks to start on each node.
              • --depth=${NDEPTH} : This is specifying how many cores/threads to space MPI ranks apart on each node.
• --cpu-bind depth : This indicates that cores/threads will be bound to MPI ranks based on the depth argument.
• --env OMP_NUM_THREADS=${NTHREADS} : This sets the environment variable OMP_NUM_THREADS to determine the number of OpenMP threads per MPI rank.
              • --env OMP_PLACES=threads : This is indicating how OpenMP should distribute threads across the resource, in this case across hardware threads.
              "},{"location":"running-jobs/example-job-scripts/#hardware-threads","title":"Hardware threads","text":"

This example is similar to the previous one, but it exhausts all 64 logical cores available on each compute node CPU. We double the number of MPI ranks to 32, one per physical core. Using --cpu-bind=core, the --depth flag value is interpreted by Cray MPICH as spacing in number of physical cores, so NDEPTH=1 ensures that rank 0 is bound to CPU core IDs (0,32), the 2 SMT sibling hardware threads that share the first physical core.

              #!/bin/bash -l\n#PBS -l select=1:system=polaris\n#PBS -l place=scatter\n#PBS -l walltime=0:30:00\n#PBS -l filesystems=home:grand\n#PBS -q debug\n#PBS -A Catalyst\n\n# Change to working directory\ncd ${PBS_O_WORKDIR}\n\n# MPI and OpenMP settings\nNNODES=`wc -l < $PBS_NODEFILE`\nNRANKS_PER_NODE=32\nNDEPTH=1\nNTHREADS=2\n\nNTOTRANKS=$(( NNODES * NRANKS_PER_NODE ))\necho \"NUM_OF_NODES= ${NNODES} TOTAL_NUM_RANKS= ${NTOTRANKS} RANKS_PER_NODE= ${NRANKS_PER_NODE} THREADS_PER_RANK= ${NTHREADS}\"\n\nmpiexec -n ${NTOTRANKS} --ppn ${NRANKS_PER_NODE} --depth=${NDEPTH} --cpu-bind core --env OMP_NUM_THREADS=${NTHREADS} -env OMP_PLACES=threads ./hello_affinity\n
              Many HPC applications do not benefit from utilizing the CPU's SMT2 capabilities, and such software may achieve better performance by using the previous script such that each of the 32 physical cores only runs a single OpenMP thread.

              "},{"location":"running-jobs/example-job-scripts/#gpu-mpi-examples","title":"GPU MPI Examples","text":"

              Using the CPU job submission examples above as a baseline, there are not many additional changes needed to enable an application to make use of the 4 NVIDIA A100 GPUs on each Polaris node. In the following 2-node example (because #PBS -l select=2 indicates the number of nodes requested), 4 MPI ranks will be started on each node assigning 1 MPI rank to each GPU in a round-robin fashion. A simple example using a similar job submission script on Polaris is available in the Getting Started Repo.

              #!/bin/bash -l\n#PBS -l select=2:system=polaris\n#PBS -l place=scatter\n#PBS -l walltime=0:30:00\n#PBS -l filesystems=home:eagle\n#PBS -j oe\n#PBS -q debug\n#PBS -A Catalyst\n\n# Enable GPU-MPI (if supported by application)\nexport MPICH_GPU_SUPPORT_ENABLED=1\n\n# Change to working directory\ncd ${PBS_O_WORKDIR}\n\n# MPI and OpenMP settings\nNNODES=`wc -l < $PBS_NODEFILE`\nNRANKS_PER_NODE=$(nvidia-smi -L | wc -l)\nNDEPTH=8\nNTHREADS=1\n\nNTOTRANKS=$(( NNODES * NRANKS_PER_NODE ))\necho \"NUM_OF_NODES= ${NNODES} TOTAL_NUM_RANKS= ${NTOTRANKS} RANKS_PER_NODE= ${NRANKS_PER_NODE} THREADS_PER_RANK= ${NTHREADS}\"\n\n# For applications that internally handle binding MPI/OpenMP processes to GPUs\nmpiexec -n ${NTOTRANKS} --ppn ${NRANKS_PER_NODE} --depth=${NDEPTH} --cpu-bind depth --env OMP_NUM_THREADS=${NTHREADS} -env OMP_PLACES=threads ./hello_affinity\n\n# For applications that need mpiexec to bind MPI ranks to GPUs\n#mpiexec -n ${NTOTRANKS} --ppn ${NRANKS_PER_NODE} --depth=${NDEPTH} --cpu-bind depth --env OMP_NUM_THREADS=${NTHREADS} -env OMP_PLACES=threads ./set_affinity_gpu_polaris.sh ./hello_affinity\n

The affinity options NDEPTH=8 and --cpu-bind depth (or core) are set to ensure that each MPI rank is bound to a separate NUMA node. If OpenMP threading is desired, set NTHREADS=8 for each MPI rank to spawn 1 thread per physical core (all in the same NUMA domain that the rank is bound to). The OpenMP-related options are not needed if your application does not use OpenMP. Nothing additional is required on the mpiexec command for applications that internally manage GPU devices and handle the binding of MPI/OpenMP processes to GPUs. A small helper script is available for those with applications that rely on MPI to handle the binding of MPI ranks to GPUs. Some notes on this helper script and other key differences from the earlier CPU examples follow.

              export MPICH_GPU_SUPPORT_ENABLED=1

For applications that support GPU-enabled MPI (i.e. use MPI to communicate data directly between GPUs), this environment variable is required to enable GPU support in Cray's MPICH. Omitting this will result in a segfault. Support for this also requires that the application was linked against the GPU Transport Layer library (e.g. -lmpi_gtl_cuda), which is automatically included for users by the craype-accel-nvidia80 module in the default environment on Polaris. If this gtl library is not properly linked, users will see an error message to that effect upon executing the first MPI command that uses a device pointer.
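One quick way to confirm that an executable picked up the GTL library mentioned above is to inspect its shared-library dependencies; the binary name below is a placeholder:

# Look for the CUDA GPU Transport Layer library among the linked shared objects
ldd ./hello_affinity | grep -i gtl
# A line referencing libmpi_gtl_cuda indicates GPU-aware MPI support was linked in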

              ./set_affinity_gpu_polaris.sh

This script is useful for those applications that rely on MPI to bind MPI ranks to GPUs on each node. Such a script is not necessary when the application handles process-GPU binding itself. This script simply sets the environment variable CUDA_VISIBLE_DEVICES to a restricted set of GPUs (e.g. each MPI rank sees only one GPU). Otherwise, all MPI ranks on a node would target the first GPU, likely having a negative impact on performance. An example of this script is available in the Getting Started repo and copied below.

              "},{"location":"running-jobs/example-job-scripts/#hardware-threads_1","title":"Hardware threads","text":"
              #!/bin/bash -l\n#PBS -l select=2:system=polaris\n#PBS -l place=scatter\n#PBS -l walltime=0:30:00\n#PBS -l filesystems=home:eagle\n#PBS -q debug\n#PBS -A Catalyst\n\n# Enable GPU-MPI (if supported by application)\nexport MPICH_GPU_SUPPORT_ENABLED=1\n\n# Change to working directory\ncd ${PBS_O_WORKDIR}\n\n# MPI and OpenMP settings\nNNODES=`wc -l < $PBS_NODEFILE`\nNRANKS_PER_NODE=$(nvidia-smi -L | wc -l)\nNDEPTH=16\nNTHREADS=16\n\nNTOTRANKS=$(( NNODES * NRANKS_PER_NODE ))\necho \"NUM_OF_NODES= ${NNODES} TOTAL_NUM_RANKS= ${NTOTRANKS} RANKS_PER_NODE= ${NRANKS_PER_NODE} THREADS_PER_RANK= ${NTHREADS}\"\n\n# For applications that internally handle binding MPI/OpenMP processes to GPUs\nmpiexec -n ${NTOTRANKS} --ppn ${NRANKS_PER_NODE} --depth=${NDEPTH} --cpu-bind numa --env OMP_NUM_THREADS=${NTHREADS} -env OMP_PLACES=threads ./hello_affinity\n\n# For applications that need mpiexec to bind MPI ranks to GPUs\n#mpiexec -n ${NTOTRANKS} --ppn ${NRANKS_PER_NODE} --depth=${NDEPTH} --cpu-bind numa --env OMP_NUM_THREADS=${NTHREADS} -env OMP_PLACES=threads ./set_affinity_gpu_polaris.sh ./hello_affinity\n

As in the previous hardware threads example, the MPI ranks are spaced apart assuming the user wants to utilize all 64 logical cores (achieved here by setting both NDEPTH and NTHREADS to 16 and using --cpu-bind numa).

              In this script, we have added -j oe to the list of PBS options; -j oe combines stdout and stderr to the same file and uses the stdout filename provided (if provided). -j eo would do the same but use the stderr filename provided. Without these options, separate files containing stdout and stderr of the job are produced.

              Here we compare two bare-bones PBS submission scripts for a CUDA example with and without MPI:

No MPI (first script below); With MPI (second script below)
              #!/bin/bash\n#PBS -l select=1\n#PBS -l walltime=00:10:00\n#PBS -q debug\n#PBS -l filesystems=home\n#PBS -A <project-name>\n#PBS -o logs/\n#PBS -e logs/\n\nmodule load cudatoolkit-standalone/11.8.0\n\n$HOME/ALCFBeginnersGuide/polaris/examples/01_example_cu\n
              #!/bin/bash\n#PBS -l select=2\n#PBS -l walltime=00:10:00\n#PBS -q debug\n#PBS -l filesystems=home\n#PBS -A <project-name>\n#PBS -o logs/\n#PBS -e logs/\n\nmodule load cudatoolkit-standalone/11.8.0\n\n# Count number of nodes assigned\nNNODES=`wc -l < $PBS_NODEFILE`\n# set 1 MPI rank per GPU\nNRANKS_PER_NODE=4\n# calculate total ranks\nNTOTRANKS=$(( NNODES * NRANKS_PER_NODE ))\necho NUM_OF_NODES= ${NNODES} TOTAL_NUM_RANKS= ${NTOTRANKS} RANKS_PER_NODE= ${NRANKS_PER_NODE}\n\nmpiexec -n ${NTOTRANKS} --ppn ${NRANKS_PER_NODE} $HOME/ALCFBeginnersGuide/polaris/examples/01_example_mpi\n
              "},{"location":"running-jobs/example-job-scripts/#setting-gpu-affinity-for-each-mpi-rank","title":"Setting GPU affinity for each MPI rank","text":"

              The CUDA_VISIBLE_DEVICES environment variable is provided for users to set which GPUs on a node are accessible to an application or MPI ranks started on a node.

              A copy of the small helper script provided in the Getting Started repo is provided below for reference:

              GPU affinity script
#!/bin/bash -l\nnum_gpus=4\n# need to assign GPUs in reverse order due to topology\n# See Polaris Device Affinity Information https://www.alcf.anl.gov/support/user-guides/polaris/hardware-overview/machine-overview/index.html\ngpu=$((${num_gpus} - 1 - ${PMI_LOCAL_RANK} % ${num_gpus}))\nexport CUDA_VISIBLE_DEVICES=$gpu\necho \"RANK= ${PMI_RANK} LOCAL_RANK= ${PMI_LOCAL_RANK} gpu= ${gpu}\"\nexec \"$@\"\n

              Note

              The echo command prints a helpful message for the user to confirm the desired mapping is achieved. Users are encouraged to edit this file as necessary for their particular needs.

              Warning

              If planning large-scale runs with many thousands of MPI ranks, it is advised to comment out the echo command above so as not to have thousands of lines of output written to stdout.

              "},{"location":"running-jobs/example-job-scripts/#using-mps-on-the-gpus","title":"Using MPS on the GPUs","text":"

              Documentation for the NVIDIA Multi-Process Service (MPS) can be found here

              In the script below, note that if you are going to run this as a multi-node job you will need to do this on every compute node, and you will need to ensure that the paths you specify for CUDA_MPS_PIPE_DIRECTORY and CUDA_MPS_LOG_DIRECTORY do not \"collide\" and end up with all the nodes writing to the same place.

An example is available in the Getting Started Repo and discussed below. Using the local SSDs, using /dev/shm, or incorporating the node name into the path are all possible ways of dealing with that issue.

              #!/bin/bash -l\nexport CUDA_MPS_PIPE_DIRECTORY=</path/writeable/by/you>\nexport CUDA_MPS_LOG_DIRECTORY=</path/writeable/by/you>\nCUDA_VISIBLE_DEVICES=0,1,2,3 nvidia-cuda-mps-control -d\necho \"start_server -uid $( id -u )\" | nvidia-cuda-mps-control\n
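If the pipe and log directories must live on a shared filesystem, one possible way to follow the advice above and avoid collisions between nodes is to fold the node name into the paths. This is a sketch under those assumptions, not the script from the repo:

#!/bin/bash -l
# Give each node its own MPS pipe/log directories (base path is a placeholder)
export CUDA_MPS_PIPE_DIRECTORY=/path/writeable/by/you/nvidia-mps-$(hostname)
export CUDA_MPS_LOG_DIRECTORY=/path/writeable/by/you/nvidia-log-$(hostname)
mkdir -p \"$CUDA_MPS_PIPE_DIRECTORY\" \"$CUDA_MPS_LOG_DIRECTORY\"
CUDA_VISIBLE_DEVICES=0,1,2,3 nvidia-cuda-mps-control -d
echo \"start_server -uid $( id -u )\" | nvidia-cuda-mps-control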

              to verify the control service is running:

              $ nvidia-smi | grep -B1 -A15 Processes\n

              and the output should look similar to this:

              +-----------------------------------------------------------------------------+\n| Processes:                                                                  |\n|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |\n|        ID   ID                                                   Usage      |\n|=============================================================================|\n|    0   N/A  N/A     58874      C   nvidia-cuda-mps-server             27MiB |\n|    1   N/A  N/A     58874      C   nvidia-cuda-mps-server             27MiB |\n|    2   N/A  N/A     58874      C   nvidia-cuda-mps-server             27MiB |\n|    3   N/A  N/A     58874      C   nvidia-cuda-mps-server             27MiB |\n+-----------------------------------------------------------------------------+\n

              to shut down the service:

              echo \"quit\" | nvidia-cuda-mps-control

              to verify the service shut down properly:

              nvidia-smi | grep -B1 -A15 Processes

              and the output should look like this:

              +-----------------------------------------------------------------------------+\n| Processes:                                                                  |\n|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |\n|        ID   ID                                                   Usage      |\n|=============================================================================|\n|  No running processes found                                                 |\n+-----------------------------------------------------------------------------+\n
              "},{"location":"running-jobs/example-job-scripts/#using-mps-in-multi-node-jobs","title":"Using MPS in Multi-node Jobs","text":"

              As stated earlier, it is important to start the MPS control service on each node in a job that requires it. An example is available in the Getting Started Repo. The helper script enable_mps_polaris.sh can be used to start the MPS on a node.

              #!/bin/bash -l\n\nexport CUDA_MPS_PIPE_DIRECTORY=/tmp/nvidia-mps\nexport CUDA_MPS_LOG_DIRECTORY=/tmp/nvidia-log\nCUDA_VISIBLE_DEVICES=0,1,2,3 nvidia-cuda-mps-control -d\necho \"start_server -uid $( id -u )\" | nvidia-cuda-mps-control\n
              The helper script disable_mps_polaris.sh can be used to disable MPS at appropriate points during a job script, if needed.

              #!/bin/bash -l\n\necho quit | nvidia-cuda-mps-control\n
In the example job script submit.sh below, MPS is first enabled on all nodes in the job using mpiexec -n ${NNODES} --ppn 1 to launch the enablement script with a single MPI rank on each compute node. The application is then run as normal. If desired, a similar one-rank-per-node mpiexec command can be used to disable MPS on all the nodes in a job.

              #!/bin/bash -l\n#PBS -l select=1:system=polaris\n#PBS -l place=scatter\n#PBS -l walltime=0:30:00\n#PBS -q debug\n#PBS -A Catalyst\n#PBS -l filesystems=home:grand:eagle\n\ncd ${PBS_O_WORKDIR}\n\n# MPI example w/ 8 MPI ranks per node spread evenly across cores\nNNODES=`wc -l < $PBS_NODEFILE`\nNRANKS_PER_NODE=8\nNDEPTH=8\nNTHREADS=1\n\nNTOTRANKS=$(( NNODES * NRANKS_PER_NODE ))\necho \"NUM_OF_NODES= ${NNODES} TOTAL_NUM_RANKS= ${NTOTRANKS} RANKS_PER_NODE= ${NRANKS_PER_NODE} THREADS_PER_RANK= ${NTHREADS}\"\n\n# Enable MPS on each node allocated to job\nmpiexec -n ${NNODES} --ppn 1 ./enable_mps_polaris.sh\n\nmpiexec -n ${NTOTRANKS} --ppn ${NRANKS_PER_NODE} --depth=${NDEPTH} --cpu-bind depth ./hello_affinity\n\nmpiexec -n ${NTOTRANKS} --ppn ${NRANKS_PER_NODE} --depth=${NDEPTH} --cpu-bind depth ./set_affinity_gpu_polaris.sh ./hello_affinity\n\n# Disable MPS on each node allocated to job\nmpiexec -n ${NNODES} --ppn 1 ./disable_mps_polaris.sh\n
              "},{"location":"running-jobs/example-job-scripts/#single-node-ensemble-calculations-example","title":"Single-node Ensemble Calculations Example","text":"

              In the script below, a set of four applications are launched simultaneously on a single node. Each application runs on 8 MPI ranks and targets a specific GPU using the CUDA_VISIBLE_DEVICES environment variable. In the first instance, MPI ranks 0-7 will spawn on CPUs 24-31, and GPU 0 is used. This pairing of CPUs and GPU is based on output of the nvidia-smi topo-m command showing which CPUs share a NUMA domain with each GPU. It is important to background processes using & and to wait for all runs to complete before exiting the script or continuing on with additional work. Note, multiple applications can run on the same set of CPU resources, but it may not be optimal depending on the workload. An example is available in the Getting Started Repo.

              #!/bin/bash -l\n#PBS -l select=1:system=polaris\n#PBS -l place=scatter\n#PBS -l walltime=0:30:00\n#PBS -q debug\n#PBS -A Catalyst\n#PBS -l filesystems=home:grand:eagle\n\n#cd ${PBS_O_WORKDIR}\n\n# MPI example w/ 8 MPI ranks per node spread evenly across cores\nNNODES=`wc -l < $PBS_NODEFILE`\nNRANKS_PER_NODE=8\nNTHREADS=1\n\nnvidia-smi topo -m\n\nNTOTRANKS=$(( NNODES * NRANKS_PER_NODE ))\necho \"NUM_OF_NODES= ${NNODES} TOTAL_NUM_RANKS= ${NTOTRANKS} RANKS_PER_NODE= ${NRANKS_PER_NODE} THREADS_PER_RANK= ${NTHREADS}\"\n\nexport CUDA_VISIBLE_DEVICES=0\nmpiexec -n ${NTOTRANKS} --ppn ${NRANKS_PER_NODE} --cpu-bind list:24:25:26:27:28:29:30:31 ./hello_affinity &\n\nexport CUDA_VISIBLE_DEVICES=1\nmpiexec -n ${NTOTRANKS} --ppn ${NRANKS_PER_NODE} --cpu-bind list:16:17:18:19:20:21:22:23 ./hello_affinity &\n\nexport CUDA_VISIBLE_DEVICES=2\nmpiexec -n ${NTOTRANKS} --ppn ${NRANKS_PER_NODE} --cpu-bind list:8:9:10:11:12:13:14:15 ./hello_affinity &\n\nexport CUDA_VISIBLE_DEVICES=3\nmpiexec -n ${NTOTRANKS} --ppn ${NRANKS_PER_NODE} --cpu-bind list:0:1:2:3:4:5:6:7 ./hello_affinity &\n\nwait\n
              "},{"location":"running-jobs/example-job-scripts/#multi-node-ensemble-calculations-example","title":"Multi-node Ensemble Calculations Example","text":"

To run multiple concurrent applications on distinct sets of nodes, one simply needs to provide appropriate hostfiles to the mpiexec command. The split unix command is one convenient way to create several unique hostfiles, each containing a subset of nodes available to the job. In the 8-node example below, a total of four applications will be launched on separate sets of nodes. The $PBS_NODEFILE file will be split into several hostfiles, each containing two lines (nodes). These smaller hostfiles are then passed to the --hostfile argument of mpiexec to launch the applications. It is important to background processes using & and to wait for applications to finish running before leaving the script or continuing on with additional work. Note, multiple applications can run on the same set of CPU resources, but it may not be optimal depending on the workload. An example is available in the Getting Started Repo.

              #!/bin/bash -l\n#PBS -l select=8:system=polaris\n#PBS -l place=scatter\n#PBS -l walltime=0:30:00\n#PBS -q debug-scaling\n#PBS -A Catalyst\n#PBS -l filesystems=home:grand:eagle\n\ncd ${PBS_O_WORKDIR}\n\n# MPI example w/ multiple runs per batch job\nNNODES=`wc -l < $PBS_NODEFILE`\n\n# Settings for each run: 2 nodes, 4 MPI ranks per node spread evenly across cores\n# User must ensure there are enough nodes in job to support all concurrent runs\nNUM_NODES_PER_MPI=2\nNRANKS_PER_NODE=4\nNDEPTH=8\nNTHREADS=1\n\nNTOTRANKS=$(( NUM_NODES_PER_MPI * NRANKS_PER_NODE ))\necho \"NUM_OF_NODES= ${NNODES} NUM_NODES_PER_MPI= ${NUM_NODES_PER_MPI} TOTAL_NUM_RANKS= ${NTOTRANKS} RANKS_PER_NODE= ${NRANKS_PER_NODE} THREADS_PER_RANK= ${NTHREADS}\"\n\n# Increase value of suffix-length if more than 99 jobs\nsplit --lines=${NUM_NODES_PER_MPI} --numeric-suffixes=1 --suffix-length=2 $PBS_NODEFILE local_hostfile.\n\nfor lh in local_hostfile*\ndo\n  echo \"Launching mpiexec w/ ${lh}\"\n  mpiexec -n ${NTOTRANKS} --ppn ${NRANKS_PER_NODE} --hostfile ${lh} --depth=${NDEPTH} --cpu-bind depth ./hello_affinity &\n  sleep 1s\ndone\n\nwait\n
              "},{"location":"running-jobs/job-and-queue-scheduling/","title":"Running Jobs using PBS","text":""},{"location":"running-jobs/job-and-queue-scheduling/#documentation-tools","title":"Documentation / Tools","text":"
              • The PBS \"BigBook\": This is really excellent. We highly suggest you download it and search through it when you have questions. However, it is big at about 2000 pages / 40MB and contains a bunch of stuff you don't really need, so you can also download the guides separately here:
                • The PBS Users Guide: This is the users guide.
                • The PBS Reference Guide: This is the Reference Guide. It shows every option and gives you details on how to format various elements on the command line.
• Cobalt qsub options to PBS qsub options: shows how to map Cobalt command-line options to PBS command-line options. Can be found at the link above.
              • qsub2pbs: Installed on Theta and Cooley. Pass it a Cobalt command line and it will convert it to a PBS command line. Add the --directives option, and it will output an executable script. Note that it outputs -l select=system=None. You would need to change the None to whatever system you wanted to target (polaris, aurora, etc.).
              "},{"location":"running-jobs/job-and-queue-scheduling/#introduction","title":"Introduction","text":"

              At a high level, getting computational tasks run on an HPC system is a two-step process:

              1. You request and get allocated resources (we allocate at the node level, but some facilities you request number of cores and RAM, etc.) on one or more of the systems. This is accomplished by interacting with the job scheduler / workload manager. In the ALCF we use PBS Professional.

              2. You execute your tasks on those resources. This is accomplished in your job script by interacting with various system services (MPI, OpenMP, the HPE PALS task launch system, etc.)

              Our documentation is organized in two sections aligned with the two steps described above.

              "},{"location":"running-jobs/job-and-queue-scheduling/#table-of-contents","title":"Table of Contents","text":"
              • Obtaining and managing compute resources at ALCF - General PBS information common to all systems
                • Definitions and Notes
                • Quick Start
                • qsub - submit a job to run
                • qstat - query the status of jobs/queues
                • qalter - alter a queued job
                • qdel - delete a queued or running job
                • qmove - move a job to a different queue
                • qhold,qrls - place/release a hold on a job in a queue
                • qselect - utility to select jobids that meet criteria
                • qmsg - write a message into a jobs output file
                • qsig - send a signal to a job
                • pbsnodes - Get information about the current state of nodes
                • Using Fakeroot with Singularity
              "},{"location":"running-jobs/job-and-queue-scheduling/#obtaining-and-managing-compute-resources-at-alcf","title":"Obtaining and managing compute resources at ALCF","text":""},{"location":"running-jobs/job-and-queue-scheduling/#definitions-and-notes","title":"Definitions and Notes","text":"

              chunk: A set of resources allocated as a unit to a job. Specified inside a selection directive. All parts of a chunk come from the same host. In a typical MPI (Message-Passing Interface) job, there is one chunk per MPI process.

              vnode: A virtual node, or vnode, is an abstract object representing a host or a set of resources which form a usable part of an execution host. This could be an entire host, or a nodeboard or a blade. A single host can be made up of multiple vnodes. Each vnode can be managed and scheduled independently. Each vnode in a complex must have a unique name. Vnodes on a host can share resources, such as node-locked licenses. PBS operates on vnodes. A vnode can, and in ALCF often will, represent an entire host, but it doesn't have to. For instance, there is a mode on Polaris where we could have each physical host look like four vnodes, each with 16 threads, 1/4 of the RAM and one A100.

              ncpus: Number of resources available to execute a program. In ALCF, given the way we configure PBS, this equates to a hardware thread. For example, a single socket node with a 32 core CPU, each with two hardware threads would report that as ncpus=64.

              ngpus: The number of allocable GPUs on the vnode. For an NVIDIA A100, this could be one, however, if we enable Multi Instance GPU (MIG) mode and use cgroups it could be as high as 7.

job: A job equates to a qsub. A set of resources allocated to you for a period of time. You will execute one or more tasks on those resources during your job.

              task: A single execution on the resources of your job, often an mpiexec invocation launched by PALS or PMIx. You may run one task or many tasks during your job. You may run tasks sequentially or divide your resources up and run several tasks concurrently. Also sometimes referred to as job steps.

              "},{"location":"running-jobs/job-and-queue-scheduling/#quick-start","title":"Quick Start","text":"

              If you are an ALCF user and are familiar with Cobalt, you will find the PBS commands very similar though the options to qsub are quite different. Here are the \"Big Four\" commands you will use:

              1. qsub: request resources (generally compute nodes) to run your job and start your script/executable on the head node. Here is the minimal qsub allowed at the ALCF:
                • qsub -A <project> -l select=<# of nodes>,walltime=HH:MM:SS,filesystems=fs1:fs2 <your job script>
                • The -A, walltime, and filesystems are mandatory. You will receive errors if they are not specified.
                • We automatically add -k doe for you. This streams your output back rather than spooling it and copying it back at the end of the job. It probably isn't a bad idea to specify it in your script, but we enforce that option, so if you try and change it, you will get an error.
                • It is highly likely you will also want to add -l place=scatter so that each of your chunks (<# of nodes>) gets its own vnode.
• If you want to run an executable rather than a script, replace <your job script> in the example above with -- <your executable> (that is dash dash)
                • PBS Documentation: Users Guide, Chapter 2, page UG-11 and Reference Guide Chapter 2, section 2.57, page RG-216
              2. qstat: check on the status of your jobs or queues
                • Try these variations and see which you like best: qstat, qstat -was, qstat -was1, qstat -wan, qstat -wan1. Add -x to see jobs that have completed. We keep two weeks of history.
                • qstat -Q will list all the queues in case you forget.
                • PBS Documentation: Users Guide Sec. 10.2, page UG-175; Reference Guide Sec. 2.55, page RG-200
              3. qalter: update your request for resources
• Just like qsub, just add a jobid at the end. Only works before the job starts.
• If you want to change the walltime to 30 minutes: qalter -l walltime=00:30:00 <jobid>
                • PBS Documentation: Users Guide Sec. 9.2, page UG-168; Reference Guide Sec. 2.40, page RG-130
              4. qdel: cancel a job that you don't need. This will also kill a running job
                • qdel <jobid>
                • PBS Documentation: Users Guide Sec. 9.3, page UG-170; Reference Guide Sec. 2.41, page RG-143

              Note: The page numbers in the PBS guides are unique. If you search for the specified page number it will take you directly to the relevant page.
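Putting the \"Big Four\" commands above together, a minimal walkthrough might look like the following; the project name, node count, filesystems, and script name are placeholders:

# 1. Submit: 2 nodes for 30 minutes, using the home and eagle filesystems
qsub -A MyProject -l select=2,walltime=00:30:00,filesystems=home:eagle -l place=scatter ./my_job.sh

# 2. Check the status of your jobs (add -x to include completed jobs)
qstat -was

# 3. Shorten the walltime of a queued job (only possible before it starts)
qalter -l walltime=00:15:00 <jobid>

# 4. Remove a job you no longer need
qdel <jobid>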

              "},{"location":"running-jobs/job-and-queue-scheduling/#qsub-submit-a-job-to-run","title":"qsub: submit a job to run","text":"

              Users Guide, Chapter 2, page UG-11 and Reference Guide Chapter 2, section 2.57, page RG-216

              At the ALCF, your qsub will likely use the following parameters:

qsub -A <project> -k doe -l select=<#>:system=<name>,walltime=HH:MM:SS,filesystems=fs1:fs2,place=scatter <your job script>

              Where:

• project is the project name associated with your allocation (this is what you check the balance of with the sbank command). This is a mandatory option at the ALCF. If you don't include it you will get qsub: Account_Name is required to be set.
              • -k doe is telling pbs to stream your output rather than buffer it on the compute nodes and then scp it at the end of the job. Note we will automatically add this if you don't specify it. We enforce this option, so if you try and specify any other output handling you will get an error.
              • # of chunks (typically nodes). Each of our systems has a PBS \"resource\" called system defined and set to the system name (polaris, sunspot, etc)
              • walltime=HH:MM:SS specifying a wall time is mandatory at the ALCF. Valid wall times depend on the queue you are using. There is a table with the queues for each machine at the end of this section and in the machine specific documentation.
              • filesystems=fs1:fs2:... Specifying which filesystems your application uses is mandatory at ALCF. The reason for this is if a filesystem goes down, we have a way of making PBS aware of that and it won't run jobs that need that filesystem. If you don't specify filesystems you will receive the following error: qsub: Resource: filesystems is required to be set.
              • place=scatter is telling PBS you want each of your chunks on a separate vnode. By default, PBS will pack your chunks to get maximum utilization. If you requested ncpus=1 and chunks=64 without place=scatter on a system with ncpus=64, all your chunks would end up on one node.
• Your job script: See Example Job Scripts for more information about how to build your job script. For options that won't change, you do have the option of taking things off the command line and putting them in your job script. For instance, the above command line could be simplified to qsub -l select=<#> <your job script> if you added the following to the top of your job script (the PBS directives have to come before any executable line):
              #PBS -A <project>\n#PBS -k doe\n#PBS -l walltime=HH:MM:SS\n#PBS -l filesystems=fs1:fs2\n

              Also note that if you want to run an executable directly rather than a script you use two dashes and the executable name in place of your script name like this: -- /usr/bin/sleep 600

              "},{"location":"running-jobs/job-and-queue-scheduling/#more-detail","title":"More detail:","text":"

              The single biggest difference between Cobalt and PBS is the way you select resources when submitting a job. In Cobalt, every system had its own Cobalt server and you just specified the number of nodes you wanted (-n). With PBS, we are planning on running a single \"PBS Complex\" which means there will be a single PBS server for all systems in the ALCF and you need to specify enough constraints to get your job to run on the resources you want/need. One advantage of this is that getting resources from two different systems or \"co-scheduling\" is trivially possible.

              "},{"location":"running-jobs/job-and-queue-scheduling/#resource-selection-and-job-placement","title":"Resource Selection and Job Placement","text":"

              Section 2.57.2.6 RG-219 Requesting Resources and Placing jobs in the Reference Guide.

              Resources come in two flavors:

              • Job Wide: Walltime is the most common example of a job wide resource. You use the -l option to specify job wide resources, i.e. -l walltime=06:00:00. All the resources in the job have the same walltime.
              • -l <resource name>=<value>[,<resource name>=<value> ...]
              • Chunks: (see the definition above) This is how you describe what your needs are to run your job. You do this with the -l select= syntax. In the ALCF, we do whole node scheduling and every node has a resource called system which is set to the system name it belongs to (Polaris, Aurora, etc). This means you can typically get away with the very simple -l select=128:system=foo which will give you 128 complete nodes on system foo.
              • -l select=[<N>:]<chunk>[+[<N>:]<chunk> ...] where N specifies how many of that chunk and a chunk is of the form:
              • <resource name>=<value>[:<resource name>=<value> ...]
• Here is a hypothetical example that would select some resources with A100 GPUs and other resources with A40 GPUs. PBS takes care of co-scheduling the nodes on the two systems for you transparently. Note that in this case, since we did not specify system=, if there were multiple systems that could satisfy a chunk, you wouldn't know ahead of time which system you would get.

              -l select=128:ncpus=64:ngpus=4:gputype=A100+32:ncpus=64:ngpus=2:gputype=A40

              You also have to tell PBS how you want the chunks distributed across the physical hardware. You do that via the -l place option:

              • -l place=[<arrangement>][: <sharing> ][: <grouping>] where
              • arrangement is one of free | pack | scatter | vscatter
  • unless you have a specific reason to do otherwise, you probably want to set this to scatter; otherwise you may not get what you expect. For instance, on a host with ncpus=64, if you requested -l select=8:ncpus=8 you could end up with all of your chunks on one node.
                • free means PBS can distribute them as it sees fit
                • pack means all chunks from one host. Note that this is not the minimum number of hosts, it is one host. If the chunks can't fit on one host, the qsub will fail.
                • scatter means take only one chunk from any given host.
                • vscatter means take only one chunk from any given vnode. If a host has multiple vnodes, you could end up with more than one chunk on the host.
              • sharing is one of excl | shared | exclhost where
                • NOTE: Node configuration can override your requested sharing mode. For instance, in most cases ALCF sets the nodes to force_exclhost, so normally you don't have to specify this.
                • excl means this job gets the entire vnode
                • shared means the vnode could be shared with another job from another user.
                • exclhost means this job gets the entire host, even if it has multiple vnodes.
              • group=<resource name>
                • As an example, for machines that use a dragonfly network topology, we provide a PBS resource named tier1 indicating which dragonfly group a node is in. If you wanted to ensure that all the chunks came from a single dragonfly group, you could specify place=group=tier1 as part of your qsub. tier0 is rack granularity, so group=tier0 would ensure your nodes all came from one rack. Note that if you requested more nodes than were available in a rack your job would never run and you would see something like Not Running: Insufficient amount of resource: tier0.

We have defined placement sets for the tier0 and tier1 resources. As a result, if you don't specify a grouping, PBS will preferentially group your nodes in a placement set, but it won't drain or delay your job start to do so. For example, if you request 10 nodes and don't specify a grouping: if 10 nodes are available in the same rack, all your nodes will be in one rack; if not, but 10 nodes are available in a single dragonfly group, all your nodes will be in one dragonfly group. If you wish to specify a particular rack or dragonfly group, that is accomplished via the select syntax. For instance, qsub ... -l select=10:tier1=g0 would force your 10 nodes to be in dragonfly group 0.
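For instance, a sketch of a submission that keeps all chunks within one dragonfly group; the project name my_allocation and the script name are hypothetical:

qsub -A my_allocation -l select=10:system=polaris -l walltime=01:00:00 -l filesystems=home:eagle -l place=scatter:group=tier1 my_job_script.sh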

              Here is a heavily commented sample PBS submission script that shows some more of the options, but remember that the PBS manuals referenced at the top of this page are the ultimate resource.

#!/bin/bash -l
# UG Section 2.5, page UG-24 Job Submission Options
# Add another # at the beginning of the line to comment out a line
# NOTE: adding a switch to the command line will override values in this file.

# These options are MANDATORY at ALCF; your qsub will fail if you don't provide them.
#PBS -A <short project name>
#PBS -l walltime=HH:MM:SS
# file systems used by the job
#PBS -l filesystems=home:eagle

# Highly recommended
# The first 15 characters of the job name are displayed in the qstat output:
#PBS -N <name>

# If you need a queue other than the default, which is prod (uncomment to use)
##PBS -q <queue name>

# Controlling the output of your application
# UG Sec 3.3 page UG-42 Managing Output and Error Files
# By default, PBS spools your output on the compute node and then uses scp to move it to the
# destination directory after the job finishes.  Since we have globally mounted file systems
# it is highly recommended that you use the -k option to write directly to the destination.
# The doe stands for direct, output, error
#PBS -k doe
#PBS -o <path for stdout>
#PBS -e <path for stderr>

# If you want to merge stdout and stderr, use the -j option
# oe=merge stdout/stderr to stdout, eo=merge stderr/stdout to stderr, n=don't merge
#PBS -j n

# Controlling email notifications
# UG Sec 2.5.1, page UG-25 Specifying Email Notification
# When to send email b=job begin, e=job end, a=job abort, j=subjobs (job arrays), n=no mail
#PBS -m be
# By default, mail goes to the submitter; use this option to add others (uncomment to use)
#PBS -M <email addresses>

# Setting job dependencies
# UG Section 6.2, page UG-109 Using Job Dependencies
# There are many options for how to set up dependencies; afterok will give behavior similar
# to Cobalt (uncomment to use)
##PBS depend=afterok:<jobid>:<jobid>

# Environment variables (uncomment to use)
# UG Section 6.12, page UG-126 Using Environment Variables
# RG Sect 2.57.7, page RG-233 Environment variables PBS puts in the job environment
##PBS -v <variable list>
## -v a=10, "var2='A,B'", c=20, HOME=/home/zzz
##PBS -V exports all the environment variables in your environment to the compute node

# The rest is an example of how an MPI job might be set up
echo Working directory is $PBS_O_WORKDIR
cd $PBS_O_WORKDIR

echo Jobid: $PBS_JOBID
echo Running on host `hostname`
echo Running on nodes `cat $PBS_NODEFILE`

NNODES=`wc -l < $PBS_NODEFILE`
NRANKS=1           # Number of MPI ranks per node
NDEPTH=1           # Number of hardware threads per rank, spacing between MPI ranks on a node
NTHREADS=1         # Number of OMP threads per rank, given to OMP_NUM_THREADS

NTOTRANKS=$(( NNODES * NRANKS ))

echo "NUM_OF_NODES=${NNODES}  TOTAL_NUM_RANKS=${NTOTRANKS}  RANKS_PER_NODE=${NRANKS}  THREADS_PER_RANK=${NTHREADS}"

mpiexec --np ${NTOTRANKS} -ppn ${NRANKS} -d ${NDEPTH} -env OMP_NUM_THREADS=${NTHREADS} ./hello_mpi
              "},{"location":"running-jobs/job-and-queue-scheduling/#email-notifications","title":"Email Notifications","text":"

As a best practice, users should add -M <email address> if they want email notifications.

Note: For users with '@alcf.anl.gov' email addresses, PBS will send out an email once the job has ended by default. If you do not want to receive these notifications, you will need to add #PBS -m n to your script."},{"location":"running-jobs/job-and-queue-scheduling/#specifying-filesystems","title":"Specifying Filesystems","text":"

              Note: The filesystems attribute is mandatory. If you do not specify a filesystem(s) you will receive the following error message upon submission:

              qsub: Resource: filesystems is required to be set.

              Valid filesystems are home, eagle, and grand. For example, to request the home and eagle filesystems for your job you would add -l filesystems=home:eagle to your qsub command.

If a job is submitted while a filesystem it requested is marked down, the job will be queued but will not run, with a message in the comment field of the job as to why it is not running. Run qstat -f <jobid> to see the comment field. For example, if the job requested eagle and Eagle is unavailable, the comment field will contain Can Never Run: Insufficient amount of server resource: eagle_fs (True != False). Once the affected filesystem has been returned to normal operation and is marked as being available, the job will be scheduled normally. The job cannot run until all filesystems requested by the job are available.

If a job requesting a filesystem that is marked down is already in the queue, the job will not run until all of its requested filesystems are available.

              An example of a job requesting filesystems:

              qsub -l select=10:ncpus=64,walltime=30:00,filesystems=grand:home -A ProjectX -q prod my_job.sh

              To update the filesystems list for your job, use qalter.
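For example, a sketch of updating the filesystems on an already-queued job; the jobid is hypothetical:

qalter -l filesystems=home:eagle 123456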

              "},{"location":"running-jobs/job-and-queue-scheduling/#qsub-examples","title":"qsub examples","text":"
              • qsub -A my_allocation -l select=4:system=polaris -l filesystems=home:eagle -l walltime=30:00 -q debug-scaling -- a.out
                • run a.out on 4 chunks on polaris with a walltime of 30 minutes in debug-scaling queue; charge my_allocation;
                • Since we allocate full nodes on Polaris, 4 chunks will be 4 nodes. If we shared nodes, that would be 4 threads.
                • use the -- (dash dash) syntax when directly running an executable.
              • qsub -A my_allocation -l place=scatter -l filesystems=home:eagle -l select=32:ncpus=32 -q prod -l walltime=30:00 mpi_mm_64.sh
                • 32 chunks on any system that meets the requirements. Each chunk must have 32 HW threads; place=scatter means use a different vnode for each chunk, even if you could fit more than one on a vnode. Use the queue named prod.
              "},{"location":"running-jobs/job-and-queue-scheduling/#qstat-query-the-status-of-jobsqueues","title":"qstat: Query the status of jobs/queues","text":"

              Users Guide Sec. 10.2, page UG-175; Reference Guide Sec. 2.55, page RG-200

              "},{"location":"running-jobs/job-and-queue-scheduling/#jobs","title":"Jobs","text":"

At its most basic, you just type qstat and it will list all the jobs currently running, queued, or held on the system. If you are interested in a specific job or jobs, you can provide a space-separated list on the command line: qstat job1 job2....

Job id            Name             User              Time Use S Queue
----------------  ---------------- ----------------  -------- - -----
349726.polaris-p* PDE2             user1                    0 Q prod
336987.polaris-p* inf_clDB         user2                    0 H large
353205.polaris-p* 3d-2.sub         user3             2044:14* R large

One of the annoying things about qstat is that the output fields are fixed width and it will truncate the output. This is indicated by an asterisk as the last character. You can add -w for wide output. It doesn't prevent truncation, but makes it less likely. A useful variant is qstat -was1. It shows the number of nodes, tasks, the requested walltime, and the comment, all on one line. qstat -wan will give you the node list you ran on; just remember that can be long. If you want an estimate of when the job will start, add the -T option. Note that a start time is not available for all jobs, just the next N jobs that are expected to run. If you want to know everything there is to know about the job, add the -f flag.

                                                            Req'd  Req'd   Elap
Job ID          Username Queue    Jobname    SessID NDS TSK Memory Time  S Time
--------------- -------- -------- ---------- ------ --- --- ------ ----- - -----
353201.polaris* user1    large    3d-1.sub    34449  60 38*    --  24:00 R 08:25    Job run at Tue Nov 15 at 16:44 on (x3006c0s13b1n0:ngpus=4:ncpus=64)+(x...
353289.polaris* user2    medium   run_mae_l*    --   32 20*    --  12:00 Q   --     Not Running: Job would conflict with reservation or top job
353411.polaris* user3    large    1310W60       --   64  64    --  06:00 Q   --     Not Running: Not enough free nodes available
336990.polaris* user4    large    inf_clDB      --  464 29*    --  01:00 H   --     Job held by user4 on Mon Oct  3 20:16:26 2022
              The comment field is your friend. Wondering why your job isn't running? Check the comment. Wondering about the fate of a finished job? Add the -x option to see finished jobs (our history retention is currently set at two weeks) and check the comment. This cannot be stressed enough. Often, when a user ticket comes in about PBS, we answer it by looking at the comment.

              If you are familiar with jq or some other command line JSON tool, the -F JSON option can be quite handy. grep is great, but when you grep the -f output for something, you probably want to know which node the found lines belong to. With the JSON output that is trivial.

allcock@polaris-login-02:~/.ssh>  qstat -fF JSON | jq '.Jobs | map_values(select(.job_state == "R") | {Job_Name, Account_Name, qtime, stime})'
{
  "349710.polaris-pbs-01.hsn.cm.polaris.alcf.anl.gov": {
    "Job_Name": "P38",
    "Account_Name": "CompBioAffin",
    "qtime": "Fri Nov  4 11:04:12 2022",
    "stime": "Fri Nov 11 07:52:12 2022"
  },
  "352220.polaris-pbs-01.hsn.cm.polaris.alcf.anl.gov": {
    "Job_Name": "mdsim_10000_run1.pbs",
    "Account_Name": "RL-fold",
    "qtime": "Thu Nov 10 22:41:55 2022",
    "stime": "Fri Nov 11 09:00:12 2022"
  },
              "},{"location":"running-jobs/job-and-queue-scheduling/#queues","title":"Queues","text":"

qstat -Q will show you the names of all the queues and tell you their status. If they are enabled (Ena column), you can queue jobs into them. If they are started (Str column), then the scheduler will try to run jobs from them. There is a -f (full) option, but that is mostly for admins, though you can find the min and max node count (resources_[min|max].nodect) and min and max walltime (resources_[min|max].walltime) in the output. Those values are also available in this documentation.

              "},{"location":"running-jobs/job-and-queue-scheduling/#qalter-alter-a-queued-job","title":"qalter: Alter a queued job","text":"

              Users Guide Sec. 9.2, page UG-168; Reference Guide Sec. 2.40, page RG-130

qalter basically takes the same options as qsub. Say you made a typo and set the walltime to 300 minutes instead of 30 minutes. You could fix it (if the job had not started running) by doing qalter -l walltime=30:00 <jobid> [<jobid> <jobid>...]. The new value overwrites any previous value.

              "},{"location":"running-jobs/job-and-queue-scheduling/#qdel-delete-a-queued-or-running-job","title":"qdel: Delete a queued or running job:","text":"

              Users Guide Sec. 9.3, page UG-170; Reference Guide Sec. 2.41, page RG-143

              qdel <jobid> [<jobid> <jobid>...]

              "},{"location":"running-jobs/job-and-queue-scheduling/#qmove-move-a-job-to-a-different-queue","title":"qmove: Move a job to a different queue","text":"

              Users Guide Sec. 9.7, page UG-173; Reference Guide Sec. 2.46, page RG-175

              • qmove <new queue> <jobid> [<jobid> <jobid>...]
              • Only works before a job starts running
              "},{"location":"running-jobs/job-and-queue-scheduling/#qholdqrls-place-release-a-user-hold-on-a-job","title":"qhold,qrls: Place / release a user hold on a job","text":"

              Reference Guide Sec 2.44, page RG-150 and Sec 2.50, page RG-183

              • [qhold | qrls] <jobid> [<jobid> <jobid>...]
              "},{"location":"running-jobs/job-and-queue-scheduling/#qselect-query-jobids-for-use-in-commands","title":"qselect: Query jobids for use in commands","text":"

              Users Guide Sec. 10.1, page UG-175; Reference Guide Sec. 2.52, page RG-189

• qdel `qselect -N test1` will delete all the jobs that had the job name set to test1 (another example follows below).
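Along the same lines, a sketch that deletes all of your own jobs that are still queued (state Q); the username is hypothetical:

qdel `qselect -u myusername -s Q`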
              "},{"location":"running-jobs/job-and-queue-scheduling/#qmsg-write-a-message-into-a-jobs-output-file","title":"qmsg Write a message into a jobs output file","text":"

              Users Guide Sec. 9.4, page UG-171; Reference Guide Sec. 2.47, page RG-177

              • qmsg -E -O \"This is the message\" <jobid> [<jobid> <jobid>...]
              • -E writes it to standard error, -O writes it to standard out
              "},{"location":"running-jobs/job-and-queue-scheduling/#qsig-send-a-signal-to-a-job","title":"qsig Send a signal to a job","text":"

              Users Guide Sec. 9.5, page UG-172; Reference Guide Sec. 2.53, page RG-195

              • qsig -s <signal> <jobid> [<jobid> <jobid>...]
• If you don't specify a signal, SIGTERM is sent. (An example follows below.)
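For instance, a sketch that sends SIGUSR1 to a running job whose application is assumed to handle that signal (for example, to trigger a checkpoint); the jobid is hypothetical:

qsig -s SIGUSR1 123456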
              "},{"location":"running-jobs/job-and-queue-scheduling/#pbsnodes-get-information-about-the-current-state-of-nodes","title":"pbsnodes Get information about the current state of nodes","text":"

              Reference Guide Sec 2.7 page RG-36

This is more for admins, but it can tell you what nodes are free (state), how many "CPUs" there are, which is actually the number of threads (ncpus), how many GPUs (ngpus), which for some GPUs like NVIDIA A100s can change depending on the MIG mode, and whether the node is shared or not (sharing).

              pbsnodes <node name>: Everything there is to know about a node

> pbsnodes x3002c0s7b1n0
x3002c0s7b1n0
     Mom = x3002c0s7b1n0.hsn.cm.polaris.alcf.anl.gov
     Port = 15002
     pbs_version = 2022.1.1.20220926110806
     ntype = PBS
     state = free
     pcpus = 64
     resources_available.arch = linux
     resources_available.demand = False
     resources_available.gputype = A100
     resources_available.host = x3002c0s7b1n0
     resources_available.mem = 527672492kb
     resources_available.ncpus = 64
     resources_available.ngpus = 4
     resources_available.system = polaris
     resources_available.tier0 = x3002-g0
     resources_available.tier1 = g0
     resources_available.vnode = x3002c0s7b1n0
     resources_assigned.accelerator_memory = 0kb
     resources_assigned.hbmem = 0kb
     resources_assigned.mem = 0kb
     resources_assigned.naccelerators = 0
     resources_assigned.ncpus = 0
     resources_assigned.ngpus = 0
     resources_assigned.vmem = 0kb
     resv_enable = True
     sharing = force_exclhost
     license = l
     last_state_change_time = Tue Nov 15 19:26:39 2022
     last_used_time = Tue Nov 15 19:26:39 2022
     server_instance_id = polaris-pbs-01.hsn.cm.polaris.alcf.anl.gov:15001

pbsnodes -avSj: A nice table to see what is free and in use

> pbsnodes -avSj
                                                        mem       ncpus   nmics   ngpus
vnode           state           njobs   run   susp      f/t        f/t     f/t     f/t   jobs
--------------- --------------- ------ ----- ------ ------------ ------- ------- ------- -------
x3014c0s19b0n0  job-exclusive        1     1      0  503gb/503gb   63/64     0/0     4/4 353394
x3014c0s19b1n0  resv-exclusive       0     0      0  503gb/503gb    0/64     0/0     4/4 --
x3014c0s1b0n0   offline              0     0      0  503gb/503gb   64/64     0/0     4/4 --

              pbsnodes -avSj | grep free | wc -l: A quick way to see how many nodes are free

[20220217-21:09:30]> pbsnodes -avSj | grep free | wc -l
38

              pbsnodes -avSj | grep free | awk '{print $1}': Lists the free nodes

[20220217-21:09:30]> pbsnodes -avSj | grep free | awk '{print $1}'
x3201c0s25b0n0
x3209c0s13b0n0
x3209c0s19b0n0
x3209c0s1b1n0

              pbsnodes -l: (lowercase l) see which nodes are down. The comment often indicates why it is down

[20220217-21:10:31]> pbsnodes -l
x3014c0s19b0n0       offline,resv-exclusive Xid 74 -- GPUs need reseat
x3014c0s25b0n0       offline,resv-exclusive Checking on ConnectX-5 firmware
              "},{"location":"running-jobs/job-and-queue-scheduling/#job-priority","title":"Job Priority","text":"

In PBS it is not easy to see a priority order for which jobs will run next. The best way is to use the -T option on qstat and look at the estimated start times (see the example after this list). ALCF runs a custom scheduler algorithm, but in general, the job priority in the queue is based on several criteria:

              1. positive balance of your project
              2. size (in nodes) of the job, larger jobs receive higher priority
              3. the type of project (e.g. INCITE, ALCC, or discretionary)
              4. job duration: shorter duration jobs will accumulate priority more quickly, so it is best to specify the job run time as accurately as possible
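A quick sketch of checking those estimates; start time estimates are only available for the jobs the scheduler expects to run next:

qstat -Tw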
              "},{"location":"running-jobs/job-and-queue-scheduling/#troubleshooting-common-errors","title":"Troubleshooting / Common Errors","text":"

              If you receive a qsub: Job rejected by all possible destinations error, then check your submission parameters. The issue is most likely that your walltime or node count do not fall within the ranges listed above for the production execution queues. Please see the table above for limits on production queue job sizes.

NOTE: For batch submissions, if the parameters within your submission script do not meet the parameters of any of the above queues, you might not receive the "Job rejected" error on the command line at all. This can happen because your job is waiting in a routing queue and has not yet reached the execution queues. In this case you will receive a jobid back and qsub will exit; however, when the proposed job is routed, it will be rejected from the execution queues. In that case, the job will be deleted from the system and will not show up in the job history for that system. If you run a qstat on the jobid, it will return qstat: Unknown Job Id <jobid>.

              "},{"location":"running-jobs/job-and-queue-scheduling/#using-fakeroot-with-singularity","title":"Using Fakeroot with Singularity","text":"

The fakeroot feature (commonly referred to as rootless mode) allows an unprivileged user to run a container as a "fake root" user by leveraging user namespace UID/GID mapping. To request this feature be enabled for your job, add the following to your qsub command line:

              -l singularity_fakeroot=true
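A sketch of a full submission with fakeroot enabled; the project name, node count, and script name are hypothetical:

qsub -A MyProject -l select=1:system=polaris -l walltime=00:30:00 -l filesystems=home:eagle -l singularity_fakeroot=true ./run_container.sh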

              "},{"location":"running-jobs/machine-reservations-polaris/","title":"Machine Reservations on Polaris","text":"

              To get a reservation, you must first demonstrate a need to run outside of the normal queueing policies. Reservations are available only to projects with a positive allocation. Lead time for approval is 5 business days. If approved, scheduling is contingent on machine availability.

Disclaimer: Approval of reservation requests is subject to their appropriateness and machine availability. Not all requests will be approved. It is particularly difficult to accommodate reservation requests during busy times of the year, e.g., Supercomputing and the end of the ALCC and INCITE allocation cycles.

              To request a reservation, e-mail support@alcf.anl.gov with the requested information below.

RESERVATION REQUEST FOR ALL SYSTEMS (including vis clusters) AT ALCF

1. Machine name:
              2. Project for reservation:
              3. ALCF account username(s) (NOT the user's legal name) for reservation:

NOTE: We can only gate a reservation on an explicit list of users or a list of groups; we can't mix the two. So users must specify either a project/unixgroup name or a list of usernames, not both.

              4. Length of reservation:

              5. Earliest date you could start:
6. Deadline for the run(s):
              7. Details on the Run: Can it run anytime, day or night?
              8. Your local time zone (e.g., US/Central):
              9. Total number of jobs to be run:
              10. Total amount of data generated during reservation:
11. For each job, indicate:
    1. Node count (Note: not processor count)
    2. Run time
    3. Whether this job depends on any other jobs to finish before it can start
    4. A brief description of the goals for this run (Example: We are doing a scaling run of code XXXX to determine YYYY)
    5. A detailed explanation of why this workload cannot be accomplished with the existing queues (Requests omitting this response will not be processed)

After a reservation is granted, you will receive a reservation name by e-mail. Use the command pbs_rstat to verify the reservation attributes.

              For example:

pbs_rstat
Resv ID      Queue     User     State               Start / Duration / End
---------------------------------------------------------------------------
A123456.po   A123456   smith@   CO       Mon Aug 18 09:00 / 43200 / Tue Aug 19 11:00

qsub -q A123456 -l walltime=60:00 -l select=1024:system=polaris -l filesystems=eagle myprog.exe

              Once the reservation is set up, jobs can be submitted to the reservation queue prior to the reservation start time.

              For recurring reservations, the reserve_start and reserve_end are always the first instance. reserve_index and reserve_count tell you where you are in the recurrence.

              For jobs using 33 percent or more of a system, place your job in the queue at least 12 hours prior to the start of the reservation or your reservation may be canceled. The machine will start to drain for your reservation, and it is important that your job is ready to run.

You can also move jobs from the regular queue to the reservation queue at any time using the qmove command. Keep in mind that a job won't start unless enough time is left in the reservation.
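For example, a sketch of moving an already-queued job into the reservation queue from the example above; the jobid is hypothetical:

qmove A123456 123457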

              NOTE: There is NOT a 10-minute pad at the end of the reservation. When the reservation ends all jobs are terminated, deleted, and the reservation queue is deleted. If a routing queue is used for the reservation, then jobs may be preserved, but any running job(s) are still terminated.

If you have finished running your jobs before your reservation has ended, please reach out to the support team to have it released for other users. At this time, there is no way for a user to release a reservation early.

              "},{"location":"running-jobs/pbs-qsub-options-table/","title":"PBS Pro qsub Options","text":"

              Version 1.2 2021-04-28

-l select and similar options use a lowercase "L"; -I for interactive jobs is an uppercase "I".
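For instance, a sketch of an interactive job request; the project name is hypothetical:

qsub -I -A MyProject -l select=1:system=polaris -l walltime=00:30:00 -l filesystems=home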

| Cobalt CLI | PBS CLI | PBS Directive | Function and Page Reference |
| --- | --- | --- | --- |
| -A <account_string> | -A <account_string> | #PBS Account_Name=<accounting string> | "Specifying Accounting String" UG-29 |
| -n NODES, --nodecount NODES | -l select=NODES:system=<hostname> | One or more #PBS -l <resource name>=<value> directives | "Requesting Resources" UG-51 |
| -t, --walltime | -l walltime=H:MM:SS | One or more #PBS -l <resource name>=<value> directives | "Requesting Resources" UG-51 |
| --attrs filesystems=<resource> | -l filesystems=<resource> | One or more #PBS -l <resource name>=<value> directives | "Requesting Resources" UG-51 |
| -q | -q <destination> | #PBS -q <queue name>, #PBS -q @<server name>, #PBS -q <queue name>@<server name> | "Specifying Server and/or Queue" UG-29 |
| --env | -v <variable list> | | "Exporting Specific Environment Variables" UG-126 |
| --env | -V | #PBS -V | "Exporting All Environment Variables" UG-126 |
| --attrs | Done via custom resources and select statements | | "Setting Job Attributes" UG-16 |
| --dependencies=<list> | -W depend=afterok:<list> | #PBS depend=... | "Using Job Dependencies" UG-107 |
| -I, --interactive | -I | Deprecated for use in a script | "Running Your Job Interactively" UG-121 |
| --jobname | -N <name> | #PBS -N <job name>, #PBS -WJob_Name=<job name> | "Specifying Job Name" UG-27 |
| -e, --error= | -e <path> | #PBS -e <path>, #PBS Error_Path=<path> | "Paths for Output and Error Files" UG-42 |
| -o, --output= | -o <path> | #PBS -o <path>, #PBS Output_Path=<path> | "Paths for Output and Error Files" UG-42 |
| -M, --notify (see Note #1) | -M <user list> -m <mail options> (-m be is suggested) | #PBS -M <mail recipients>, #PBS -WMail_Users=<mail recipients>, #PBS -m <mail points>, #PBS -WMail_Points=<mail points> | "Setting Email Recipient List" UG-26 |
| -u, --umask | -W umask=<value> | #PBS umask=<value> | "Changing Linux Job umask" UG-45 |
| -h | -h | #PBS -h | "Holding and Releasing Jobs" UG-115 |
| --proccount (see Note #2) | -l mpiprocs (not needed to get equivalent Cobalt functionality) | One or more #PBS -l <resource name>=<value> directives | "Requesting Resources" UG-51 |
"},{"location":"running-jobs/pbs-qsub-options-table/#pbs-options-that-provide-functionality-above-and-beyond-cobalt","title":"PBS options that provide functionality above and beyond Cobalt","text":"

              Depending on policy decisions not all of these options may be available.

| Cobalt CLI | PBS CLI | PBS Directive | Function and Page Reference |
| --- | --- | --- | --- |
| N/A | -a <date_time> | #PBS -a | "Deferring Execution" UG-119 |
| N/A | -C <directive prefix> | | "Changing the Directive Prefix" UG-16 |
| N/A | -c <interval> | #PBS -c | "Using Checkpointing" UG-113 |
| N/A | -G | | "Submitting Interactive GUI Jobs on Windows" UG-125 |
| N/A | -J X-Y[:Z] | #PBS -J | "Submitting a Job Array" UG-150 |
| N/A | -j <join> | #PBS Join_Path=<joining option> | "Merging Output and Error Files" UG-43 |
| N/A | -k <keep> | #PBS Keep_Files=<keep option> | "Keeping Output and Error Files on Execution Host" UG-44 |
| N/A | -p <priority> | #PBS -p | "Setting Priority for Your Job" UG-120 |
| N/A | -P <project> | #PBS project=<project name> | "Specifying a Project for a Job" UG-27 |
| N/A | -r <value> | #PBS -r | "Allowing Your Job to be Re-run" UG-118 |
| N/A | -R <remove options> | | "Avoiding Creation of stdout and/or stderr" UG-43 |
| N/A | -S <path list> | | "Specifying the Top Shell for Your Job" UG-19 |
| N/A (see Note #3) | -u <user list> | #PBS User_List=<username list> | "Specifying Job Username" UG-28 |
| N/A | -W block=true | #PBS block=true | "Making qsub Wait Until Job Ends" UG-120 |
| N/A | -W group_list=<list> | #PBS group_list=<group list> | "Specifying Job Group ID" UG-28 |
| N/A | -W release_nodes_on_stageout=<value> | | "Releasing Unneeded Vnodes from Your Job" UG-127 |
| N/A | -W run_count=<value> | | "Controlling Number of Times Job is Re-run" UG-119 |
| N/A | -W sandbox=<value> | | "Staging and Execution Directory: User Home vs. Job-specific" UG-31 |
| N/A | -W stagein=<list> | #PBS -W stagein=<execution path>@<input file storage host>:<input file storage path>[,...] | "Input/Output File Staging" UG-31 |
| N/A | -W stageout=<list> | #PBS -W stageout=<execution path>@<output file storage host>:<output file storage path>[,...] | "Input/Output File Staging" UG-31 |
| N/A | -X | | "Receiving X Output from Interactive Linux Jobs" UG-124 |
| N/A | -z | #PBS -z | "Suppressing Printing Job Identifier to stdout" UG-30 |
"},{"location":"running-jobs/pbs-qsub-options-table/#notes","title":"Notes","text":"
              1. To get the equivalent mail notifications from PBS it requires two parameters: the -M just like Cobalt, but also -m be (the be stands for \"beginning\" and \"end\") to specify when the mails should go out. This will give you the same behavior as Cobalt.
2. --proccount, while available, only changed behavior on the Blue Gene machines. To get equivalent functionality, just drop it from the CLI. In PBS it does influence the contents of $PBS_NODEFILE. See Section 5.1.3 in the PBS Users Guide, page UG-78.
              3. The following Cobalt options have no equivalent in PBS:
                • --cwd: use a script and cd to the directory you want to run from.
                • --user_list: There is no way to do this. We will work on adding this functionality.
                • --debuglog: Are we going to try and generate the equivalent of a .cobalt file?
              4. The following Cobalt options were Blue Gene specific and no longer apply:
                • --kernel
                • -K KERNELOPTIONS
                • --ion_kernel
                • --ion_kerneloption
                • --mode: see notes on running scripts, Python, and other executables
                • --geometry
                • --disable_preboot
              "},{"location":"running-jobs/unused/pbs-admin-quick-start-guide/","title":"PBS Admin Quick Start Guide","text":"

              The single most important thing I can tell you is where to get the PBS BigBook. It is very good and a search will usually get you what you need if it isn't in here.

              • PBS Admin Quick Start Guide
              • Checking Server Status
              • Checking / Setting Node Status
              • Troubleshooting
              • Starting, stopping, restarting, status of the daemons:
              • Starting, stopping scheduling across the entire complex
              • Starting, stopping queues:
              • \"Boosting\" jobs (running them sooner)
              • Reservations
              • MIG Mode
              • Rack and Dragonfly group mappings
              • Restricting a Reservation to Vnodes With Specific Resources
              • Removing Blocking Resources
              "},{"location":"running-jobs/unused/pbs-admin-quick-start-guide/#checking-server-status","title":"Checking Server Status","text":"

You can check overall server status and settings with qmgr -c "list server" or qstat -Bf (add -w to qstat if you want to remove wrapping). This will show current server parameters. If you have manager/operator permissions you will also see any hidden resources. You may also check parameters of the scheduler with qmgr -c "list sched", and by checking $PBS_HOME/sched_priv/sched_config. Hook information can be checked with qmgr -c "list hook" and qmgr -c "list pbshook". Due to permissions, all hook operations require root.

              "},{"location":"running-jobs/unused/pbs-admin-quick-start-guide/#checking-setting-node-status","title":"Checking / Setting Node Status","text":"

              The pbsnodes command is your friend.

              • check status
              • pbsnodes -av gives you everything; grep will be useful here
              • pbsnodes -v <node> <node> ... will give you all information on the listed nodes
              • pbsnodes -avSj gives you a nice table summary
              • pbsnodes -l lists the nodes that are offline
              • Taking nodes on and offline
              • pbsnodes -C <comment> -o <nodelist> will mark a node offline in PBS (unschedulable)
                • Adding the time and date and why you took it offline in the comment is helpful
                • <nodelist> is space separated
• pbsnodes -r <node list> will attempt to bring a node back online. This will only remove the "offline" state from a node; if the node is down for other reasons, that will not change. Use -C "" to remove any comment that was set when the node was originally marked offline. (See the sketch after this list.)
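A sketch of that workflow, assuming a hypothetical node name and comment: take the node offline with a dated comment, then later return it to service and clear the comment.

pbsnodes -C "2022-11-15 GPU needs reseat" -o x3002c0s7b1n0
pbsnodes -r -C "" x3002c0s7b1n0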
              "},{"location":"running-jobs/unused/pbs-admin-quick-start-guide/#troubleshooting","title":"Troubleshooting","text":"
              • PBS_EXEC (where all the executables are): /opt/pbs/[bin|sbin]
              • PBS_HOME (where all the data is): /var/spool/pbs
              • logs: /var/spool/pbs/[server|mom|sched|comm]_logs
              • config: /var/spool/pbs/[server|mom|sched]_priv/
              • /etc/pbs.conf - Reference Guide Section 9.1, page RG-371
              • qstat -[x]f [jobid]
              • the -x shows jobs that have already completed. We are currently holding two weeks history.
              • the comment field is particularly useful. It will tell you why it failed, got held, couldn't run, etc..
              • The jobid is optional. Without it you get all jobs.
              • tracejob <jobid>
              • This will pull all of the logs related to the jobid on that node. Run on the pbs.server host to get most of the job information
              • If this is run on a compute node involved in jobid then it will aggregate all logs from the mom on that job from that node.
• You may pass the -n # option, where # is the number of days to look back in the logs (the default is 1 day).
              • This does a rudimentary aggregation and filter of the logs for you.
              • qselect - Reference Guide Section 2.54 page RG-187.
              • allows you to query and return jobids that meet criteria for instance the command below would delete all the jobs from Yankee Doodle Dandy, username yddandy:
              • qdel `qselect -u yddandy`
              • Error Code Table (Reference Guide Chapter 14, RG-391)
              • If a CLI command (qmgr, qsub, whatever) spits out an error code at you, go look it up in the table, you may well save yourself a good bit of time.
              • We are going to try and either get the error text to come with the code or write a utility to look it up and have that on all the systems.
              "},{"location":"running-jobs/unused/pbs-admin-quick-start-guide/#starting-stopping-restarting-status-of-the-daemons","title":"Starting, stopping, restarting, status of the daemons:","text":"
              • Server: on pbs0 run systemctl [start | stop |restart | status] pbs
              • MoM:
• If you only want to restart a single MoM, ssh to the host and issue the same commands as above for the server.
• If you want to restart the MoM on every compute node, ssh admin.polaris then do: pdsh -g custom-compute "systemctl [start | stop | restart | status] pbs"
              "},{"location":"running-jobs/unused/pbs-admin-quick-start-guide/#starting-stopping-scheduling-across-the-entire-complex","title":"Starting, stopping scheduling across the entire complex","text":"

              qmgr -c \"set server scheduling = [True | False]\"

              IMPORTANT NOTE: If we are running a single PBS complex for all our systems (same server is handling Polaris, Aurora, Cooley2, etc) this will stop scheduling on everything.

To check the current status you may do: qmgr -c "list server scheduling"

              "},{"location":"running-jobs/unused/pbs-admin-quick-start-guide/#starting-stopping-queues","title":"Starting, stopping queues:","text":"
• enabled: whether users can queue jobs into the queue
• started: whether the scheduler will run jobs that are in the queue

So if a queue is enabled but not started, users can issue qsubs and the job will get queued, but nothing will run until we start the queue again. Running jobs are unaffected.

              qmgr -c \"set queue <queue name> started = [True | False]\" qmgr -c \"set queue <queue name> enabled = [True | False]\"

              "},{"location":"running-jobs/unused/pbs-admin-quick-start-guide/#boosting-jobs-running-them-sooner","title":"\"Boosting\" jobs (running them sooner)","text":"

There are three ways you can run a job sooner:

1. qmove run_next <jobid>
   • Because of the way policy is set for the acceptance testing period, any job in the run_next queue will run before jobs in the default workq, with the exception of jobs that are backfilled. So by moving the job into the run_next queue, you move it to the front of the line. There are no restrictions on this, so please do not abuse it.
2. qorder <jobid> <jobid>
   • If you don't necessarily need it to run next, but just want to rearrange the order a bit, you can use qorder, which swaps the positions of the specified jobids. So, if one of them was 10th in line and one was 20th, they would switch positions.
3. qalter -l score_boost=NNNNN <jobid> <jobid>
   • If the job_sort_function is enabled and shows up when querying the server, you can add a numeric boost to the score of a job to push it further ahead in the queue. You have to be a manager or operator to alter this value.
              "},{"location":"running-jobs/unused/pbs-admin-quick-start-guide/#reservations","title":"Reservations","text":"

              Most of the reservation commands are similar to the job commands, but prefixed with pbs_r instead of q: pbs_rsub, pbs_rstat, pbs_ralter, pbs_rdel. You get the picture. In general, their behavior is reasonably similar to the equivalent jobs commands. Note that by default, users can set their own reservations. We have to use a hook, no_user_rsub, to prevent that. The hook does allow anyone with manager or operator permissions to set reservations.

              • There are three types of reservations:
              • Advance and standing reservations - reservations for users; Note that you typically don't specify the nodes. You do a resource request like with qsub and PBS will find the nodes for you.
              • job-specific now reservations - we have not used these. Where they could come in handy is for debugging. A user gets a job through, we convert it to a job-specific reservation, then if their job dies, they don't have to wait through the queue again, they can keep iterating until the wall time runs out.
              • maintenance reservations. - You can explicitly set which hosts to include in the reservation.
              • Also note that reservations occur in two steps. The pbs_rsub will return with an ID but will say unconfirmed. That means it was syntactically correct, but PBS hasn't figured out if the resources are available yet. Once it has the resources, it will switch to confirmed. This normally is done as fast as you can run pbs_rstat. A reservation can only be confirmed if scheduling is enabled on the server.
              • -R (start) -E (end) are in \"datetime\" format: [[[[CC]YY]MM]DD]hhmm[.SS]
              • 1315, 171315, 12171315, 2112171315 and 202112171315 would all be Dec 17th, 2021 @ 13:15
                • If that is in the future they are all equivalent and valid
  • If it were Dec 17th, 2021 @ 14:00, then 1315 would default to the next day @ 13:15; the rest would be errors because they are in the past.
                • Be careful or this will bite you. It will confirm the reservation and you will expect it to start in a few minutes, but it is actually for tomorrow.
              • pbs_rsub -N rsub_test -R 2023 -D 05:00 -l select=4
              • probably not what you think: resv_nodes = (edtb-03[0]:ncpus=1)+(edtb-03[0]:ncpus=1)+(edtb-03[0]:ncpus=1)+(edtb-03[0]:ncpus=1) It gave me 4 cores on the same node.
              • pbs_rsub -N rsub_test -R 2023 -D 05:00 -l select=2 -l place=scatter
              • Getting closer: resv_nodes = (edtb-01[0]:ncpus=1)+(edtb-02[0]:ncpus=1)
              • The -l place=scatter got me two different nodes, but edtb allows sharing, so I got one thread on each node, but there were actually jobs running on those nodes at the time. On Polaris, since the nodes are force_exclhost that wouldn't have been an issue.
              • pbs_rsub -N rsub_test -R 2217 -D 05:00 -l select=2:ncpus=64 -l place=scatter:excl This gave me what I wanted:
                • resv_nodes = (edtb-03[0]:ncpus=64)+(edtb-04[0]:ncpus=64)
                • Leaving it to default to ncpus=1 should work, but asking for them all isn't a bad idea.
              • pbs_rsub -N rsub_test -R 1200 -D 05:00 --hosts x3004c0s1b0n0 x3003c0s25b0n0...
              • If you use --hosts it makes it a maintenance reservation. You can't / don't need to add -l select or -l place on a maintenance reservation. PBS will set it for you and will make it the entire host and exclusive access. Nodes don't have to be up. If jobs are running they will continue to run. This will override any other reservation.
              • pbs_ralter You can use this to change attributes of the reservation (start time, end time, how many nodes, which users can access it, etc). Works just like qalter for jobs.
              • pbs_rdel <reservation id> This will kill all running jobs, delete the queue, meaning you lose any jobs that were in the queue, and release all the resources.
              • NOTE: once the reservation queue is in place, you use all the normal jobs commands (qsub, qalter, qdel, etc.) to manipulate the jobs in the queue. On the qsub you have to add -q <reservation queue name>
              "},{"location":"running-jobs/unused/pbs-admin-quick-start-guide/#giving-users-access-to-the-reservation","title":"Giving users access to the reservation","text":"

By default, only the person submitting the reservation will be able to submit jobs to the reservation queue. You change this with -U +username@*,+username@*,.... You can add this to the initial pbs_rsub or use pbs_ralter after the fact. The plus is basically ALLOW. We haven't tested it, but you can also theoretically use a minus for DENY. You may also gate on group membership by setting qmgr -c "set queue <reservation queue name> acl_group_enable=True" and then adding groups to acl_groups on the reservation queue, using the same sort of syntax as you use for acl_users. This is a bit of a hack, but if you want anyone to be able to run you can do qmgr -c "set queue <reservation queue name> acl_user_enable=False"
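For example, a sketch granting two hypothetical users access to a hypothetical reservation after the fact:

pbs_ralter -U +alice@*,+bob@* A123456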

              WARNING: if you have both acl_users and acl_groups enabled, then the submitting user must be in the group and the user ACL list otherwise the job will be rejected! It is recommended that only one or the other be used on a queue.

              "},{"location":"running-jobs/unused/pbs-admin-quick-start-guide/#mig-mode","title":"MIG mode","text":"
              • See the Nvidia Multi-Instance GPU User Guide for more details.
              • sudo nvidia-smi mig -lgip List GPU Instance Profiles; This is how you find the magic numbers used to configure it below.
              • sudo nvidia-smi mig -lgipp list all the possible placements; The syntax of the placement is {<index>}:<GPU Slice Count>
              • nvidia-smi --query-gpu=mig.mode.current --format=csv,noheader - check the status of all the GPUs on the node; add -i <GPU number> to check a specific GPU
              • systemctl stop nvidia-dcgm.service ; systemctl stop nvsm ; sleep 5 ; /usr/bin/nvidia-smi -mig 1 Put the node in MIG mode; -mig 0 will take it out of MIG mode.
              • nvidia-smi mig -i 3 -cgi 19,19,19,19,19,19,19 -C configure GPU #3 to have 7 instances.
              • nvidia-smi mig --destroy-compute-instance; nvidia-smi mig --destroy-gpu-instance Will free up the resources; You have to do this before you can change the configuration.
              "},{"location":"running-jobs/unused/pbs-admin-quick-start-guide/#polaris-rack-and-dragonfly-group-mappings","title":"Polaris Rack and Dragonfly group mappings","text":"
              • Racks contain (7) 6U chassis; Each chassis has 2 nodes for 14 nodes per rack
• The hostnames are of the form xRRPPc0sUUb[0|1]n0 (a decoded example follows this list) where:
                • RR is the row {30, 31, 32}
                • PP is the position in the row {30 goes 01-16, 31 and 32 go 01-12}
  • c is chassis and is always 0 (I wish they would have counted up chassis, oh well)
                • s stands for slot, but in this case is the RU in the rack. Values are {1,7,13,19,25,31,37}
                • b is BMC controller and is 0 or 1 (each node has its own BMC)
                • n is node, but is always 0 since there is only one node per BMC
              • So, 16+12+12 = 40 racks * 14 nodes per rack = 560 nodes.
              • Note that in production group 9 (the last 4 racks) will be the designated on-demand racks
              • The management racks are x3000 and X3100 and are dragonfly group 10
              • The TDS rack is x3200 and is dragonfly group 11
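As a worked sketch, decoding the hypothetical hostname x3209c0s19b1n0:

# x3209c0s19b1n0
#   x32 -> row 32
#   09  -> position 9 in the row
#   c0  -> chassis 0 (always 0)
#   s19 -> slot/RU 19 in the rack
#   b1  -> BMC 1
#   n0  -> node 0 (always 0 per BMC)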
| Group 0 | Group 1 | Group 2 | Group 3 | Group 4 | Group 5 | Group 6 | Group 7 | Group 8 | Group 9 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| x3001-g0 | x3005-g1 | x3009-g2 | x3013-g3 | x3101-g4 | x3105-g5 | x3109-g6 | x3201-g7 | x3205-g8 | x3209-g9 |
| x3002-g0 | x3006-g1 | x3010-g2 | x3014-g3 | x3102-g4 | x3106-g5 | x3110-g6 | x3202-g7 | x3206-g8 | x3210-g9 |
| x3003-g0 | x3007-g1 | x3011-g2 | x3015-g3 | x3103-g4 | x3107-g5 | x3111-g6 | x3203-g7 | x3207-g8 | x3211-g9 |
| x3004-g0 | x3008-g1 | x3012-g2 | x3016-g3 | x3104-g4 | x3108-g5 | x3112-g6 | x3204-g7 | x3208-g8 | x3212-g9 |
"},{"location":"running-jobs/unused/pbs-admin-quick-start-guide/#restricting-a-reservation-to-vnodes-with-specific-resources","title":"Restricting a Reservation to Vnodes With Specific Resources","text":"

              You can restrict a reservation to particular resources in the select statement just like you can with job placement. For instance, to restrict replacement to nodes that are not in the on-demand queue you can use -l select=256:demand=False in your select statement for a regular or repeating reservation.

              "},{"location":"running-jobs/unused/pbs-admin-quick-start-guide/#removing-blocking-resources","title":"Removing Blocking Resources","text":"

              There is a current behavior in PBS where reservations may inherit server defaults as restrictions and may not check other server values. This may result in jobs running unexpectedly, or may cause a job to not be queued.

To fix jobs not being queued, some resources_max restrictions may have to be removed from the reservation queue. For example, you can clear filesystems and project_priority with the following:

qmgr -c "unset queue <reservation queue name> resources_max.filesystems"
qmgr -c "unset queue <reservation queue name> resources_max.project_priority"

If you need to add an additional restriction, you can likewise set a resource on the queue as a resources_max or resources_min restriction. For instance, to forbid eagle_fs from being used, you can run:

qmgr -c "set queue <reservation queue name> resources_max.eagle_fs=False"
qmgr -c "set queue <reservation queue name> resources_min.eagle_fs=False"

              You can also set this as a part of the -l flag options at reservation creation.

              "},{"location":"services/continuous-integration/","title":"Continuous Integration on Theta","text":""},{"location":"services/continuous-integration/#continuous-integration","title":"Continuous Integration","text":"

              Continuous Integration (CI) in software development is the practice of committing code changes regularly to a version control system and having automated processes perform build, test, package, and deploy activities.

              The key concepts of CI include high frequency, repeatability, and automation in order to realize increased quality and ease of delivery. The main goal CI aims to achieve is the elimination of build and deployment issues, which in turn improves development cycles, provides a timely feedback loop with developers, and results in higher quality deliverables with reduced development time.

              CI usually describes the work that is done by a deployment or operations team to build and deploy code throughout an environment and make it available to the different interested teams involved in the SDLC. The steps that make up this process are referred to as a workflow or pipeline, which, when combined with automation, provides the mechanism for Continuous Integration.

              Today it is a common practice to use a CI tool for defining pipelines and executing the tasks required to take code from a source stored in a version control system to compiled and packaged artifacts executing in production. Two excellent examples of CI tools are Jenkins and GitLab.

              "},{"location":"services/continuous-integration/#ci-tools-at-alcf","title":"CI Tools at ALCF","text":""},{"location":"services/continuous-integration/#jenkins","title":"Jenkins","text":"

              Jenkins \"is a self-contained, open-source automation server which can be used to automate all sorts of tasks relating to building, testing, and delivering or deploying software.\"

              "},{"location":"services/continuous-integration/#gitlab-ci","title":"Gitlab-CI","text":"

              Gitlab is an application that offers combined functionality as git repository, issue tracker, and CI/CD platform. The ALCF implementation of the Gitlab-CI environment leverages upstream gitlab runners combined with the ECP's Jacamar custom executor. As CI/CD is built directly into Gitlab, it can allow for tighter devops processes. Gitlab-CI is meant to provide CI/CD services for projects using Gitlab-CI to store their git repositories. ALCF does not allow users to join their own private runners to our existing gitlab ci environment and provides runners on our supported systems.

              "},{"location":"services/getting-started/","title":"ALCF Services","text":"

              Below is a list of some of the services ALCF offers.

              • JupyterHub: An interactive computing environment for Python and other languages.
              • Continuous Integration: An automated processes to help build, test, package, and deploy on ALCF systems.
              "},{"location":"services/gitlab-ci/","title":"Continuous Integration via GitLab-CI","text":""},{"location":"services/gitlab-ci/#gitlab-ci","title":"GitLab-CI","text":"

              GitLab is an application that offers combined functionality as git repository, issue tracker, and CI/CD platform. The ALCF implementation of the GitLab-CI environment leverages upstream GitLab runners combined with the ECP's Jacamar custom executor. As CI/CD is built directly into GitLab, it can allow for tighter devops processes.

GitLab-CI is meant to provide CI/CD services for projects that use GitLab-CI to store their git repositories and execute code on our HPC clusters. ALCF does not allow users to join their own private runners to our existing GitLab CI/CD environment and provides dedicated runners for our supported systems.

              Additional information, technical and user documentation, and community support can be found on the GitLab's Runner website.

              ALCF's GitLab-CI environment can be accessed by logging into the ALCF GitLab-CI web portal using your ALCF credentials (ALCF username and cryptocard token password).

              "},{"location":"services/gitlab-ci/#quickstart","title":"Quickstart","text":"
• A user emails ALCF Support requesting access to gitlab-ci.alcf.anl.gov for their ALCF Project.
              • ALCF Support will add the ALCF Project to the appropriate system(s) via the Account and Project management system.
              • ALCF will create a GitLab Group/SubGroup for the ALCF Project and map it to the appropriate ldap group that maps to the ALCF Project
              • ALCF Support will reply back to the user and inform them that the project is created.
• User(s) will need to log in to gitlab-ci.alcf.anl.gov and configure their initial GitLab profile. Users will add an SSH key so they can pull/push code to the GitLab server.
• Users will then need to create a GitLab Project in their assigned GitLab Group/SubGroup.
• When ready to run CI/CD jobs, users will add a .gitlab-ci.yml file to their git repositories.
• They will need to set any ALCF-specific variable(s).

              Example: A .gitlab-ci.yml file for a Theta project

variables:
  ANL_THETA_SCHEDULER_PARAMETERS: "-A ProjectName -n 1  -t 10 -q ThetaQueueName --attrs filesystems=home"
stages:
  - stage1
  - stage2
  - stage3
shell_test1:
  stage: stage1
  tags:
    - ecp-theta
    - shell
  script:
    - echo "Shell Job 1"
batch_test:
  stage: stage2
  tags:
    - ecp-theta
    - batch
  script:
    - echo "Job 2 start"
    - aprun -n 1 id
    - aprun -n 1 hostname
    - aprun -n 1 echo "Running on theta with setuid batch runner"
    - echo "Job end"

              "},{"location":"services/gitlab-ci/#glossary","title":"Glossary","text":"
• Group - A collection of projects. Certain settings can be applied at the Group level and apply down to all child SubGroups and/or Projects. When an ALCF Project is allocated resources on the GitLab-CI environment, we will create a GitLab Group that maps to your ALCF Project allocation.
• Jacamar-CI - A custom executor we use that runs jobs as a given user on the shell and is capable of submitting jobs to schedulers like Cobalt and PBS.
• Job - An individual set of commands that are run. This is the lowest unit of GitLab-CI abstraction.
• Pipeline - GitLab organizes your jobs for each run into a pipeline.
• Project - GitLab Projects can be thought of as an individual git repository plus all services and features GitLab layers on top. This term is unrelated to the ALCF Project concept. ALCF Projects often map to LDAP groups and/or quotas and allocations.
• Stage - A collection of jobs in a pipeline. Jobs in the next stage will not start until the jobs in the current stage complete. If a job fails, the pipeline will not run the following stages by default.
• Triggering User - The user whose actions cause a CI/CD job to run and whom the Jacamar-CI executor will run the jobs as. Examples include pushing commits up to the server, creating a merge request, and/or merging one branch into another branch.
              "},{"location":"services/gitlab-ci/#projects-using-cicd","title":"Projects Using CI/CD","text":"

              Any project with a git repository on the GitLab-CI environment has access to the CI/CD environment by default. In order to launch a shell job on a system you must already have access to that system.

              "},{"location":"services/gitlab-ci/#on-boarding-with-cicd","title":"On-Boarding with CI/CD","text":"

              To gain access to the GitLab-CI environment, send an email to support@alcf.anl.gov requesting access for your project(s). Include with the request:

              • That you are requesting access to the GitLab-CI environment at https://gitlab-ci.alcf.anl.gov
              • The ALCF Project shortname
              • The PI\u2019s name

GitLab-CI jobs run as the triggering user on the relevant systems. Jacamar-CI copies the git repository and cache files into ~/.jacamar-ci in the triggering user's home directory, so jobs run out of the home directory and consume that user's filesystem quota. If you need more space, reference files in any ALCF Project allocations you have on shared filesystems; however, the initial git clone must still take place under ~/.jacamar-ci in your home directory.

The triggering user is the account whose action caused the CI/CD pipeline to execute, for example by scheduling a recurring job, pushing commits up to the server, creating a merge request, or merging one branch into another. CI/CD jobs run as that user on the relevant systems, so for a job to succeed the triggering user must have the appropriate permissions and access to all relevant systems and files.

              "},{"location":"services/gitlab-ci/#initial-login-and-profile-setup-of-gitlab-ci","title":"Initial Login and Profile setup of GitLab-CI","text":"
• Log in to gitlab-ci.alcf.anl.gov using your username and Cryptocard token.
• Once logged in, add the SSH public key you already have (or created earlier) so that it can be associated with your account.
• Click the Profile icon in the upper right-hand corner, then click \"Edit Profile\" GitLab Profile Dropdown screenshot
• Click \"SSH Keys\" on the left-hand menu. GitLab Profile Add SSH Key screenshot
• Copy and paste your SSH public key into the large text box under the word Key
  • On Linux, Unix, and macOS systems using OpenSSH, your SSH public key is commonly found at ~/.ssh/id_rsa.pub (see the sketch after this list if you still need to generate a key pair). On Windows, consult your application's documentation for the location of your public key.
  • Give the key a descriptive title, such as where it resides; by default the title is extracted from the end of the public key if possible.
              • Click the Add Key button. The button is disabled until you paste a key.
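If you do not already have an SSH key pair, a minimal sketch for generating and displaying one on a Linux/Unix/macOS system follows (the key type and file name are only examples):

# generate a new key pair (accept or adjust the file name as you prefer)
ssh-keygen -t ed25519 -f ~/.ssh/id_ed25519
# print the public key so it can be pasted into the Key text box
cat ~/.ssh/id_ed25519.pub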
              "},{"location":"services/gitlab-ci/#gitlab-projects-repositories","title":"GitLab Projects (repositories)","text":"

GitLab takes a git repository, adds additional functionality, and calls it a GitLab Project. This is the level at which you will most commonly interact with GitLab. Please do not confuse ALCF Projects with GitLab Projects; they are two separate things. ALCF Projects more closely map to the GitLab Group/SubGroup concept, which we explain in the next section. Once you are assigned access to a GitLab Group/SubGroup, you will be able to create arbitrary GitLab Projects underneath it and configure CI/CD jobs for each independently.

              To create a new GitLab Project:

              • In the left pane, click \"Groups\", and then click \"Explore groups\" link on the right.

              GitLab Your Groups Page screenshot
              • From the list in the \"Explore groups\" page, click the group you were informed corresponds to your ALCF Project

              GitLab Explore Groups Page screenshot
              • Click the New project button near the upper right. If this is the first project you are creating you will have two large square buttons near the middle of the screen to create GitLab SubGroups or GitLab Projects

              GitLab Empty Group Page screenshot
              • On the Create new project page, click Create blank project

              GitLab Create New Project screenshot
• Fill in the Project Name field. The Project slug field will auto-populate based on the Project Name; do not change it. If you are pushing an existing repository, you MUST uncheck the default Initialize repository with a README option (see the push sketch after this list). Failure to uncheck this option will result in a merge conflict that you will need to resolve manually between your existing \"local\" git repository and the one you just created on the server.

              GitLab Create New Project screenshot
              • Click Create project button near the bottom
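If you are pushing an existing local repository into the newly created, empty GitLab Project, a minimal sketch of the usual steps follows (the group/project path and branch name are placeholders):

# point the existing local repository at the new, empty GitLab Project
git remote add origin git@gitlab-ci.alcf.anl.gov:MyGroup/my-project.git
# push the existing history and set the upstream tracking branch
git push -u origin main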
              "},{"location":"services/gitlab-ci/#gitlab-groupssubgroups-folders","title":"GitLab Groups/SubGroups (Folders)","text":"

              GitLab organizes GitLab Projects into \"folders\" called Groups or SubGroups. When an ALCF Project is granted access to GitLab-CI a GitLab Group will be created with access for all members of that ALCF Project. Users will then be able to create arbitrary GitLab Projects.

Each ALCF Project will have a top-level Group or SubGroup created with the ALCF Project's name. It is used for organization in the multi-project environment and is required for implementing the needed level of security. The Group folder is where all of your GitLab Projects are stored; you can additionally create new SubGroups, Projects, group variables, etc. within your designated Group, SubGroups, and/or Projects.

              To create a new GitLab SubGroup:

              • In the left pane, click \"Groups\", and then click \"Explore groups\" link on the right.

              GitLab Your Groups Page screenshot
              • From the list in the \"Explore groups\" page, click the group you were informed corresponds to your ALCF Project

              GitLab Explore Groups Page screenshot
              • Click the New subgroup button near the upper right. If this is the first project you are creating you will have two large square buttons near the middle of the screen to create GitLab SubGroups or GitLab Projects

              GitLab Empty Group Page screenshot
• On the Create subgroup page, enter the Subgroup name. The Subgroup slug will auto-populate; do not change it.

              GitLab Create New SubGroup screenshot
              • Click Create subgroup button near the bottom
              "},{"location":"services/gitlab-ci/#gitlab-runner-nodes","title":"GitLab Runner Nodes","text":"

Each system is assigned one or more GitLab runner node(s) that are shared by all users in GitLab-CI. Each runner can run only one user's pipeline at a time, although multiple jobs within that pipeline may run in parallel.

Each node has two runners available, shell and batch. shell runs shell jobs directly on the runner node as the user. batch submits the job to the scheduler of the HPC cluster paired with that node. You will need to select the appropriate runner in your .gitlab-ci.yml file for the job to be executed properly. For more details on the .gitlab-ci.yml file, please see the upstream docs.

              "},{"location":"services/gitlab-ci/#gitlab-ciyml-configuration-sections","title":".gitlab-ci.yml Configuration Sections","text":"

GitLab uses a per-repository .gitlab-ci.yml file. On any commit, merge request, or merge, GitLab will attempt to trigger a CI/CD pipeline based on the contents of this file. Within the .gitlab-ci.yml file you can limit jobs to run only under certain conditions. A common workflow is to have linting and validation happen on every commit to a non-master/non-main branch, with larger, more complex tasks performed when that branch is merged back into master/main. All jobs launched for a given event are organized into a Pipeline. You can watch the progress of your pipeline via the CI/CD pipeline page for your Project.

              GitLab Group and Projects screenshot

              GitLab Group and Projects screenshot"},{"location":"services/gitlab-ci/#tags","title":"Tags","text":"

              Tags are used to select which runner a job will be sent to. Improper tags can prevent your job from running and result in a failed job.

              "},{"location":"services/gitlab-ci/#alcf-specific-tags","title":"ALCF Specific tags","text":"

              Two tags are necessary to run on our systems. One tag will select which cluster the jobs are sent to. The other will determine if the job is to be run locally on the gitlab runner host, or if it is to be submitted to a job scheduler on an HPC cluster.

              Cluster Tag(s)

| Cluster | Tag | Description |
| --- | --- | --- |
| Theta | ecp-theta | This tag will send jobs to the Theta HPC runners |
| ThetaGPU | thetagpu | This tag will send jobs to the ThetaGPU HPC runners |
| Polaris | polaris | This tag will send jobs to the Polaris HPC runners |

              Job Type Tag(s)

| Tag | Description |
| --- | --- |
| shell | This tag will execute the job locally on the GitLab runner host. |
| batch | This tag will submit the job to the HPC cluster's job scheduler. |
"},{"location":"services/gitlab-ci/#variables","title":"Variables","text":"

              Variables can be stored two ways, inline in the .gitlab-ci.yml file or as a setting in the GitLab Group or Project itself. Variables are exported as environment variables by gitlab-runner for each job and can be used inside the .gitlab-ci.yml file.

              To set a variable directly in the .gitlab-ci.yml file, declare a variables: section with each VariableName: \"VariableValue\" being on its own line. variables: can be declared globally or in individual jobs.

              Example: Declaring variables

              variables:\n  GlobalVariable1: \"Global Value 1\"\n  GlobalVariable2: \"Global Value 2\"\n\njob:\n  variables:\n    LocalVariable: 'This is a local variable'\n  script:\n    - 'echo $LocalVariable'\n

              To store variables in the Group or Project settings, in the left side menu, click Settings>CI/CD. Expand the Variables option on the right side frame. You can then add variables by clicking Add variable.

For more details, please see the upstream docs

              GitLab Group and Projects screenshot

              GitLab Group and Projects screenshot"},{"location":"services/gitlab-ci/#alcf-specific-variables","title":"ALCF Specific Variables","text":"

If you are planning to submit jobs to a scheduler, you will need to specify a per-system variable ANL_${CLUSTER}_SCHEDULER_PARAMETERS, where ${CLUSTER} is the name of the cluster. This variable contains the command-line flags you would use if you were submitting the job yourself from the command line or a script. Please consult the table below for more info.

| Cluster | Scheduler | Variable Name | Support docs |
| --- | --- | --- | --- |
| Theta | Cobalt | ANL_THETA_SCHEDULER_PARAMETERS | Theta Job Queue and Scheduling |
| ThetaGPU | Cobalt | ANL_THETAGPU_SCHEDULER_PARAMETERS | ThetaGPU Job Queue and Scheduling |
| Polaris | PBS | ANL_POLARIS_SCHEDULER_PARAMETERS | Polaris Getting Started |

              Example: Running a batch job on Theta HPC

              variables:\n ANL_THETA_SCHEDULER_PARAMETERS: \"-A ProjectName -n 1  -t 10 -q myQueue --attrs filesystems=home\"\n\nbatch_test:\n  tags:\n    - ecp-theta\n    - batch\n  script:\n    - echo \"Job start\"\n    - aprun -n 1 id\n    - aprun -n 1 hostname\n    - aprun -n 1 echo \"Running on theta with setuid batch runner\"\n    - echo \"Job end\"\n

              "},{"location":"services/gitlab-ci/#stages","title":"Stages","text":"

Jobs can be organized into stages. Jobs in the next stage will not start until all dependencies in the previous stage have completed. This is often used when there are building and testing steps that must finish before code may be run or packaged. The stages are assembled into a Pipeline, a directed graph of stages. By default, GitLab includes the following stages, executed in the order below:

              .pre\nbuild\ntest\ndeploy\n.post\n

              You may declare your own stages by first declaring a stages: array near the top of your .gitlab-ci.yml file. Stages will be processed in the order given in the array.

              Example: Declaring Stages

              stages:\n  - stage1\n  - stage2\n  - stage3\n

              Example: Theta pipeline with custom stages

              variables:\n  ANL_THETA_PROJECT_SERVICE_USER: \"ecpcisvc\"\n  ANL_THETA_SCHEDULER_PARAMETERS: \"-A Operations -n 1  -t 10 -q build --attrs filesystems=home\"\n\nstages:\n  - stage1\n  - stage2\n\ntest1:\n  stage: stage1\n  tags:\n    - ecp-theta\n    - shell\n  script:\n    - export\n    - id\n    - hostname\n    - echo \"Running on theta with setuid shell runner\" \n    - echo test > test.txt\ntest2:\n  stage: stage2\n  tags:\n    - ecp-theta\n    - batch\n  script:\n    - echo \"Job 2 start\"\n    - aprun -n 1 id\n    - aprun -n 1 hostname\n    - aprun -n 1 echo \"Running on theta with setuid batch runner\"\n    - echo \"Job 2 end\"\n

              "},{"location":"services/gitlab-ci/#rules","title":"Rules","text":"

GitLab allows CI/CD jobs to be launched only if certain conditions are met. GitLab sets a series of variables, in addition to any the user explicitly sets, when a job launches. A job can check these variables and choose to run or not based on the results. This is often used to ensure certain jobs only run on commits, merge requests, and/or merges. By default, a job runs if any of its rules match. You can override this behavior with directives like when: never on a matching conditional.

For more details, please see the upstream docs

              Rules can use the following conditional checks:

              if\nchanges\nexists\nallow_failure\nvariables\nwhen\n

              Example: GitLab job designed to only run on merge requests

              test1:\n  rules:\n    - if: $CI_COMMIT_TAG                    # Do not execute jobs for tag context\n      when: never\n    - if: $CI_COMMIT_BRANCH == \"master\"     # Do not run on master, since will run on the merge request just prior\n      when: never\n    - if: $CI_MERGE_REQUEST_IID             # CI_MERGE_REQUEST_IID exists, so run job\n  stage: stage1\n  tags:\n    - ecp-theta\n    - shell\n  script:\n    - echo \"Run test 1\"\n

              "},{"location":"services/gitlab-ci/#template-jobs","title":"Template Jobs","text":"

GitLab allows you to create template jobs: pieces of job specification that can be included in other jobs. Each template job name must begin with a period (.) and follow the same syntax as normal jobs. To instantiate a job based on a template job, use the keyword extends. If your specific job declares a key/value already present in the template, the specific job's value overrides it.

              Example: Use a job template so two tests will only run on merge requests

              .MR_rules:\n  rules:\n    - if: $CI_COMMIT_TAG                    # Do not execute jobs for tag context\n      when: never\n    - if: $CI_COMMIT_BRANCH == \"master\"     # Do not run on master, otherwise runs everything from scratch on merge\n      when: never\n    - if: $CI_MERGE_REQUEST_IID\n    - if: '$CI_PIPELINE_SOURCE == \"merge_request_event\"'    \n\ntest1:\n  extends: .MR_rules\n  stage: stage1\n  tags:\n    - ecp-theta\n    - shell\n  script:\n    - echo \"Run test 1\"\ntest2:\n  extends: .MR_rules\n  stage: stage2\n  tags:\n    - ecp-theta\n    - shell\n  script:\n    - echo \"Run test 2\"\n

              "},{"location":"services/gitlab-ci/#console-output","title":"Console Output","text":"

To see the output of a job, click on it in the GUI; it will show the STDOUT and STDERR from the job run. If the job did not launch successfully, the output will contain error messages from gitlab-runner, Jacamar-CI, or both. Be aware of any sensitive data you do not want exported or saved to the output console, such as passwords. Please do not write large amounts of data from your jobs to STDOUT; if your CI/CD job produces large amounts of text on STDOUT or STDERR, consider redirecting it into a job log.
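For example, the script section of a job could redirect a verbose command into a log file and print only a short summary to the console (the script and file names below are placeholders):

# run the verbose step, sending stdout and stderr to a log file instead of the job console
./run_benchmark.sh > benchmark_output.log 2>&1
# print only a short summary to the job console
tail -n 20 benchmark_output.log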

              GitLab Group Job Console"},{"location":"services/gitlab-ci/#storage-use-and-policy","title":"Storage Use and Policy","text":""},{"location":"services/gitlab-ci/#gitlab-project-quota","title":"GitLab Project Quota","text":"

              Each repository has a default quota of 1GB. Quota increases may be requested by emailing Support. This quota is separate from the storage quotas allocated to ALCF Projects and ALCF Users on the HPC clusters and shared filesystems.

              "},{"location":"services/gitlab-ci/#cicd-filesystem-usage","title":"CI/CD Filesystem usage","text":"

CI/CD jobs will run out of your home directory by default. Each job will begin by cloning the repository into a path under ~/.jacamar-ci and will continue to write there unless you reference other destinations in your CI/CD job. You will need to ensure that you have enough free space for this runner operation; if you do not, the job will fail to run. Each GitLab runner creates its own subdirectory under ~/.jacamar-ci, but it reuses that space for subsequent pipelines launched for that project on that runner.

If you need more space than your home directory can provide, it is recommended that you leverage any ALCF Project space you have been allocated on a shared filesystem (see the sketch below).
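As a sketch, a CI/CD job's script section could stage large working data into project space rather than the home directory (the /eagle path below is a hypothetical example; CI_PIPELINE_ID is a predefined GitLab CI/CD variable):

# stage large working data in the project's shared-filesystem allocation
# instead of ~/.jacamar-ci (the path below is a hypothetical example)
WORKDIR=/eagle/MyProject/ci-runs/${CI_PIPELINE_ID}
mkdir -p $WORKDIR
cp -r large_input_data $WORKDIR/
cd $WORKDIR && ./run_large_job.sh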

              "},{"location":"services/gitlab-ci/#gitlab-ci-access-termination-policy","title":"GitLab-CI Access Termination Policy","text":"

              Projects that have been inactive for at least 6 months will have their access disabled and their repositories deleted. Notification will be sent to the PI 30 days prior to the day of the action.

              Inactivity is defined as, but not limited to:

              • No new projects created
              • No new commits to an existing project
• A prolonged period of continuously failing CI/CD jobs (in the case of recurring scheduled jobs)
              "},{"location":"services/jenkins/","title":"Jenkins on Theta","text":""},{"location":"services/jenkins/#jenkins-to-be-decommissioned","title":"Jenkins to be decommissioned","text":"

              New projects should request access to use our GitLab-CI-based service. You can learn how to request access in our documentation found here.

              Existing projects can continue to use Jenkins. We will notify projects when we have the date it will be retired. Projects will have ample notice to migrate their work to our GitLab-CI service.

              "},{"location":"services/jenkins/#jenkins-at-alcf","title":"Jenkins at ALCF","text":"

              The ALCF provides a tool for implementing CI processes named Jenkins. Using the Jenkins tool, ALCF projects can make use of CI functionality. The Jenkins CI tool enables projects to auto-compile their custom software code, automate testing cycles, provide a feedback loop, and submit jobs to HPC resources. The custom pipelines needed for each project can be defined in Jenkins by project users, and execution can be controlled through triggers.

Additional information, technical and user documentation, and community support can be found on the Jenkins project website.

              "},{"location":"services/jenkins/#projects-using-jenkins","title":"Projects Using Jenkins","text":"

              Enabling a project to use Jenkins requires some additional steps and configuration to get started. Once enabled for a project, users can access the Jenkins CI environment and configure jobs or pipelines for building and testing their project code.

              "},{"location":"services/jenkins/#on-boarding-with-jenkins","title":"On-Boarding with Jenkins","text":"

              To enable Jenkins for your project, send an email to support@alcf.anl.gov requesting Jenkins access for your project and include the ALCF project shortname and the PI\u2019s name with the request.

              The project\u2019s PI will get an email with details and a new Jenkins account associated with the project. This is a service account that the Jenkins CI tool will use when executing tasks associated with your project. The CI account will be listed as a project member and added to the project\u2019s group for access controls.

              "},{"location":"services/jenkins/#alcf-jenkins","title":"ALCF Jenkins","text":"

              Log in to the ALCF Jenkins web portal using your ALCF credentials (ALCF username and cryptocard token password).

              "},{"location":"services/jenkins/#folders","title":"Folders","text":"

Each Jenkins project will have a top-level \"folder\" created with the project's name. Please do not delete the project folder: it is used for organization in the multi-project environment and is required for implementing the needed level of security. The project folder is where all of the project objects are stored; you can additionally create any subfolders, jobs, pipelines, etc. within your project folder to meet your CI needs.

              In the example below, we have a project named \"TestFromJanet2\" with an associated folder.

              CI folders screenshot"},{"location":"services/jenkins/#nodes","title":"Nodes","text":"

              Each Jenkins project will have an assigned node for execution. Nodes execute jobs defined within a project, typically on the target system\u2019s login node. Currently there are Jenkins nodes configured for HPC systems Theta and Cooley, as well as non-HPC nodes with 32 cores (Intel Xeon Processor E5-2683 v4) and 128 GB RAM for generic x86 processing with access to the Mira shared filesystems.

In the example below, the node for this project is named 'TestFromJanet2-Theta'. Jobs and pipeline steps triggered from Jenkins will execute on the TestFromJanet2-Theta node, which has been configured to use host: thetalogin1 and will use the project's Jenkins user ID (provided during on-boarding) to execute scripts or code just as if the end user had logged into the thetalogin1 node and executed the same set of actions manually from the command line.

              CI slaves screenshot"},{"location":"services/jenkins/#job-configuration","title":"Job Configuration","text":"

When configuring any new job within a project, there are some guidelines to follow for setting permissions and nodes. Project data is kept secure by setting up permissions at the project level, while node selection controls where the job will execute.

              When creating jobs, enable project-based security, set the inheritance strategy, and add your project\u2019s group name to the permission matrix table. The example below has enabled project-based security, set the inheritance strategy to Do not inherit permission grants from other ACLs, and added the project\u2019s group name \"TestFromJanet2\" to the permission matrix granting all rights to the group.

              CI permissions screenshot

              To assign the node that the project will use to execute jobs, select the option Restrict where this project can be run and enter the project\u2019s assigned node. The example below has assigned the jobs to node: TestFromJanet2-Theta so that any time the job is executed, it runs on host: thetalogin1.

              Execute screenshot"},{"location":"services/jenkins/#common-jenkins-features","title":"Common Jenkins Features","text":""},{"location":"services/jenkins/#version-control-features","title":"Version Control Features","text":"

Jenkins can connect to most common version control systems (VCS), including git/svn. The ALCF Jenkins instance can connect with local VCS hosted at ANL as well as with external VCS, such as those hosted at GitHub.

On the job configuration page, look for the section Source Code Management (SCM). If it is not there already, add it to the job. The required fields for SCM are Repository URL and Credentials. The example below shows a connection to the ALCF internal GitLab VCS and uses previously set up credentials.

              Repository access

              To use the new connection to the Git repository interactively, configure the job to be parameterized and add a Git Parameter to the job. The example below shows the configuration to select a branch at build time.

              Git parameter

              On the build screen, select from the drop-down menu the branch to be referenced during this job execution. The example below shows the list of available branches from the configured repository. It is automatically populated during the Git connector configuration of the preceding steps. If a new branch is added to the Git repository, it will display in the populated list of available branches when the job runs in Jenkins.

              Select branch"},{"location":"services/jenkins/#build-steps","title":"Build Steps","text":"

Build steps are where users define the executable tasks that make jobs do something interesting within an environment. A core component of Jenkins, build steps can take a few different forms and are most commonly configured to call remote scripts for code building and deployment. A build step can even contain the shell script contents to execute on the remote machine.
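As an illustration, the shell contents of such a build step are ordinary commands run from the Jenkins workspace; the following is a minimal sketch (module names and the test script are placeholders, and $WORKSPACE is the standard Jenkins workspace variable):

#!/bin/bash -l
# rebuild and test the project from the Jenkins workspace
module load cmake
cd $WORKSPACE
mkdir -p build && cd build
cmake .. && make -j 8
./run_unit_tests.sh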

              Add build step

              The example below uses the Execute Shell build step type and codes the shell logic within the Jenkins portal.

              Execute shell"},{"location":"services/jenkins/#pipelines","title":"Pipelines","text":"

              Pipelines in Jenkins allow for more advanced execution logic and are written in Groovy. A pipeline can be added directly to your project as an object using the New Item link. More commonly, they are defined in a \"Jenkinsfile\" and stored in VCS along with the project code. The Jenkinsfile can be created and edited outside of the Jenkins system using any text editor.

To add a pipeline manually, select Pipeline from the New Item dialog box.

              New Item dialog box

              The pipeline can then be configured and edited from the project folder in the same way as jobs, as shown in the example below.

              Pipeline configuration

To add a pipeline using a Jenkinsfile in SCM, add the pipeline object as shown below. On the pipeline configuration page, select Pipeline script from SCM and provide the SCM connection details along with the Script Path. The Script Path is the path and filename of the Jenkinsfile within the SCM repository. The example below uses a Jenkinsfile stored in the project source code from the ALCF Git repository; the Jenkinsfile containing the Groovy pipeline definition is located at scripts/Jenkinsfile relative to the repository root.

              Pipeline script path"},{"location":"services/jenkins/#triggers","title":"Triggers","text":"

Triggers are events that initiate tasks in Jenkins. Triggers can be invoked a few different ways, including directly by a user via the Build Now action, on a time-based schedule (similar to a cron system), or based on commits made to source control.

              The example below shows a time-based configuration to run the job on a regular schedule. Details on the scheduling syntax can be found by clicking the blue question mark to the right of the Schedule field.

              Build triggers"},{"location":"services/jenkins/#console-output","title":"Console Output","text":"

              Jenkins provides console output and saves this history for each job run. During job execution you can view the live output from the tasks in a display similar to what would be seen if the commands were run directly in an interactive console.

              Console output"},{"location":"services/jenkins/#credentials","title":"Credentials","text":"

              Credentials are stored in Jenkins and used when connecting to remote resources that require authentication in a non-interactive manner. Once defined, credentials can be used throughout the Jenkins system when configuring jobs, SCM connections, SSH connections, etc.

              To add a set of credentials, click on Credentials from the available options on the left-hand navigation menu. Then select System and click on the link for Global credentials.

              Credentials

              Click Add Credentials from the left-hand navigation menu and provide the required information. The example below configures a new credential set of type \"SSH Username with private key.\" Make sure Scope is set to \"Global.\" Provide the username, private key (copy and paste), and key passphrase, and then give a pertinent ID and detailed description to help identify and organize stored credentials in the system.

              Add credentials"},{"location":"services/jenkins/#faqs","title":"FAQS","text":"

              Why does my project's execution node say it is offline? Node services for executing project tasks are initiated when there is demand for the node. The process of starting the node services can take up to one minute; the status change is displayed in the Jenkins web portal. When there is no longer demand for the node, the services will stop again after one minute of idle time.

              Why is my shell environment different when executing tasks on a Jenkins node? Since Jenkins uses SSH with no tty, any shell scripts need to have this at the top so that login scripts are run against the session:

#!/bin/bash -l

              "},{"location":"services/jenkins/#glossary","title":"Glossary","text":"

              Continuous Integration (CI) - The process of automating the build and testing of code every time developers commit changes to version control.

              Pipeline - A CI pipeline is a list of tasks or jobs that are defined and executed as a procedure within a project. Pipeline is analogous to workflow.

              Source Control Management (SCM) - A term used in Jenkins to describe objects related to version control.

              Version Control System (VCS) - Software that manages access, storage, and revision history for a code repository.

              "},{"location":"services/jenkins/#appendix","title":"Appendix","text":""},{"location":"services/jenkins/#abbreviated-setup","title":"Abbreviated Setup","text":"
              • Request Jenkins access for your project by emailing the ALCF Service Desk.
              • Add jobs and pipelines to the project folder space to handle code compiling and testing.
              • Configure jobs with credentials, SCM integrations, and trigger components depending on the intended behavior for your project.
              • Execute jobs and pipelines by invoking the configured triggers.
              "},{"location":"services/jupyter-hub/","title":"JupyterHub","text":"

              JupyterHub is an open-source service application that enables users to launch separate Jupyter instances on a remote server. ALCF JupyterHub provides access to Polaris, ThetaGPU, Theta, and Cooley with the same authentication protocol that is used to access these systems, but through a web interface rather than a terminal. On the ALCF JupyterHub home page, users can choose their desired system. Upon selection, they'll be directed to the sign-in page to enter their ALCF username and passcode token.

              ALCF JupyterHub home page and sign-in screen

              We describe below how to use JupyterHub on Polaris, ThetaGPU, Theta, and Cooley in more detail.

              "},{"location":"services/jupyter-hub/#polaris","title":"Polaris","text":"

              The Polaris JupyterHub server runs on a Polaris login node and launches individual users' environments on the compute nodes through the PBS job scheduler. After the authentication step, the user will be presented with the menu of the available job options to start the Jupyter instance.

• Select a job profile: This field lists the available profiles, which are limited to \u201cPolaris Compute Node\u201d at this time.
• Queue Name: This field provides a list of available queues on the system.
              • Project List: This field displays the active projects associated with the user on Polaris.
              • Number of Nodes: This field allows the user to select the number of compute nodes to be allocated.
              • Runtime (minutes:seconds): This field allows the user to set the runtime of the job in minutes and seconds. The user should refer to the Polaris queue scheduling policy for minimum and maximum runtime allowed for the selected queue.
              • File Systems: This field allows the user to select the file systems to be mounted. By default all the file systems are selected.

              Polaris Job options

              Once the appropriate information is provided the user will click the \u201cStart\u201d button and wait for the job to spawn. If there's an extended wait time due to a lengthy job queue, the interface might time out, leading to the job's removal from the queue. If not, the job kicks off and it begins to use up the user's allocation based on the chosen job options. It's crucial for users to shut down the server when resources are no longer required. Failing to do so will result in continued consumption of the allocated time until the predetermined runtime concludes.

              "},{"location":"services/jupyter-hub/#thetagpu","title":"ThetaGPU","text":"

              The ThetaGPU JupyterHub instance can run either on an external server or directly on ThetaGPU compute nodes. After the authentication step, the user will be presented with a drop-down menu to \"Select a job profile\", with the options \u201cLocal Host Process\u201d and \u201cThetaGPU Compute Node\u201d as shown below.

              Select a job profile

              \"Local Host Process\u201d will start the Jupyter Notebook on the JupyterHub server (external to the compute resource).

              \"ThetaGPU Compute Node\" will allow a user to start a Jupyter Notebook instance on an available compute node by requesting a node via the job scheduler, Cobalt. When a user selects this option additional options will appear as shown below.

• ThetaGPU Queue (MinTime/MaxTime): This field provides a list of available queues on the system with the minimum and maximum times allowed for each queue.
              • Project List: This field displays the active projects associated with the user on the given system (ThetaGPU).
              • Runtime (minutes): This field allows the user to set the runtime of the job in minutes. Please note that minimum and maximum times are shown on the menu. You may refer to the ThetaGPU queue scheduling policy for more details.

              ThetaGPU Job options

              Once the appropriate information is provided the user will click the \u201cStart\u201d button and wait for the job to spawn. In cases where the job queue is long the interface will time out and the job will be removed from the queue.

              Job queued

NOTE: If you would like to change your selection of where to run the Jupyter instance after the Notebook has started, you need to stop the server to be able to see the drop-down menu again.

              "},{"location":"services/jupyter-hub/#theta-and-cooley","title":"Theta and Cooley","text":"

JupyterHub for Cooley and Theta deploys notebooks on an external server that mounts the same home directory (swift-home) used on those systems; however, users cannot directly access the compute nodes of these systems through it. Note that both Cooley and Theta will be retired by the end of 2023; we therefore recommend that users switch to Polaris and ThetaGPU instead.

              "},{"location":"services/jupyter-hub/#additional-notes","title":"Additional Notes","text":""},{"location":"services/jupyter-hub/#custom-ipython-kernels","title":"Custom IPython Kernels","text":"

ALCF JupyterHub provides a set of pre-configured IPython kernels for users to select. However, users may need custom kernels with additional packages installed. This can be achieved by first creating a custom Python environment through either venv or conda. More information on creating custom Python environments can be found in our documentation for Polaris and ThetaGPU. After activating the custom environment, the ipykernel package needs to be installed with the following command:

              pip install ipykernel\n
              Once ipykernel is installed, the custom kernel can be added to the list of available kernels with the following command:
              python -m ipykernel install --user --name custom_kernel_name \n
              where custom_kernel_name is the name of the kernel that will appear in the kernel list. This name does not have to match the name of the environment, but should not contain spaces. If you want more flexibility in naming, you can add the --display-name argument as shown below.
              python -m ipykernel install --user --name custom_kernel_name --display-name \"Polaris Python 3.11 Tensorflow 2.4.1\" \n
Note that you still need to provide --name with a simple name that does not contain spaces. Additionally, you can set environment variables for the kernel with the --env argument, e.g.:
              python -m ipykernel install --user --name custom_kernel_name --env http_proxy http://proxy.alcf.anl.gov:3128 --env https_proxy http://proxy.alcf.anl.gov:3128\n
              You can see the list of available kernels with the following command:
              jupyter kernelspec list\n
              By default, the kernels are installed in the user's home directory under ~/.local/share/jupyter/kernels/. All the configuration is specified in the kernel.json file under the kernel directory. For the example above, the path for the json file will be ~/.local/share/jupyter/kernels/custom_kernel_name/kernel.json. You can edit this file to add additional environment variables or change the display name.

              Once you've followed the steps above, your new kernel will be visible on JupyterHub. It's recommended to perform these steps in a terminal, ideally on the login node of the system you're using. After setting up a custom kernel, you can easily add more packages directly within JupyterHub. Simply create a new notebook using your custom kernel and use the %pip or %conda magic commands to install packages. If you're on a compute node, remember to enable internet access by configuring the http_proxy and https_proxy environment variables as previously mentioned.
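Putting the pieces together, a minimal end-to-end sketch of creating and registering a custom kernel might look like the following (the environment name, path, and base python are examples; on a given system you may first need to load a python or conda module):

# create and activate a virtual environment (names and paths are examples)
python3 -m venv ~/envs/my_jupyter_env
source ~/envs/my_jupyter_env/bin/activate
# install ipykernel into the environment and register it with Jupyter
pip install ipykernel
python -m ipykernel install --user --name my_jupyter_env --display-name "My Custom Env"
# confirm the new kernel is visible
jupyter kernelspec list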

              "},{"location":"services/jupyter-hub/#accessing-project-folders","title":"Accessing Project Folders","text":"

The Jupyter file browser limits the user to viewing files and directories within their home directory. To access directories located outside of the home directory, a symbolic link to the directory must be created within the user's home directory. An example of this is:

              ln -s /project/ABC ~/ABC_project_link\n
              Please note that one can run any shell command directly on a Jupyter notebook by simply adding an exclamation mark, !, to the beginning of the command. For example, the above command can be run from a notebook cell as follows:

              !ln -s /project/ABC ~/ABC_project_link\n
              "},{"location":"services/jupyter-hub/#ending-a-jupyter-notebook-running-on-a-compute-node","title":"Ending a Jupyter Notebook running on a compute node","text":"

A Jupyter Notebook that is not correctly ended will continue to consume the selected project's allocation on the resource in question. When users have completed their task in Jupyter, they should stop the Jupyter instance running on the compute node before logging out. To stop the Notebook, click the \u201cControl Panel\u201d button in the top right, then click \u201cStop My Server\u201d.

              Stop panel

              Stop server"},{"location":"services/jupyter-hub/#resources","title":"Resources","text":"
              • Jupyter Lab documentation.
              • ALCF Hands-on HPC Workshop presentation on Python and Jupyter on Polaris: slides and video.
              • ALCF webinar on JupyterHub: slides and video.
              "},{"location":"theta/theta-decommissioning/","title":"Theta and Theta-fs0 Decommissioning","text":"

              Theta and Theta-fs0 will be retired at the end of calendar year 2023. ThetaGPU will go offline for a short time, but then will come back online as an independent system running the PBS scheduler to be in line with the current standard for the ALCF. Here are additional details:

              • Jobs on Theta and ThetaGPU will stop running at 23:59:59 on 12/31/2023.
              • There is currently no ETA for the ThetaGPU return to service, but we will provide a timeline in Q4CY23.
              • Information on how to get allocations on the new ThetaGPU will be provided in Q4CY23.
              "},{"location":"theta/theta-decommissioning/#for-theta-fs0","title":"For Theta-fs0:","text":"

              Step 1: Update your scripts/workflows to switch all uses of theta-fs0 to the eagle filesystem as soon as possible. Data allocations on eagle will be provided.

              NOTE: theta-fs0 will be mounted read-only on 10/30/2023 and will no longer be available starting 01/01/2024. Any jobs attempting to write to theta-fs0 on 10/30/2023 or later will fail.

              Step 2: Migrate existing data off theta-fs0 as needed:

              • If there is data you do not need on theta-fs0, just leave it in place. No need to move it or delete it.
• To move your data off of theta-fs0, use Globus Online. Please see using Globus for instructions on how to do so. This will work inside ALCF, as we have Globus endpoints on all the relevant file systems. If you need to transfer data out of ALCF and there isn't a Globus endpoint at the destination, the page linked above has directions on using Globus Connect Personal to install an endpoint for yourself.
              "},{"location":"theta/applications-and-libraries/applications/elpa/","title":"ELPA","text":""},{"location":"theta/applications-and-libraries/applications/elpa/#what-is-elpa","title":"What is ELPA?","text":"

ELPA is a Fortran/C/MPI library for solving dense Hermitian (real or complex) eigenvalue problems. ELPA is designed to compute the eigenvectors and eigenvalues of large matrices on petascale computers. ELPA uses the BLACS framework and some ScaLAPACK functions to distribute and solve the eigenproblem. Computationally intensive kernels in ELPA are optimized using intrinsic code for the SSE, AVX, AVX2, AVX-512, and QPX architectures. This code is popular in electronic structure codes, but it can also be used for machine learning and other approaches that require a full or partial spectrum solution of a matrix eigenproblem. ELPA scales efficiently on Theta, solving a 1-million-by-1-million matrix in less than 3 hours on 3,000 KNL nodes.

              "},{"location":"theta/applications-and-libraries/applications/elpa/#using-elpa-at-alcf","title":"Using ELPA at ALCF","text":"

              ALCF provides assistance with compiling the library. For questions, contact us at support@alcf.anl.gov.

              "},{"location":"theta/applications-and-libraries/applications/elpa/#how-to-obtain-the-code","title":"How to obtain the code","text":"

              ELPA can be downloaded free of charge from https://elpa.mpcdf.mpg.de/software.

              "},{"location":"theta/applications-and-libraries/applications/elpa/#building-elpa-on-theta","title":"Building ELPA on Theta","text":"

ELPA must be compiled with AVX-512 support on Theta. ELPA has OpenMP support, but it has shown lower performance for large numbers of MPI ranks. Because the interface of ELPA subroutines may change among the 2016, 2017, and 2018 versions, we strongly suggest that users visit https://elpa.mpcdf.mpg.de/software for further information.

This is an example of compiling ELPA on Theta.

> cat build_elpa_theta.sh\ngit clone https://gitlab.mpcdf.mpg.de/elpa/elpa.git\ncd elpa\naclocal\nautoreconf\n./configure --prefix=/soft/applications/elpa/elpa2017 \\\n     --host=x86_64-suse-linux-gnu \\\n     --disable-shared --enable-avx512 \\\n     FC=ftn CC=cc \\\n     SCALAPACK_LDFLAGS=\"\" \\\n     SCALAPACK_FCFLAGS=\"\" \\\n     FCFLAGS=\"-I/opt/intel/compilers_and_libraries_2018.0.128/linux/mkl/include/intel64/lp64\"\nmake && make install\n
              "},{"location":"theta/applications-and-libraries/applications/gromacs/","title":"Gromacs on Theta","text":""},{"location":"theta/applications-and-libraries/applications/gromacs/#what-is-gromacs","title":"What is Gromacs?","text":"

              GROMACS is a versatile package to perform molecular dynamics, i.e. simulate the Newtonian equations of motion for systems with hundreds to millions of particles. It is primarily designed for biochemical molecules like proteins, lipids, and nucleic acids that have a lot of complicated bonded interactions, but since GROMACS is extremely fast at calculating the nonbonded interactions (that usually dominate simulations) many groups are also using it for research on non-biological systems, e.g. polymers.

              "},{"location":"theta/applications-and-libraries/applications/gromacs/#using-gromacs-at-alcf","title":"Using GROMACS at ALCF","text":"

              ALCF offers assistance with building binaries and compiling instructions for GROMACS. For questions, contact us at support@alcf.anl.gov.

              "},{"location":"theta/applications-and-libraries/applications/gromacs/#building-gromacs","title":"Building Gromacs","text":"
              1. Download latest source code: http://manual.gromacs.org/documentation/2022.1/download.html
              2. tar -xzf gromacs-2022.1.tar.gz
              3. cd gromacs-2022.1
              4. mkdir build
              5. module load cmake
              6. module swap PrgEnv-intel PrgEnv-gnu/6.0.10
              7. cmake -DCMAKE_C_COMPILER=cc -DCMAKE_CXX_COMPILER=CC \\\n      -DBUILD_SHARED_LIBS=OFF -DGMX_BUILD_OWN_FFTW=ON \\\n      -DCMAKE_INSTALL_PREFIX=/path-to/gromacs-2022.1/build \\\n      -DGMX_MPI=ON -DGMX_OPENMP=ON -DGMX_CYCLE_SUBCOUNTERS=ON -DGMX_GPU=OFF \\\n      -DGMX_BUILD_HELP=OFF -DGMX_HWLOC=OFF -DGMX_SIMD=AVX_512_KNL \\\n      -DGMX_OPENMP_MAX_THREADS=256\n
8. make -j 16
              9. make install
              10. The installed binary is build/bin/gmx_mpi.
              "},{"location":"theta/applications-and-libraries/applications/gromacs/#running-gromacs-on-theta","title":"Running Gromacs on Theta","text":"

              Prebuilt Gromacs binaries can be found in the directory /soft/applications/gromacs/gromacs_theta.

              A sample qsub script follows.

              #!/bin/bash\n#COBALT -n 1\n#COBALT -t 30 \n#COBALT -q debug-cache-quad \n#COBALT -project catalyst \n#COBALT --attrs mcdram=cache:numa=quad\n#COBALT --attrs filesystems=home,theta-fs0\n\nexport GMX_MAXBACKUP=-1 \n\naprun -n64 -N64 --env OMP_NUM_THREADS=2 --cc depth -d 2 -j 2 \\\n      /soft/applications/gromacs/gromacs_theta/gmx_mpi.2022.1 mdrun \\\n      -dlb yes -resethway -pin on -v deffnm step5_1 -g test.log\n

              We strongly suggest that users try combinations of different numbers of nodes, MPI ranks per node, and OMP threads per rank to find the optimal throughput for their particular workload.
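For instance, a variant of the above command using 32 MPI ranks with 4 OpenMP threads per rank on a single node might look like the following (the values are purely illustrative, not a recommendation):

# illustrative only: 32 MPI ranks with 4 OpenMP threads per rank on one node
aprun -n 32 -N 32 --env OMP_NUM_THREADS=4 --cc depth -d 4 -j 2 \
      /soft/applications/gromacs/gromacs_theta/gmx_mpi.2022.1 mdrun \
      -dlb yes -resethway -pin on -v -deffnm step5_1 -g test.log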

The following is a representative benchmark for a system with 30,000 atoms, generated on a single Theta node with the above aprun command.

Core time: 9016.068 s; Wall time: 70.441 s (12799.4%). Performance: 61.330 ns/day (0.391 hour/ns).
"},{"location":"theta/applications-and-libraries/applications/lammps/","title":"LAMMPS on Theta","text":""},{"location":"theta/applications-and-libraries/applications/lammps/#overview","title":"Overview","text":"

              LAMMPS is a general-purpose molecular dynamics software package for massively parallel computers. It is written in an exceptionally clean style that makes it one of the most popular codes for users to extend and it currently has dozens of user-developed extensions.

              For details about the code and its usage, see the LAMMPS home page. This page is dedicated to information pertaining to Theta/ThetaGPU at the ALCF.

              "},{"location":"theta/applications-and-libraries/applications/lammps/#using-lammps-at-alcf","title":"Using LAMMPS at ALCF","text":"

              ALCF provides assistance with build instructions, compiling executables, submitting jobs, and providing prebuilt binaries. For questions, contact us at support@alcf.anl.gov.

              "},{"location":"theta/applications-and-libraries/applications/lammps/#how-to-obtain-the-code","title":"How to Obtain the Code","text":"

              LAMMPS is an open-source code, which can be downloaded at http://lammps.sandia.gov/download.html.

              "},{"location":"theta/applications-and-libraries/applications/lammps/#building-on-theta","title":"Building on Theta","text":"

After LAMMPS has been downloaded and unpacked, you should see a directory whose name is of the form lammps-<version>. In recent versions, lammps-<version>/src/MAKE/MACHINES/Makefile.theta can be used for compilation on Theta. The top portion of that Makefile is provided below with suggested compiler settings. For older versions of LAMMPS, you will need to take an existing Makefile (e.g. Makefile.mpi) for the specific version used and edit the top portion appropriately to create a Makefile.theta.

              # theta = Flags for Knights Landing Xeon Phi Processor, Intel compiler, Cray MPI, MKL FFT\n# module unload libsci\n# make theta -j 8\n\nSHELL = /bin/sh\n\n# ---------------------------------------------------------------------\n# compiler/linker settings\n# specify flags and libraries needed for your compiler\n\nKOKKOS_DEVICES = OpenMP\nKOKKOS_ARCH = KNL\n\nCC       = CC -mkl\nOPTFLAGS = -xMIC-AVX512 -O3 -fp-model fast=2 -no-prec-div -qoverride-limits\nCCFLAGS  = -g -qopenmp -qno-offload -ansi-alias -restrict \nCCFLAGS += -DLMP_INTEL_USELRT -DLMP_USE_MKL_RNG $(OPTFLAGS)\nCCFLAGS += -std=c++11\nCCFLAGS += -DLAMMPS_MEMALIGN=64\nSHFLAGS  = -fPIC\nDEPFLAGS = -M\n\nLINK      = $(CC)\nLINKFLAGS = -g -qopenmp $(OPTFLAGS) -dynamic\n#LIB       = -ltbbmalloc\nLIB       = -L$(TBBROOT)/lib/intel64/gcc4.8 -ltbbmalloc -Wl,-rpath=$(TBBROOT)/lib/intel64/gcc4.8\nSIZE      = size\n\nARCHIVE    = ar\nARFLAGS    = -rc\nSHLIBFLAGS = -shared\n\n# ---------------------------------------------------------------------\n# LAMMPS-specific settings, all OPTIONAL\n# specify settings for LAMMPS features you will use\n# if you change any -D setting, do full re-compile after \"make clean\"\n\n# LAMMPS ifdef settings\n# see possible settings in Section 2.2 (step 4) of manual\n\nLMP_INC =\n\n# MPI library\n# see discussion in Section 2.2 (step 5) of manual\n# MPI wrapper compiler/linker can provide this info\n# can point to dummy MPI library in src/STUBS as in Makefile.serial\n# use -D MPICH and OMPI settings in INC to avoid C++ lib conflicts\n# INC = path for mpi.h, MPI compiler settings\n# PATH = path for MPI library\n# LIB = name of MPI library\n\nMPI_INC  = -DMPICH_SKIP_MPICXX -DOMPI_SKIP_MPICXX=1\nMPI_PATH =\nMPI_LIB  =\n\n# FFT library\n# see discussion in Section 2.2 (step 6) of manaul\n# can be left blank to use provided KISS FFT library\n# INC = -DFFT setting, e.g. -DFFT_FFTW, FFT compiler settings\n# PATH = path for FFT library\n# LIB = name of FFT library\n\nFFT_INC  = -DFFT_MKL -DFFT_SINGLE\nFFT_PATH =\nFFT_LIB  = -L$(MKLROOT)/lib/intel64 -Wl,--start-group -lmkl_intel_lp64 \\\n           -lmkl_core -lmkl_intel_thread -Wl,--end-group\n\n...\n
As newer versions of LAMMPS are distributed and changes are made to the Makefile, the example Makefile above can be used to generate an updated Makefile from one of the Intel examples packaged with LAMMPS. With the Makefile in place, LAMMPS can be compiled from the lammps-<version>/src directory using the following command.
              cd lammps-<version>/src\nmake theta -j 8\n
              "},{"location":"theta/applications-and-libraries/applications/lammps/#running-lammps-jobs-on-theta","title":"Running LAMMPS Jobs on Theta","text":"

Following is an example executable script \u201crun_lammps.csh\u201d to run LAMMPS on two nodes of Theta with 64 MPI ranks per node. The job can be submitted with the command \u201cqsub run_lammps.csh\u201d, where <project_name> is replaced with an active project allocation.

              #!/bin/csh\n#COBALT -n 2 -t 10 -q debug-cache-quad -A <project_name> -O LAMMPS\n\naprun -n 128 -N 64 -d 1 --cc depth -e OMP_NUM_THREADS=1 -j 1 ./lmp_theta -in lmp.in\n
              "},{"location":"theta/applications-and-libraries/applications/lammps/#performance-notes","title":"Performance Notes","text":"

              When possible, users will want to build LAMMPS executables with the USER-OMP and USER-INTEL packages for best performance on Theta. Following is an example script \u201crun_lammps_intel.csh\u201d to run LAMMPS on two nodes of Theta with 64 MPI ranks per node and two OpenMP threads per rank with the USER-INTEL and USER-OMP packages. The job can be submitted with command \u201cqsub run_lammps_intel.csh.\u201d

              #!/bin/csh\n#COBALT -n 2 -t 10 -q debug-cache-quad -A <project_name> -O LAMMPS\n\naprun -n 128 -N 64 -d 2 --cc depth -e OMP_NUM_THREADS=2 -j 2 ./lmp_theta -in lmp.in -sf hybrid intel omp\n
Not all available forcefields in LAMMPS are supported in one or both of these packages. For the latest information, please check the LAMMPS website and documentation.
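To build an executable with these packages enabled, the packages must be switched on in the LAMMPS source tree before compiling. A minimal sketch follows (note that package names depend on the LAMMPS version; recent releases renamed USER-OMP and USER-INTEL to OPENMP and INTEL):

cd lammps-<version>/src
# enable the threading and Intel-optimized packages, then rebuild
make yes-USER-OMP
make yes-USER-INTEL
make theta -j 8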

              "},{"location":"theta/applications-and-libraries/applications/lammps/#building-on-thetagpu","title":"Building on ThetaGPU","text":"

There are two key packages available in LAMMPS for running on the GPUs available in ThetaGPU: GPU and KOKKOS. Example Makefiles based on recent versions of LAMMPS are available for download from the ALCF GitHub.

              LAMMPS can be built on the ThetaGPU compute nodes with the default software environment and support for the GPU package using the following commands once the Makefiles at the above link are placed appropriately based on instructions in the README.

              cp Makefile.gpu_thetagpu lammps-<version>/lib/gpu\ncp Makefile.thetagpu lammps-<version>/src/MAKE/MACHINES\n\ncd lammps-<version>/lib/gpu\nmake -f Makefile.gpu_thetagpu -j 8\ncd ../../src\nmake yes-GPU\nmake thetagpu -j 8\n

              LAMMPS can be built with the default software environment and support for the KOKKOS package using the following commands and the Makefile at the above link.

              cp Makefile.thetagpu_kokkos lammps-<version>/src/MAKE/MACHINES\n\ncd lammps-<version>/src\nmake yes-KOKKOS\nmake thetagpu_kokkos -j 8\n
              "},{"location":"theta/applications-and-libraries/applications/lammps/#running-lammps-jobs-on-thetagpu","title":"Running LAMMPS jobs on ThetaGPU","text":"

Following is an example executable script submit_full-node.sh to run LAMMPS on a ThetaGPU node using all GPUs for both the GPU and KOKKOS packages. This example is based on the Rhodopsin benchmark using lammps-<version>/bench/in.rhodo.

              After the appropriate command is uncommented, the job can be submitted with \u201cqsub ./submit_full-node.sh\u201d, where \"-A Catalyst\" in the script is replaced with an appropriate active project allocation.

              Note: The preceding 'qsub' command should be executed 1) on the ThetaGPU login nodes or 2) from the Theta login node after executing 'module load cobalt/cobalt-gpu'.
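For example, submitting from a Theta login node involves the two steps from the note above:

module load cobalt/cobalt-gpu\nqsub ./submit_full-node.sh\n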

#!/bin/sh\n#COBALT -n 1 -t 15 -q full-node -A Catalyst\n\n# GPU package: submit job to run on 8 GPUs w/ 8 MPI ranks per GPU and 2 OpenMP threads per rank\n#env OMP_NUM_THREADS=2 mpirun -np 64 ~/bin/lammps/lammps-git/src/lmp_thetagpu -in in.rhodo -pk gpu 8 -pk omp 0 -sf hybrid gpu omp\n\n# KOKKOS package: submit job to run on 8 GPUs w/ 1 MPI rank per GPU\nmpirun -np 8 ~/bin/lammps/lammps-git/src/lmp_thetagpu -in in.rhodo -k on g 8 -sf kk -pk kokkos neigh half\n
              Additional details on the specific usage of the GPU and KOKKOS packages and how best to use multiple GPUs per node is available in the LAMMPS documentation.

              "},{"location":"theta/applications-and-libraries/applications/namd/","title":"NAMD","text":""},{"location":"theta/applications-and-libraries/applications/namd/#what-is-namd","title":"What Is NAMD?","text":"

              NAMD, recipient of a 2002 Gordon Bell Award and a 2012 Sidney Fernbach Award, is a parallel molecular dynamics code designed for high-performance simulation of large biomolecular systems. Based on Charm++ parallel objects, NAMD scales to hundreds of cores for typical simulations and beyond 1,000,000 cores for the largest simulations. NAMD uses the popular molecular graphics program VMD for simulation setup and trajectory analysis, but is also file-compatible with AMBER, CHARMM, and X-PLOR. NAMD is distributed free of charge with source code. You can build NAMD yourself or download binaries for a wide variety of platforms.

NAMD has been well optimized by Intel in collaboration with the Beckman Institute at UIUC. The nonbonded kernel is carefully tuned (minimized random memory access, loop unrolling, compiler directives, and AoS vs. SoA data layouts) to fully leverage Intel\u2019s vectorizing compiler. A generic version of Charm++ for the Aries interconnect was developed to make the best use of Charm++\u2019s asynchronous communication, targeting the large number of small messages in a NAMD run.

              "},{"location":"theta/applications-and-libraries/applications/namd/#using-namd-at-alcf","title":"Using NAMD at ALCF","text":"

              ALCF provides assistance with build instructions, compiling executables, submitting jobs, and providing prebuilt binaries. For questions, contact us at support@alcf.anl.gov.

              "},{"location":"theta/applications-and-libraries/applications/namd/#building-charm","title":"Building Charm++","text":"
              • git clone https://charm.cs.illinois.edu/
              • cd charm
              • module swap intel intel/16.0.3.210
              • module load gcc/6.3.0
              • module load cray-fftw
              • module load rca
              • module load craype-hugepages8M
              • module unload darshan
              • module list
• ./build charm++ gni-crayxc-persistent-smp --no-build-shared --with-production -xMIC-AVX512 -j8
Recommended: a user can also build the charm++ tarball file included within the NAMD source code, which can be downloaded via http://www.ks.uiuc.edu/Development/Download/download.cgi?PackageName=NAMD
              "},{"location":"theta/applications-and-libraries/applications/namd/#building-namd","title":"Building NAMD","text":"

              wget http://www.ks.uiuc.edu/Research/namd/libraries/tcl8.5.9-crayxe-threaded.tar.gz

tar xzf tcl8.5.9-crayxe-threaded.tar.gz (be sure tcl8.5.9-crayxe-threaded is in $HOME)\nmodule load craype-hugepages8M\nmodule load fftw\ncd namd2\nln -s /path/to/charm \n./config CRAY-XC-KNL-intel --with-fftw3 --charm-arch gni-crayxc-persistent-smp\n(or just \"./config CRAY-XC-KNL-intel\" as fftw3 and gni-crayxc-persistent-smp are defaults)\n\nAdd --with-memopt to the config line to build a memory-optimized binary (you can add .suffix to the directory name)\ncd CRAY-XC-KNL-intel\nmake -j8\n\nBe sure that the $HOME/tcl8.5.9-crayxe-threaded Tcl library is included and linked!\n\n\"make -j8 release\" will give you a distribution directory and tar file\n
              "},{"location":"theta/applications-and-libraries/applications/namd/#running-namd","title":"Running NAMD","text":"

              A prebuilt NAMD binary can be found in directory /soft/applications/namd/

              The following should be included in the job script:

              module load craype-hugepages8M\nexport HUGETLB_DEFAULT_PAGE_SIZE=8M\nexport HUGETLB_MORECORE=no\nexport ATP_ENABLED=1\n
There does not seem to be a reason to use any mode other than cache-quad. Since all NAMD data will likely fit in MCDRAM, using flat-quad with numactl -m 1 is reasonable, but no performance advantage over cache-quad was observed.

              Best performance is generally with two hyperthreads per core, although at the scaling limit one hyperthread per core may be faster.

              Careful attention must be paid to core mapping. The rules are:

1. Reserve cpu 255 for the OS with -r 1, and then also avoid cpus 63, 127, and 191, which share that core.

              2. Give each communication thread (one per process) a full core to itself.

              3. Do not mix worker threads from different processes on the same core pair (that share L2 cache).

              Given these rules the following options seem reasonable (invoked as, e.g., $APRUN7 /path/to/namd2 $CARGS72 myinput.namd):

              • APRUN1=\"aprun -n $((1*$COBALT_JOBSIZE)) -N 1 -d 125 -j 2 -r 1\"
              • CARGS11=\"+ppn 62 +commap 62 +pemap 0-61 --useCkLoop 6\"
              • CARGS12=\"+ppn 124 +commap 62 +pemap 0-61+64 --useCkLoop 6\"

              or

              • APRUN3=\"aprun -n $((3*$COBALT_JOBSIZE)) -N 3 -d 41 -j 2 -r 1\"
              • CARGS31=\"+ppn 20 +commap 60-62 +pemap 0-59 --useCkLoop 6\"
              • CARGS32=\"+ppn 40 +commap 60-62 +pemap 0-59+64 --useCkLoop 6\"

              or

              • APRUN4=\"aprun -n $((4*$COBALT_JOBSIZE)) -N 4 -d 29 -j 2 -r 1\"
              • CARGS41=\"+ppn 14 +commap 14-62:16 +pemap 0-63:16.14 --useCkLoop 6\"
              • CARGS42=\"+ppn 28 +commap 14-62:16 +pemap 0-63:16.14+64 --useCkLoop 6\"

              or

              • APRUN7=\"aprun -n $((7*$COBALT_JOBSIZE)) -N 7 -d 17 -j 2 -r 1\"
              • CARGS71=\"+ppn 8 +commap 56-62 +pemap 0-55 --useCkLoop 6\"
              • CARGS72=\"+ppn 16 +commap 56-62 +pemap 0-55+64 --useCkLoop 6\"

              The first case, one process per node, is untested and would only make sense for cases with very little communication, such as a multi-copy simulation with one copy per node.

Three processes per node would also not scale well, and since four processes per node gives the same number of worker threads per node as seven processes per node, the final case (APRUN7, CARGS72) generally performs best.

              The \"--useCkLoop 6\" flag is a NAMD option that enables optional PME shared-memory parallelization; it may not make sense for smaller simulations.

              Note: These settings are only for 64-core processors, especially the +64 in the pemap, which assumes hyperthreads on the same core are numbered with a stride of 64.
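Putting the pieces above together, a minimal Cobalt job script for the seven-processes-per-node case (APRUN7/CARGS72) might look like the following sketch; the node count, walltime, queue, NAMD binary path, and input file name are placeholders to adjust:

#!/bin/sh\n#COBALT -n 2 -t 30 -q default -A <project_name> -O NAMD\n\nmodule load craype-hugepages8M\nexport HUGETLB_DEFAULT_PAGE_SIZE=8M\nexport HUGETLB_MORECORE=no\nexport ATP_ENABLED=1\n\n# seven processes per node, two hyperthreads per core (APRUN7 / CARGS72 above)\nAPRUN7=\"aprun -n $((7*$COBALT_JOBSIZE)) -N 7 -d 17 -j 2 -r 1\"\nCARGS72=\"+ppn 16 +commap 56-62 +pemap 0-55+64 --useCkLoop 6\"\n\n$APRUN7 /path/to/namd2 $CARGS72 myinput.namd\n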

              "},{"location":"theta/applications-and-libraries/applications/namd/#scaling-and-performance","title":"Scaling and Performance","text":"

Scaling tests on Theta were run on up to 3,072 nodes using benchmarks of a 210M-atom STMV system (from Phillips et al., SC14).

              Compared per node, Theta is slower than GPU-accelerated Titan, faster than Edison, and a factor of ten faster than Mira. Edison scales better due to faster cores that reduce the impact of serial bottlenecks.

              Compared per core to other CPU-only machines, Theta is slightly faster than Blue Waters (treated as 32 cores).

              Performance on various systems"},{"location":"theta/applications-and-libraries/applications/nwchem/","title":"NWChem","text":"

NWChem is a parallel quantum chemistry code, written mainly in Fortran77, that uses MPI and OpenMP for distributed and multicore computing. NWChem was designed to solve large-scale electronic structure calculations with Hartree-Fock, Density Functional Theory, and other wavefunction-correlated methods. See the full feature list at http://www.nwchem-sw.org.

              "},{"location":"theta/applications-and-libraries/applications/nwchem/#using-nwchem-at-alcf","title":"Using NWChem at ALCF","text":"

              ALCF provides binaries and compiling instructions for NWChem. For questions, contact us at support@alcf.anl.gov.

              "},{"location":"theta/applications-and-libraries/applications/nwchem/#how-to-obtain-the-code","title":"How to Obtain the Code","text":"

NWChem is an open-source code; the official website is http://www.nwchem-sw.org. The change log and release announcements can be found at http://www.nwchem-sw.org/index.php/Download.

The official NWChem source can be downloaded from the project\u2019s GitHub repository, https://github.com/nwchemgit/nwchem/releases. For detailed compiling options, please visit the NWChem wiki page, https://github.com/nwchemgit/nwchem/wiki/Compiling-NWChem. The following instructions were tested with version 6.8.1.

cat nwchem_envs_681.sh\n\nexport NWCHEM_TOP=/projects/nwchem/nwchem-6.8.1\nexport NWCHEM_TARGET=LINUX64\nexport USE_MPI=y\nexport USE_CPPRESERVE=y\nexport NWCHEM_MODULES=\"all\"\nexport USE_MPIF=y\nexport USE_MPIF4=y\nexport USE_OPENMP=1\nexport USE_KNL=1\nexport INTEL_64ALIGN=1\nexport USE_NOIO=1\nexport USE_GAGITHUB=1\nexport CRAYPE_LINK_TYPE=dynamic\nexport ARMCI_NETWORK=MPI_TS\nexport BLAS_SIZE=4\nexport BLASOPT=\"  -lmkl_intel_lp64 -lmkl_core -lmkl_intel_thread -lpthread -lm -ldl\"\nexport SCALAPACK=\" -lmkl_scalapack_lp64 -lmkl_intel_lp64 -lmkl_core -lmkl_intel_thread -lmkl_blacs_intelmpi_lp64 -lpthread -liomp5  -ldl\"\nexport BLAS_LIB=$BLASOPT\nexport LAPACK_LIB=$BLASOPT\nexport SCALAPACK_SIZE=4\nexport USE_64TO32=1\n\ncd $NWCHEM_TOP/src\nmake nwchem_config\nmake 64_to_32\nmake\n

              Alternatively, binaries can be found in the folders under /soft/applications/nwchem

              "},{"location":"theta/applications-and-libraries/applications/nwchem/#running-jobs-on-theta","title":"Running Jobs on Theta","text":"

The following script \u201csubmit.sh\u201d is an example of running NWChem on Theta as an 8-node job with 64 MPI ranks per node. The job can be submitted with the command \"qsub submit.sh\".

cat submit.sh\n\n#!/bin/bash\n#COBALT -n 8\n#COBALT -t 30\n#COBALT -A myproject\n\nmodule load atp\n\nbin=/soft/applications/nwchem/6.8/bin/nwchem\necho \"Running Cobalt Job $COBALT_JOBID.\"\n\nrpn=64\n\ndf -k\n\nexport MPICH_GNI_MAX_EAGER_MSG_SIZE=131072\nexport MPICH_GNI_MAX_VSHORT_MSG_SIZE=10000\nexport MPICH_GNI_NUM_BUFS=300\nexport MPICH_GNI_NDREG_MAXSIZE=16777216\nexport MPICH_GNI_MBOX_PLACEMENT=nic\nexport MPICH_GNI_LMT_PATH=disabled\nexport COMEX_MAX_NB_OUTSTANDING=6\nexport LD_LIBRARY_PATH=/opt/intel/compilers_and_libraries_2018.0.128/linux/compiler/lib/intel64_lin\n\naprun -n $((COBALT_JOBSIZE*rpn)) -cc depth -d1 -j1 $bin test.nw\n
              "},{"location":"theta/applications-and-libraries/applications/qbox/","title":"Qbox on Theta","text":""},{"location":"theta/applications-and-libraries/applications/qbox/#what-is-qbox","title":"What is Qbox?","text":"

              Qbox is a C++/MPI scalable parallel implementation of first-principles molecular dynamics based on the plane-wave, pseudopotential formalism. As described on the Qbox website http://qboxcode.org/index.htm, Qbox is designed for operation on large parallel computers.

              "},{"location":"theta/applications-and-libraries/applications/qbox/#using-qbox-at-alcf","title":"Using Qbox at ALCF","text":"

              ALCF provides assistance with build instructions, compiling executables, submitting jobs, and providing prebuilt binaries. Some provided executables on Theta are available here: /soft/applications/qbox. For questions, contact us at support@alcf.anl.gov.

              "},{"location":"theta/applications-and-libraries/applications/qbox/#how-to-obtain-the-code","title":"How to Obtain the Code","text":"

              Qbox is an open-source code that can be downloaded at http://qboxcode.org/. Check the Qbox website for current information and latest releases.

              "},{"location":"theta/applications-and-libraries/applications/qbox/#building-on-theta","title":"Building on Theta","text":"

              Qbox requires the standard math libraries plus the Xerces-C library, which can be downloaded at http://xerces.apache.org/xerces-c.

              "},{"location":"theta/applications-and-libraries/applications/qbox/#xerces-c-312","title":"Xerces-C 3.1.2","text":"

              In the xerces directory

              ./configure --host=x86_64-build-linux-gnu --build=x86_64-target-linux-gnu CC=cc CXX=CC CFLAGS=-O2 CXXFLAGS=-O2 --prefix=${HOME}/xerces-c-3 --disable-shared  --disable-pretty-make --disable-threads --enable-transcoder-iconv --disable-netaccessor-curl\nmake\nmake install\n

              The libraries and headers will be installed in the following paths:

              Libraries: ${HOME}/xerces-c-3/lib

              Headers: ${HOME}/xerces-c-3/include

              "},{"location":"theta/applications-and-libraries/applications/qbox/#building-on-theta_1","title":"Building on Theta","text":"

After Qbox has been downloaded and unpacked, you should see a directory whose name is of the form qbox-<version>. Go to the directory qbox-<version>/build and create a new theta.mk arch file as described below.

              #----------------------------------------------------------------------------\n#\n# theta.mk\n#\n#----------------------------------------------------------------------------\n#\n PLT=x86_64\n#----------------------------------------------------------------------------\n MPIDIR=$(I_MPI_ROOT)/intel64\n XERCESCDIR=$(HOME)/xerces-c-3\n PLTOBJECTS = readTSC.o\n\n CXX=CC -mkl\n LD=CC -mkl\n\n PLTFLAGS += -DIA32 -D_LARGEFILE_SOURCE \\\n             -D_FILE_OFFSET_BITS=64 -DUSE_MPI -DSCALAPACK -DADD_ \\\n             -DAPP_NO_THREADS -DXML_USE_NO_THREADS -DUSE_XERCES\n\n# FFT must be FFTW2, FFTW3, ESSL or NOLIB\n  FFT=FFTW3\n\nifeq ($(FFT),FFTW3)\n PLTFLAGS += -DUSE_FFTW3 -DUSE_FFTW3_THREADS\n PLTFLAGS += -DFFTWMEASURE\n PLTFLAGS += -DFFTW3_2D\n FFTWINCLUDEDIR=$(MKLROOT)/include/fftw\n INCLUDE += -I$(FFTWINCLUDEDIR)\nendif\n\nifeq ($(FFT),NOLIB)\n PLTFLAGS += -DFFT_NOLIB\nendif\n\nINCLUDE += -I$(MPIDIR)/include -I$(XERCESCDIR)/include\n\nCXXFLAGS=  -g -qopenmp -O3 -xMIC-AVX512 -fp-model fast=2 -no-prec-div -qoverride-limits -restrict -D$(PLT) \\\n            $(INCLUDE) $(PLTFLAGS) $(DFLAGS)\n\nLIBPATH += -L$(MPIDIR)/lib64 \\\n           -L$(MKLROOT)/lib/intel64 \\\n           -L$(XERCESCDIR)/lib\n\nLIBS += $(PLIBS) \\\n         -Wl,--start-group -lmkl_scalapack_lp64 -lmkl_blacs_intelmpi_lp64 -lmkl_intel_lp64 -lmkl_core -lmkl_intel_thread -Wl,--end-group \\\n         -lxerces-c -lpthread -liomp5 -lm -ldl\n\n# Parallel libraries\n PLIBS =\n\n LDFLAGS = -g $(LIBPATH) $(LIBS)\n#----------------------------------------------------------------------------\n

As newer versions of Qbox are distributed and changes are made to the Makefile, the example arch file above can be used to generate an arch file appropriate for the specific version of Qbox. With the arch file in place, Qbox can be compiled from the qbox-<version>/src directory using the following command.

              make TARGET=../build/theta -j 16

              "},{"location":"theta/applications-and-libraries/applications/qbox/#running-qbox-jobs-on-theta","title":"Running Qbox Jobs on Theta","text":"

The following is an example executable script \u201crun_qbox.csh\u201d to run Qbox on two nodes of Theta with 64 MPI ranks per node. The job can be submitted with the command \u201cqsub run_qbox.csh\u201d, where <project_name> is replaced with an active project allocation.

              #!/bin/csh\n#COBALT -n 2 -t 10 -q debug-cache-quad -A <project_name> -O QBOX\n\naprun -n 128 -N 64 -d 1 --cc depth -e OMP_NUM_THREADS=1 -j 1 ./qb test.i\n
              "},{"location":"theta/applications-and-libraries/applications/qbox/#performance-notes","title":"Performance Notes","text":"

              The text below is taken from a discussion on the Qbox user forum regarding NROWMAX.

              The nrowmax variable is used to determine the shape of the rectangular process grid used by Qbox. This process grid is the one used by the Scalapack library. When Qbox starts, the ntasks MPI tasks are assigned to processes arranged in a rectangular array of dimensions nprow * npcol. The default value of nrowmax is 32. The plane-wave basis is divided among nprow blocks, and the electronic states are divided among npcol blocks.

              The following algorithm is used by Qbox to determine the values of nprow and npcol:

The number of rows nprow is first set to nrowmax. The value of nprow is then decremented until ntasks%nprow==0, i.e., nprow divides the total number of tasks. The value of npcol is then given by ntasks/nprow. While this looks cryptic, what this algorithm tries to achieve is actually quite simple: define a process grid of dimensions nrowmax*npcol, where npcol=ntasks/nrowmax. This is not always possible, in particular if ntasks%nrowmax != 0. This is why the second part of the algorithm decrements nprow until ntasks%nprow==0.

              Note: The value of nprow is never larger than nrowmax (hence the name).

              This algorithm is implemented in Wavefunction::create_contexts() in file Wavefunction.C
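For illustration only (this is not Qbox source code), the nprow/npcol selection described above can be sketched in a few lines of shell:

# sketch of the nprow/npcol selection described above\nntasks=48\nnrowmax=32\nnprow=$nrowmax\nwhile [ $((ntasks % nprow)) -ne 0 ]; do\n  nprow=$((nprow - 1))\ndone\nnpcol=$((ntasks / nprow))\necho \"process grid: ${nprow} x ${npcol}\"   # prints: process grid: 24 x 2\n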

              Examples:

              ntasks=128, nrowmax=32 (default) => process grid 32 x 4\nntasks=48, nrowmax=32 (default) => process grid 24 x 2\nntasks=256, nrowmax=64 => process grid 64 x 4\n
              The shape of the process grid affects performance. In general, it is advantageous to have nprow as large as possible, but not larger than the size of the (fine) FFT grid in the z direction. For example, if the fine FFT grid (printed as np0v,np1v,np2v on output) is 110 x 110 x 110, the value of nrowmax should be 110. Note that other values of nrowmax also work, but performance is usually inferior. For example, one could use nrowmax=128 even if the grid is 110 x 110 x 110, but some of the processes will not be used optimally during FFTs.

              Choosing the value of nrowmax is usually a trial-and-error process. Before running long simulations, it is advisable to run a few test jobs with different values of nrowmax and choose the value that gives best performance.

              "},{"location":"theta/applications-and-libraries/applications/qmcpack/","title":"QMCPACK on Theta","text":""},{"location":"theta/applications-and-libraries/applications/qmcpack/#overview","title":"Overview","text":"

              QMCPACK is a modern, high-performance, open-source Quantum Monte Carlo (QMC) simulation code. Its main applications are electronic structure calculations of molecular, quasi-2D, and solid-state systems.

              "},{"location":"theta/applications-and-libraries/applications/qmcpack/#how-to-access-qmcpack","title":"How to access QMCPACK","text":"

Prebuilt QMCPACK binaries are provided on Theta under /soft/applications/qmcpack in the folders latest-release and current-develop. latest-release uses the latest release of QMCPACK and is recommended for production runs. current-develop uses the development branch, which contains new features and bug fixes ahead of the next release. As latest-release and current-develop are updated, all of the older binaries remain accessible to users under per-year folders (2017, 2018, 2019). All the binaries are dynamically linked and require certain versions of libraries to function properly. Please read the README file and load the necessary modules. QMCPACK is heavily optimized for Xeon Phi processors by using the Structure-of-Array data layout and new algorithms. Please use the SoA binary whenever possible.

              "},{"location":"theta/applications-and-libraries/applications/qmcpack/#using-qmcpack-at-alcf","title":"Using QMCPACK at ALCF","text":"

              ALCF provides assistance with build instructions, compiling executables, submitting jobs, and providing prebuilt binaries. For questions, contact us at support@alcf.anl.gov.

If building your own QMCPACK is necessary, follow the instructions described below.

              "},{"location":"theta/applications-and-libraries/applications/qmcpack/#building-on-theta","title":"Building on Theta","text":"

              This recipe was verified on February 20, 2019, with QMCPACK v3.6.0.

              export CRAYPE_LINK_TYPE=dynamic\n# Do not use cmake 3.9.1, it causes trouble with parallel HDF5.\nmodule load cmake/3.11.4\nmodule unload cray-libsci\nmodule load cray-hdf5-parallel\nmodule load gcc   # Make C++ 14 standard library available to the Intel compiler\nexport BOOST_ROOT=/soft/libraries/boost/1.64.0/intel\ncmake -DENABLE_SOA=1 ..\nmake -j 24\n
              "},{"location":"theta/applications-and-libraries/applications/qmcpack/#running-qmcpack-jobs-on-theta","title":"Running QMCPACK jobs on Theta","text":"

Below is an example submission script for running the qmcpack binary on Theta:

              #!/bin/bash\n#COBALT -q default\n#COBALT -A YOUR_PROJECT\n#COBALT -n 128\n#COBALT -t 30\n#COBALT -O dmc\n#COBALT --attrs mcdram=cache:numa=quad\n\nfile_prefix=NiO-fcc-S8-dmc\nexe=/soft/applications/qmcpack/latest-release/build_KNL_Intel_real_SoA/bin/qmcpack\n\nNCORES=64\nHT=1\nNTHREADS=$((NCORES * HT))\n\naprun -n $COBALT_PARTSIZE -N 1 -cc depth -d $NTHREADS -j $HT $exe $file_prefix.xml > $file_prefix.out\n
              "},{"location":"theta/applications-and-libraries/applications/qmcpack/#references","title":"References","text":"
              • https://qmcpack.org
              • https://qmcpack.readthedocs.io/en/develop/
              "},{"location":"theta/applications-and-libraries/applications/quantum-espresso/","title":"Quantum ESPRESSO on Theta","text":""},{"location":"theta/applications-and-libraries/applications/quantum-espresso/#what-is-quantum-espresso","title":"What Is Quantum ESPRESSO?","text":"

              Quantum ESPRESSO (QE) is a suite of codes for electronic structure calculations and materials research. This code uses Density Functional Theory calculations with periodic boundary conditions to estimate energies, forces, and other properties of atomic scale systems. QE runs in parallel (MPI and OpenMP) and is based on plane waves basis functions and pseudopotentials. QE is an open-source project at http://www.quantum-espresso.org/.

              "},{"location":"theta/applications-and-libraries/applications/quantum-espresso/#how-to-access-quantum-espresso","title":"How to access Quantum ESPRESSO","text":"

Prebuilt QE binaries are provided on Theta under /soft/applications/quantum_espresso. The binaries for v5.3.0 and v6.2.1 are dynamically linked and require certain versions of libraries to function. Please read the README file and load the necessary modules. The binaries for v6.3 and beyond are statically linked and do not depend on loaded modules.

              "},{"location":"theta/applications-and-libraries/applications/quantum-espresso/#using-quantum-espresso-at-alcf","title":"Using Quantum ESPRESSO at ALCF","text":"

              ALCF provides assistance with build instructions, compiling executables, submitting jobs, and providing prebuilt binaries. For questions, contact us at support@alcf.anl.gov.

              If building your own QE is necessary, follow the instructions below.

              "},{"location":"theta/applications-and-libraries/applications/quantum-espresso/#building-on-theta","title":"Building on Theta","text":"

This recipe was verified on October 12, 2018, with QE-6.3.

              QE uses autotools to create a make.inc file required to compile the code. First, obtain a copy of source code on the Theta login nodes.

We suggest compiling the code with the Intel Fortran compiler (under the Cray wrapper) and using Intel MKL for both linear algebra and FFT. Due to the conflict between MKL and libsci, libsci needs to be unloaded. Enabling wavefunction I/O via the hdf5 library is optional.

              module unload cray-libsci\nmodule load cray-hdf5-parallel \nexport CRAYPE_LINK_TYPE=dynamic\n
Inside the source code directory, QE can be configured with the following command in a shell terminal:
              ./configure \\\n --prefix=/home/myhome/mypath \\\n MPIF90=ftn CC=cc --enable-openmp --with-scalapack=intel --with-hdf5=/opt/cray/pe/hdf5-parallel/1.10.1.1/INTEL/16.0 \\\n --with-elpa-include=/soft/applications/elpa/elpa-2017.11.001/include/elpa-2017.11.001/modules \\\n --with-elpa-lib=/soft/applications/elpa/elpa-2017.11.001/lib/libelpa.a\n
Now proceed to build the code with \"make\" and install it with \"make install\".

If statically linked binaries are needed, do the following before running \"make\":

              export CRAYPE_LINK_TYPE=static\n
              Edit make.inc by replacing
              -lmkl_scalapack_lp64 -lmkl_blacs_intelmpi_lp64\n
              in SCALAPACK_LIBS with
              ${MKLROOT}/lib/intel64/libmkl_scalapack_lp64.a -Wl,--start-group ${MKLROOT}/lib/intel64/libmkl_intel_lp64.a ${MKLROOT}/lib/intel64/libmkl_intel_thread.a ${MKLROOT}/lib/intel64/libmkl_core.a ${MKLROOT}/lib/intel64/libmkl_blacs_intelmpi_lp64.a -Wl,--end-group\n

              "},{"location":"theta/applications-and-libraries/applications/quantum-espresso/#running-qe-jobs-on-theta","title":"Running QE Jobs on Theta","text":"

Below is an example submission script for running the QE binary pw.x on Theta:

              #!/bin/bash\n#COBALT -n 8\n#COBALT -t 10\n#COBALT -q default\n#COBALT -A my_project\n#COBALT -O my_test\n\nPROCS_PERNODE=16\nHT=1\nPROCS=$((COBALT_PARTSIZE * PROCS_PERNODE))\nNTHREADS=$((64 * HT / PROCS_PERNODE))\n\npw=/soft/applications/quantum_espresso/6.3/bin/pw.x\n\necho \"Running Cobalt Job $COBALT_JOBID.\"\n\naprun -n $PROCS -N $PROCS_PERNODE -cc depth -d $NTHREADS -j $HT $pw -in ./test.in &> ./out\n

              This script file can be submitted as \u2018qsub script.sh\u2019, assuming you have a \u2018test.in\u2019 file in place.

              "},{"location":"theta/applications-and-libraries/applications/quantum-espresso/#references","title":"References","text":"

              QE User Manual

              "},{"location":"theta/applications-and-libraries/applications/quantum-package/","title":"Quantum Package","text":""},{"location":"theta/applications-and-libraries/applications/quantum-package/#what-is-quantum-package","title":"What Is Quantum Package?","text":"

Quantum Package (QP) is a Fortran/MPI scalable parallel implementation of selected Configuration Interaction (sCI), which approaches the full CI answer when fully converged. Quantum Package is currently interfaced to PySCF and QMCPACK and allows the generation of high-accuracy trial wavefunctions for small to medium size molecules and solids. A more complete description of the methods implemented in the code can be found on the QP website https://quantumpackage.github.io/qp and in the publication https://doi.org/10.26434/chemrxiv.7749485.v2. While the code is highly scalable, the sCI method underneath scales as O(N!) with the number of orbitals N. Therefore its usage applies to systems smaller than 800 orbitals.

              "},{"location":"theta/applications-and-libraries/applications/quantum-package/#using-quantum-package-at-alcf","title":"Using Quantum Package at ALCF","text":"

              ALCF does not officially support QP software, but provides a binary located and maintained in /soft/applications/quantum_package and guidance to compile your own. For questions, contact us at support@alcf.anl.gov.

              "},{"location":"theta/applications-and-libraries/applications/quantum-package/#how-to-obtain-the-code","title":"How to Obtain the Code","text":"

              Quantum Package is an open source code downloadable at https://github.com/QuantumPackage/qp2. For more information refer to the code manual https://quantumpackage.github.io/qp2.

              "},{"location":"theta/applications-and-libraries/applications/quantum-package/#building-on-theta","title":"Building on Theta","text":"

While it is recommended to use the version of the code available under /soft/applications/quantum_package, you can build your own version on Theta as follows.

git clone https://github.com/QuantumPackage/qp2\ncd qp2\nsed s/\"mpiifort\"/\"ftn\"/g > config/ifort_avx_mpi.cfg\n./configure -c config/ifort_avx_mpi.cfg\nsource quantum_package.rc\n./configure -i all\nsource quantum_package.rc\n./configure -c config/ifort_avx_mpi.cfg\nninja\n

              Executables are installed and updated on Theta as often as the new updates are released.

              "},{"location":"theta/applications-and-libraries/applications/quantum-package/#running-jobs-on-theta","title":"Running Jobs on Theta","text":"

The following is an example script \u201crun_fci.sh\u201d to run an FCI job on Theta on 128 nodes with 1 MPI task per node and 128 threads. The job can be submitted with the command \u201cqsub run_fci.sh\u201d.

cat run_fci.sh\n\n#!/bin/bash\n#COBALT -q default\n#COBALT -A PROJECT_NAME\n#COBALT -n 128\n#COBALT -t 180\n#COBALT -O MyOutputName\n#COBALT --attrs mcdram=cache:numa=quad\n\nexport ATP_ENABLED=1\nfile_prefix=File\nexe=/soft/applications/quantum_package\nsource ${exe}/quantum_package.rc\n\nNCORES=64\nHT=2\nNTHREADS=$((NCORES * HT))\nlet SLAVE_NODE=${COBALT_PARTSIZE}-1\n\naprun -n 1 -N 1 -cc depth -d $NTHREADS -j $HT qp_run fci ${file_prefix}.ezfio >> ${file_prefix}-fci.out &\nsleep 360\naprun -n ${SLAVE_NODE} -N 1 -cc depth -d $NTHREADS -j $HT qp_run -s fci ${file_prefix}.ezfio >> ${file_prefix}-fci-slave.out &\nwait\n
              "},{"location":"theta/applications-and-libraries/applications/vasp/","title":"VASP","text":""},{"location":"theta/applications-and-libraries/applications/vasp/#what-is-vasp","title":"What is VASP?","text":"

The Vienna Ab initio Simulation Package (VASP) is a software package for performing electronic structure calculations with periodic boundary conditions. It is most commonly used to perform density functional theory (DFT) calculations in a planewave basis using the projector augmented wave (PAW) method. A more complete description of VASP can be found here: https://www.vasp.at

              "},{"location":"theta/applications-and-libraries/applications/vasp/#using-vasp-at-alcf","title":"Using VASP at ALCF","text":"

VASP is commercial software. Binaries compiled by ALCF can only be accessed after the user requesting access has been verified to be on a VASP license by an official VASP license distributor.

              To access the VASP binary at ALCF, please email the details listed directly below to support@alcf.anl.gov. It can take up to 5 - 10 business days to verify a VASP license. These waiting times are longer than usual due to the impact of COVID-19.

Information to provide:

- User\u2019s Full Name:
- User\u2019s ALCF username:
- Name of Organization that purchased the VASP license:
- Principal Investigator who is the POC for the VASP license:
- VASP license number:
- Version of VASP requested (VASP5, VASP6):
- ALCF resource which you plan to run VASP on: [Theta, ThetaGPU, Polaris, other]

              "},{"location":"theta/applications-and-libraries/applications/vasp/#vasp-support-policy","title":"VASP Support Policy","text":"

              ALCF compiles the latest release of VASP on a per request basis. We do not offer support for compiling customized versions of VASP with plugins. We are able to provide Makefiles and step-by-step build instructions to users with a verified VASP license. Support for scientific runs that encounter performance or numerical issues should be directed to the official VASP support mailing list or the VASP user forum. Limited support is available for fatal errors encountered at run time.

              "},{"location":"theta/applications-and-libraries/applications/vasp/#how-to-obtain-the-code","title":"How to Obtain the Code","text":"

The VASP source can only be obtained from an official license reseller of VASP. This is either the University of Vienna or Materials Design, Inc.

              "},{"location":"theta/applications-and-libraries/applications/vasp/#running-jobs-on-theta","title":"Running Jobs on Theta","text":"

We have two versions of VASP available: a) 5.4.4 with the April 2017 patch and b) 6.3.2. Please note that we are no longer providing access to VASP 6-dev (pre-release) per instructions from VASP headquarters.

              The binaries are available here:

              /soft/applications/vasp/vasp.5.4.4.18Apr17/bin/\n/soft/applications/vasp/vasp.6.3.2/bin/\n
              The top-level directory (/soft/applications/vasp) contains example scripts for running VASP as well as step-by-step instructions.
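As a rough sketch only (the example scripts under /soft/applications/vasp are authoritative), a Cobalt submission script for VASP on Theta follows the same pattern as the other applications in this guide; the executable name vasp_std, node count, walltime, and queue below are assumptions and should be checked against the installed binaries and instructions:

#!/bin/bash\n#COBALT -n 2 -t 60 -q default -A <project_name> -O vasp_test\n\n# assumes the standard vasp_std executable name; check the bin/ directories listed above\nbin=/soft/applications/vasp/vasp.5.4.4.18Apr17/bin/vasp_std\n\n# VASP reads INCAR, POSCAR, KPOINTS, and POTCAR from the current working directory\naprun -n 128 -N 64 -cc depth -d 1 -j 1 $bin > vasp.out\n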

              "},{"location":"theta/applications-and-libraries/applications/vasp/#references","title":"References","text":"

Please encourage your group to do some tests to determine which binary better suits their needs. Here are some presentations and papers that may be useful in making a decision:

- Using VASP at NERSC
- Performance of Hybrid MPI/OpenMP VASP on Cray XC40 Based on KNL

              "},{"location":"theta/applications-and-libraries/applications/west/","title":"WEST on Theta","text":""},{"location":"theta/applications-and-libraries/applications/west/#what-is-west","title":"What Is WEST?","text":"

              WEST is a Fortran/MPI scalable parallel implementation of large-scale electronic structure calculations within many-body perturbation theory. WEST is currently interfaced with Quantum Espresso planewave DFT software. As described on the WEST website http://west-code.org, WEST is highly scalable and is used for calculations of solids, liquids, nanostructures, molecules, and interfaces, including samples with ~2000 electrons.

              "},{"location":"theta/applications-and-libraries/applications/west/#using-west-at-alcf","title":"Using WEST at ALCF","text":"

              ALCF provides assistance with build instructions, compiling executables, submitting jobs, and providing prebuilt binaries. Some provided executables on Theta are available here: /soft/applications/qe_west/qe_v6.1.0-west_3.1.0. For questions, contact us at support@alcf.anl.gov.

              "},{"location":"theta/applications-and-libraries/applications/west/#how-to-obtain-the-code","title":"How to Obtain the Code","text":"

              WEST is an open-source code that can be downloaded at http://west-code.org. Similarly, the Quantum Espresso code can be downloaded at https://www.quantum-espresso.org. Check the WEST website for current information on supported algorithms.

              "},{"location":"theta/applications-and-libraries/applications/west/#building-on-theta","title":"Building on Theta","text":"

WEST currently requires a working PW executable from Quantum Espresso. Current information on installation details for both WEST and PW can be found at http://west-code.org/documentation.php. After the Quantum Espresso and WEST codes have been downloaded and unpacked, the PW and WEST executables can be compiled using the following script in the qe-<version> directory.

              cat build_theta.sh\n#!/bin/bash\n\nexport BLAS_LIBS=\"-L$MKLROOT/intel64/lib -Wl,--start-group -lmkl_intel_lp64 -lmkl_core -lmkl_intel_thread -Wl,--end-group\"\nexport SCALAPACK_LIBS=\"-L$MKLROOT/intel64/lib -Wl,--start-group -lmkl_scalapack_lp64 -lmkl_blacs_intelmpi_lp64 -lmkl_intel_lp64 -lmkl_core -lmkl_intel_thread -Wl,--end-group\"\nexport FFT_LIBS=\"\"\nexport MPIF90=\"ftn -g -mkl\"\nexport CC=\"cc -g -mkl\"\nexport F77=\"ftn -g -mkl\"\nexport FFLAGS=\"-xMIC-AVX512 -align array64byte -fp-model fast=2 -no-prec-div -assume byterecl\"\n\n./install/configure --host=x86_64-build-linux-gnu --build=x86_64-target-linux-gnu --enable-parallel --with-scalapack --enable-openmp\n\nmake pw -j 16\n\ncd West\nmake\n

              As newer versions of Quantum Espresso and WEST are released, check the corresponding websites for current information.

              > cat run_west.sh\n#!/bin/bash\n#COBALT -n 2 -t 10 -q debug-cache-quad -A <project_name> -O WEST\n\naprun -n 128 -N 64 -d 1 --cc depth -e OMP_NUM_THREADS=1 -j 1 ./wstat wstat.i\n> qsub run_west.sh\n
              "},{"location":"theta/applications-and-libraries/applications/west/#running-west-jobs-on-theta","title":"Running WEST Jobs on Theta","text":"

The following is an example executable script \u201crun_west.sh\u201d to run the wstat WEST executable on two nodes of Theta with 64 MPI ranks per node. The job can be submitted with the command \u201cqsub run_west.sh\u201d, where <project_name> is replaced with an active project allocation."},{"location":"theta/applications-and-libraries/libraries/boost/","title":"Boost on Theta","text":"

              Boost, a collection of modern, peer-reviewed C++ libraries, is installed on Theta and can be accessed by loading the module corresponding to the appropriate C++ compiler:

              boost/intel/<version> - Boost compiled with Intel's C++ compiler. \nboost/gnu/<version> - Boost compiled with the GNU C++ compiler. \nboost/cray/<version> - Boost compiled with Cray's C++ compiler. \nboost/llvm/<version> - Boost compiled with the Clang C++ compiler.\nboost/llvm-libc++/<version> - Boost compiled with the Clang C++ compiler using libc++ as the C++ standard library (for compatibility with the -stdlib=libc++ option). \n

The modules will adjust include and linker paths so that the Boost header files and libraries can be found by the compiler and linker. The modules will also set the \u201cBOOST_ROOT\u201d environment variable. The modules, however, will not cause any Boost library to be automatically linked to your application. If you use a Boost library that requires linking to a pre-compiled library, you are responsible for adding the necessary linking flags (e.g., -lboost_program_options-mt). The name of each library has the following form:

              libboost_<name>-<variant>.{so,a}\n
where <variant> is the variant tag as explained here: http://www.boost.org/doc/libs/1_64_0/more/getting_started/unix-variants.html"},{"location":"theta/applications-and-libraries/libraries/spack/","title":"Spack","text":"

Spack is a package manager developed for HPC which supports combinatorial versioning, i.e., it allows multiple versions of packages to be built. These builds can vary in canonical version number, build options, compilers, and processor architectures. Each of a package\u2019s dependencies can be similarly versioned; a built version is fully specified by a concretized spack spec and referenced by a hash generated from the specified options.
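For example (the hdf5 spec below is purely illustrative), the concretized spec and the hashes of installed builds can be inspected with standard spack commands:

# show how a spec would concretize, including compiler and dependency versions\nspack spec -I hdf5 +mpi\n\n# list installed builds with their short hashes and variants\nspack find -lv hdf5\n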

              The learning curve for spack can be steep and new users are encouraged to experiment with their own installation of spack (easily available for cloning at https://github.com/spack/spack) and to look at configuration settings used in the ALCF installation for hints as to what settings may be appropriate and useful.

              Example settings for Theta are available at /soft/spack/alcf/theta. Not all of these settings will be useful for all builds and some may foil spack\u2019s concretization process, and it is not recommended to adopt these wholesale as global settings. The recommended method is to include these settings ad hoc in a spack environment to more tightly control what information spack\u2019s concretizer uses for its builds.
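A minimal sketch of that environment-based approach follows; the environment name and installed spec are illustrative, and the settings themselves would be copied or hand-edited from /soft/spack/alcf/theta into the environment's spack.yaml:

# create and activate a named environment so configuration stays local to it\nspack env create theta-test\nspack env activate theta-test\n\n# add selected settings from /soft/spack/alcf/theta to this environment's spack.yaml,\n# then concretize and install inside the environment\nspack install zlib\n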

              There are a growing number of packages installed to the ALCF spack instance at /soft/spack/root, and this instance can be used as a spack upstream resource as described here:

https://spack.readthedocs.io/en/latest/chain.html#using-multiple-upstream-spack-instances. Pointing to the ALCF spack instance as an upstream repository is trivial: simply creating an \u2018upstreams.yaml\u2019 file in any of the configuration scopes (e.g., $SPACK_ROOT/etc/spack/upstreams.yaml) will allow your instance to leverage any builds in the upstream, and those builds will appear when running \u2018spack find\u2019.

              An example upstreams.yaml:

              upstreams: \n    alcf-spack: \n        install_tree: /soft/spack/root/opt/spack\n
              Support requests and feedback for ALCF-specific issues should be directed to support@alcf.anl.gov. For general spack questions, users are encouraged to consult the following resources:

              • Spack development website
              • Spack documentation
              • Spack tutorial
              • Spack Slack channel
              "},{"location":"theta/applications-and-libraries/visualization/remote-vis/","title":"Remote Visualization on Theta Using VNC","text":"

              For visualization and analysis applications that do not support a client/server mode, VNC can be used for remotely accessing such applications running on Theta.

              "},{"location":"theta/applications-and-libraries/visualization/remote-vis/#setup-on-theta","title":"Setup on Theta","text":"

              On cooley.alcf.anl.gov, if you do not have a ~/.vnc/xstartup file, create one like the following:

              #!/bin/sh\n   xterm &\n   icewm\n
              Be sure to make it executable:

chmod u+x ~/.vnc/xstartup

Also, create a VNC password, which you will need to provide each time you connect a remote VNC client to a VNC server running on Theta:

              vncpasswd

              This will store an obfuscated version in ~/.vnc/passwd

              "},{"location":"theta/applications-and-libraries/visualization/remote-vis/#start-a-vnc-server-on-theta","title":"Start a VNC server on Theta","text":"

              Since we want the VNC server to run on a backend node, in order to avoid increasing the load to login and mom nodes, we need to submit an interactive job:

              qsub -I -n 1 -t <time> -q debug-cache-quad -A <projectID>

              Once your job starts, you will be logged into a mom node, where you can launch a VNC server:

              Note: Make a note of your node number

              vncserver --NeverShared=1 -geometry 1920x1080\nx0vncserver --display=:0.0 --NeverShared=1 --geometry=2400x1500+0+0 --PasswordFile=/home/<username>/.vnc/passwd --MaxProcessorUsage=100\n

              Notes: