You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Motivation: Various people have asked for various additions to Catwalk already. It's risky because nobody is using Catwalk yet. But we have several people who said they want to (Pradeep, Matt/Hamish, Iz?, Ludwig).
Here are the sub-projects in order of importance:
Promptsource: This is the most requested one. The task would be to add a promptsource instance format to as many tasks as possible, and then evaluate various models with that format.
Make sure we have all the tasks in P3. This might be a no-op after the first item.
Few-shot prompting (for in-context learning). Nobody has explicitly asked for this. I think nobody asks because it's obvious that catwalk would have this.
Crossfit: Pradeep wants to use Crossfit. Those tasks should be fairly easy to add.
T-Few seems like a good baseline for a lot of our work, so it might become a good benchmark set for a while.
BigBench: Pradeep asked about those too, but backed off from it later. Could be a nice addition, but is at the bottom of this list on purpose.
Prompt format that uses the "channel method" for decoder-only models. Nobody has asked for it, but it came up in our reading group. I thought I could verify it with a quick experiment, but I could not.
The text was updated successfully, but these errors were encountered:
Motivation: Various people have asked for various additions to Catwalk already. It's risky because nobody is using Catwalk yet. But we have several people who said they want to (Pradeep, Matt/Hamish, Iz?, Ludwig).
Here are the sub-projects in order of importance:
The text was updated successfully, but these errors were encountered: