What's Changed
- PyRIT now has a website
- We've been working on standardizing orchestrators in terms of naming and functionality:
  - The endpoint (of type `PromptTarget`) that PyRIT attacks will be referred to as `objective_target`.
  - The endpoint (of type `PromptChatTarget`) that helps us craft attacks will be referred to as `adversarial_chat`.
  - Beyond that, we've settled on a common interface for multi-turn orchestrators with a shared result object.
  - Instead of an `attack_strategy` arg we require a file path called `adversarial_chat_system_prompt_path` to make the connection to the `adversarial_chat` target clearer. Some orchestrators have a default for this, of course.
  - The initial prompt to the `adversarial_chat` is now called `adversarial_chat_seed_prompt` to also help with clarity and connection to `adversarial_chat`.
  - Sometimes we use multiple scorers. For that reason, `objective_scorer` will be the scorer that decides if the objective has been achieved. Other scorers have similarly specific names, e.g., `on_topic_scorer` in the `CrescendoOrchestrator`.
  - The new standard name for all orchestrators to execute an attack is `run_attack_async`.

  The standardization is not fully completed yet but will continue in future releases. So far, `CrescendoOrchestrator`, `TreeOfAttacksWithPruningOrchestrator`, and `RedTeamingOrchestrator` have been adjusted.
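The naming convention for multi-turn orchestrators described above can be pictured with a small, self-contained sketch. It deliberately does not import PyRIT: `ToyTarget`, `EchoTarget`, `ToyResult`, and `ToyMultiTurnOrchestrator` are hypothetical stand-ins, and the constructor defaults and the `objective` keyword are assumptions made for illustration. Only the names `objective_target`, `adversarial_chat`, `adversarial_chat_seed_prompt`, `objective_scorer`, and `run_attack_async` come from these notes.

```python
import asyncio
from dataclasses import dataclass
from typing import Callable, Protocol


class ToyTarget(Protocol):
    """Hypothetical stand-in for an endpoint; not PyRIT's PromptTarget."""

    async def send(self, prompt: str) -> str: ...


@dataclass
class ToyResult:
    """Hypothetical shared result object returned by every orchestrator."""

    objective: str
    achieved: bool
    turns: int


class EchoTarget:
    """Toy endpoint that echoes its input, so the sketch runs offline."""

    async def send(self, prompt: str) -> str:
        return f"echo: {prompt}"


class ToyMultiTurnOrchestrator:
    """Illustrates the standardized argument and method names only."""

    def __init__(
        self,
        *,
        objective_target: ToyTarget,  # the endpoint under attack
        adversarial_chat: ToyTarget,  # the endpoint that crafts attacks
        objective_scorer: Callable[[str], bool],  # decides if the objective is met
        adversarial_chat_seed_prompt: str = "Begin.",  # assumed default
        max_turns: int = 3,  # assumed parameter
    ) -> None:
        self._objective_target = objective_target
        self._adversarial_chat = adversarial_chat
        self._objective_scorer = objective_scorer
        self._seed_prompt = adversarial_chat_seed_prompt
        self._max_turns = max_turns

    async def run_attack_async(self, *, objective: str) -> ToyResult:
        context = self._seed_prompt
        for turn in range(1, self._max_turns + 1):
            # Ask the adversarial chat for the next attack prompt.
            attack = await self._adversarial_chat.send(f"{objective} | {context}")
            # Send that prompt to the endpoint under attack.
            response = await self._objective_target.send(attack)
            if self._objective_scorer(response):
                return ToyResult(objective, True, turn)
            context = response
        return ToyResult(objective, False, self._max_turns)


if __name__ == "__main__":
    orchestrator = ToyMultiTurnOrchestrator(
        objective_target=EchoTarget(),
        adversarial_chat=EchoTarget(),
        objective_scorer=lambda response: "secret" in response,
    )
    print(asyncio.run(orchestrator.run_attack_async(objective="secret")))
```

Keeping the roles in separately named arguments makes it harder to confuse the endpoint being attacked with the one generating attacks. For the actual constructor signatures and the real result type, consult the PyRIT documentation.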
- Support for a centralized database using Azure SQL as an optional alternative to a local DuckDB database.
- Introduced (multi-modal) `SeedPrompt`s and `SeedPromptDataset`s as a starting point for red teaming ops with integration to our databases.
- New orchestrators and auxiliary attacks:
  - `FuzzerOrchestrator` with 5 template converters
  - GCG support via Azure ML pipelines to optimize adversarial suffixes
  - FlipAttackOrchestrator
- New targets:
  - HuggingFaceChatTarget
  - HTTPTarget
  - OpenAI and Azure OpenAI targets were refactored to simplify the logic. They now share a common interface `OpenAITarget` and you can decide between Azure vs. OpenAI using `is_azure_target=True` or `False`.
- New datasets:
  - HarmBench
  - PKU-SafeRLHF
  - wmdp-bio, wmdp-chem, and wmdp-cyber (now fetchable from the original data source)
  - AdvBench
  - Decoding Trust Stereotypes
  - LLM-LAT/harmful-dataset
  - tdc23 red teaming dataset
  - TrustAIRLab/forbidden_question_set
  - LibrAI 'Do Not Answer' Dataset
- New converters:
  - QRCodeConverter
  - AzureSpeechAudioToTextConverter
  - URLConverter
  - HumanInTheLoopConverter
  - ColloquialWordswapConverter
  - UnicodeConfusableConverter (updated with new functionality)
  - CharSwapGenerator
  - MaliciousQuestionGeneratorConverter
  - AsciiSmugglerConverter
  - MathPromptConverter
  - AudioFrequencyConverter
  - ZeroWidthConverter
  - DiacriticConverter
- New scorers:
  - SelfAskRefusalScorer
  - HumanInTheLoopScorer
  - InsecureCodeScorer
- We generally use a `.env` file to configure details of endpoints that PyRIT needs to execute. A new `.env.local` override file allows for further customization.
- Finally, PyRIT now comes with several extras that you can install using `pip install pyrit[<extra>]`:
  - `dev` includes developer dependencies that you shouldn't need unless you plan on contributing to the project.
  - `torch` includes just PyTorch, which is needed for some targets (e.g., Hugging Face) or auxiliary attacks (e.g., GCG) but not core functionality. This allows you to choose whether you want to install it.
  - `gcg` includes extra dependencies that are only needed for running GCG. Since this requires dedicated compute (ideally with GPU) you can choose whether it is required for you.
  - `all` includes all of the above.
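The `.env` / `.env.local` layering mentioned above can be illustrated with a minimal standard-library sketch. This is not PyRIT's loader: `parse_env_file` and `load_layered_env` are hypothetical helpers, the variable names in the usage section are made up, and the sketch assumes that a value in `.env.local` simply replaces the value of the same key in `.env`.

```python
import tempfile
from pathlib import Path


def parse_env_file(path: Path) -> dict[str, str]:
    """Parse simple KEY=VALUE lines; skip blanks, comments, and malformed lines."""
    values: dict[str, str] = {}
    if not path.exists():
        return values
    for raw_line in path.read_text(encoding="utf-8").splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and optional quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def load_layered_env(directory: Path) -> dict[str, str]:
    """Read .env first, then let .env.local override matching keys (assumed behavior)."""
    merged = parse_env_file(directory / ".env")
    merged.update(parse_env_file(directory / ".env.local"))
    return merged


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        base = Path(tmp)
        # Hypothetical variable names, used only for this demonstration.
        (base / ".env").write_text(
            "EXAMPLE_ENDPOINT=https://shared.example\nEXAMPLE_KEY=abc\n",
            encoding="utf-8",
        )
        (base / ".env.local").write_text(
            "EXAMPLE_ENDPOINT=https://personal.example\n", encoding="utf-8"
        )
        print(load_layered_env(base))
```

A layout like this lets a shared `.env` hold team-wide defaults while each user keeps personal endpoints or keys in an untracked `.env.local`.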
Full list of changes
- MAINT Update release version to 0.4.1.dev0 by @rdheekonda in #342
- [FEAT] QRCodeConverter by @jsong468 in #339
- [MAINT] Delete output_filename arg in image/text and text/image converters by @jsong468 in #344
- MAINT Update Release Instructions by @rdheekonda in #345
- FEAT: Add Likert scoring definition and prompt templates for persuasion and deception by @saphirqi7 in #307
- [FEAT] Add "task" to the scoring memory entry by @jsong468 in #349
- FEAT: Add fetch function for datasets from HarmBench #270 by @KutalVolkan in #341
- FEAT Add SQL Entra Auth for Azure SQL Server by @elgertam in #330
- [MAINT] Fix typos in OllamaChatTarget by @riedgar-ms in #357
- [FEAT] Azure Speech Audio to Text Converter by @jsong468 in #352
- FEAT: Add Rate Limit (RPM) Threshold Parameter to Prompt Targets by @nina-msft in #331
- FIX: correct type of the top_p argument in various PromptTarget classes by @s-zanella in #366
- FEAT Add ability to fetch PKU-SafeRLHF Data by @enrajka in #374
- FEAT: Refusal Scorer by @rlundeen2 in #371
- FEAT Add ability to fetch wmdp-bio, wmdp-chem, and wmdp-cyber datasets by @mshirsekar1 in #380
- TEST skip failing auth test after the new azure.identity version was released by @romanlutz in #387
- FEAT Added AdvBench dataset by @enrajka in #383
- FEAT: Fuzzer orchestrator by @gseetha04 in #360
- FIX Crescendo Bug and Improve Scorer Metaprompt Handling by @rdheekonda in #389
- FEAT: Add Centralized DB Support Using Azure by @rdheekonda in #379
- FIX: Updating memory and fixing bugs by @rlundeen2 in #394
- FEAT: Handling duplicate memory for PromptRequestPiece/Score entries by @jsong468 in #369
- [FEAT] Decoding Trust Stereotypes Dataset by @jsong468 in #385
- FEAT Centralized DB Support for Azure Speech Converters by @rdheekonda in #402
- FEAT add additional template converters for fuzzer orchestrator (crossover, similar, rephrase) by @roeybc in #378
- DOC: Update Custom Targets Demo Docs by @nina-msft in #404
- FEAT New URL Converter by @jbolor21 in #399
- [FEAT] HumanInTheLoop Converter by @jsong468 in #401
- DOC: Updating RTO example to use gpt4o for scoring by @rlundeen2 in #408
- MAINT: Crescendo and Score Refactor by @rlundeen2 in #405
- FEAT: Colloquial Wordswap Attack by @eugeniavkim in #406
- FEAT emoji jailbreak by @romanlutz in #314
- MAINT: Add Refusal docs and Filter logic by @rlundeen2 in #431
- DOC: Moving rate limiting to target by @rlundeen2 in #433
- FEAT: optimized huggingface model support by @KutalVolkan in #354
- DOC Enhance Azure SQL Database Setup and Permissions Documentation by @rdheekonda in #434
- FIX Azure SQL DB Permissions by @rdheekonda in #440
- FIX: Handle JSON markdown format exceptions by @meisman-ms in #435
- FEAT: Add ability to send prepend to the conversation in PromptSendingOrchestrator by @rlundeen2 in #441
- FEAT: Homoglyph Attack by @KutalVolkan in #407
- FEAT: Charswap Attack by @KutalVolkan in #403
- Add Python option for generate docs scripts by @sf-msft in #375
- FEAT: Violent Durian Attack Strategy by @KutalVolkan in #398
- FEAT GCG algorithm and AML pipeline by @blakebullwinkel in #381
- MAINT: Adding original values as score metadata for Azure Safety and Likert Scorers by @rlundeen2 in #445
- [DOC] Note on notebooks by @riedgar-ms in #460
- FIX: Fixing pre-commit check_links by @rlundeen2 in #462
- FEAT: Adding Flip Attack by @rlundeen2 in #456
- [FIX] Allow AAD Auth for AzureContentFilterScorer by @riedgar-ms in #455
- FEAT: Adding New Generic HTTP Target by @jbolor21 in #446
- MAINT: Rounds in CrescendoOrchestrator are now "Turns" by @jsong468 in #470
- DOC Add doc changes for database setup by @eugeniavkim in #476
- FEAT: OpenAI Target Refactor by @rlundeen2 in #466
- DOC: Edit Image Text Converter Docs by @jbolor21 in #477
- FEAT: Malicious Question Generator by @KutalVolkan in #397
- FIX: Changed AzureSpeechTextToAudioConverter input_type to text and added converter input_supported tests by @jsong468 in #472
- FEAT added ascii smuggler converter by @gio-msft in #479
- DOC Fix Invalid MD File Referenced in Deploy HF Model to Azure ML Module by @rdheekonda in #485
- FIX: Re-Ran Jupytext on Crescendo Notebook by @jsong468 in #484
- FIX Warnings in pipelines (Issue #442) by @Tiger-Du in #481
- FEAT Add LLM-LAT/harmful-dataset #420 by @SnehaDharne in #437
- FIX: Small Notebook Fixes and env_example updates by @jsong468 in #487
- FEAT add tdc23 red teaming dataset by @Lakshmiaddepalli in #438
- MAINT Adding TrueFalseQuestion to initialize scorer more easily by @rlundeen2 in #488
- MAINT: Stripping json in llm scorers by @rlundeen2 in #489
- DOC: Adds citation section to README.md by @dlmgary in #491
- FIX Updating env variable for DALL E by @eugeniavkim in #492
- FIX: Remove Duplicate Import Statement in Documentation Examples by @douyipu in #495
- FIX changed OpenAIChatTarget default values by @blakebullwinkel in #496
- [DRAFT] FEAT: MathPromptConverter to Transform Prompts into Mathematical Problems by @KutalVolkan in #490
- FIX Set Unique Conversation IDs (RedTeamingOrchestrator) by @nina-msft in #468
- MAINT: Consolidate UnicodeConfusableConverter and HomoglyphGeneratorConverter by @jsong468 in #497
- Fix PromptMemoryEntry columns data types to support non-English values by @rdheekonda in #499
- FIX Added "Invalid prompt" OAI error to bad request exception handler by @blakebullwinkel in #500
- MAINT: Consistency Improvements by @rlundeen2 in #498
- [DRAFT] DOC: Add Skeleton Key Attack Demo by @KutalVolkan in #502
- FIX Include max_completion_tokens argument for OpenAIChatTarget by @nina-msft in #501
- FEAT: Add audio frequency converter by @michellemorales in #478
- FIX: Separating OpenAIChatTarget Arguments by @rlundeen2 in #505
- MAINT: Refactor azure ml target by @jsong468 in #463
- MAINT: Adding MultiTurn Abstract Orchestrator Interface by @rlundeen2 in #504
- FEAT Add TrustAIRLab/forbidden_question_set Dataset #453 by @ritikakumar0204 in #503
- FEAT: database connector to store and retrieve prompts, prompt templates, and prompt groups by @romanlutz in #396
- FIX fix references to renamed powershell files by @mhaoda in #510
- FEAT Add export for conversations and scores by @eugeniavkim in #517
- FIX: Removed unnecessary add_response_entries_to_memory mocking and changed normalized target 'endpoint' param by @jsong468 in #521
- MAINT: Removing SeedPromptTemplate by @rlundeen2 in #520
- MAINT: Remove many shot Template by @rlundeen2 in #522
- FEAT: Add Zero-Width-Converter by @KutalVolkan in #519
- FEAT: Add Diacritics Converter by @KutalVolkan in #518
- MAINT: Standardizing Multi-Turn Orchestrators by @rlundeen2 in #509
- MAINT: Removing attack strategy by @rlundeen2 in #525
- FEAT add seed prompt dataset loading function for legacy datasets by @romanlutz in #524
- DOC Add jupyterbook project site page by @sf-msft in #430
- FIX outdated link by @romanlutz in #533
- FEAT: Functionality to update PromptMemoryEntries by @jsong468 in #531
- FEAT HITL Scorers by @jbolor21 in #493
- MAINT: Add Centralized Memory Management by @rdheekonda in #527
- MAINT Update DuckDB Memory Demo Notebook Documentation by @rdheekonda in #536
- FIX use cluster for compute by @romanlutz in #538
- FIX Remove `aria2c` dependency from HuggingFace Target by @nina-msft in #530
- [FIX] Fix broken azure_auth test by @jsong468 in #544
- FIX import tkinter only when using it to avoid import errors on ubuntu/macos by @romanlutz in #542
- DOC publish to GH pages when pushing changes to main by @romanlutz in #545
- FIX Fuzzer Converter Templates by @rdheekonda in #546
- FEAT: Add Insecure Code Scorer by @KutalVolkan in #523
- MAINT: Updating refusal scorer to work without tasks by @rlundeen2 in #547
- DOC bring back numbering for user guide, raise build issues as errors, and fix warnings by @romanlutz in #549
- FIX remove unnecessary threshold arg by @romanlutz in #550
- MAINT: Allowing prepending conversations in PSO from memory by @rlundeen2 in #555
- FEAT Enhance .env loading with optional .env.local overrides by @rdheekonda in #559
- MAINT update dependencies to separate torch into an extra, prune unnecessary ones, and related small fixes by @romanlutz in #556
- FIX remove timezone info, pass timestamp around when retrieving data from DB by @romanlutz in #560
- Fix TAP Orchestrator Invalid Argument by @rdheekonda in #561
- DOC: Relocate use_huggingface_chat_target notebook and script to targets directory by @KutalVolkan in #558
- FIX: Fixing bug in doc and adding repr to models by @rlundeen2 in #564
- MAINT: TAP Multi-turn refactor by @rlundeen2 in #562
- FEAT: Add LibrAI 'Do Not Answer' Dataset by @KutalVolkan in #565
- DOC: Add batch scoring example for SelfAskTrueFalseScorer by @KutalVolkan in #563
- FIX: Fixing and improving crescendo adversarial_chat prompt by @rlundeen2 in #570
- FIX repair component governance by @romanlutz in #557
- FEAT: Pass arguments to http client by @AlexRRR in #554
- [FEAT] Global Memory Labels by @jsong468 in #571
- FIX release related fixes by @romanlutz in #575
New Contributors
- @saphirqi7 made their first contribution in #307
- @riedgar-ms made their first contribution in #357
- @s-zanella made their first contribution in #366
- @enrajka made their first contribution in #374
- @mshirsekar1 made their first contribution in #380
- @gseetha04 made their first contribution in #360
- @roeybc made their first contribution in #378
- @eugeniavkim made their first contribution in #406
- @meisman-ms made their first contribution in #435
- @sf-msft made their first contribution in #375
- @gio-msft made their first contribution in #479
- @Tiger-Du made their first contribution in #481
- @SnehaDharne made their first contribution in #437
- @Lakshmiaddepalli made their first contribution in #438
- @douyipu made their first contribution in #495
- @michellemorales made their first contribution in #478
- @ritikakumar0204 made their first contribution in #503
- @mhaoda made their first contribution in #510
- @AlexRRR made their first contribution in #554
Full Changelog: v0.4.0...v0.5.0