[Feat] Expose attn, batch, ubatch, cache_type_kv settings to the UI and bench results #148

Merged

a-ghorbani
Owner

@a-ghorbani a-ghorbani commented Dec 26, 2024

Description

This PR adds the following model initialization parameters to the settings page UI:

  • n_batch
  • n_ubatch
  • n_threads
  • flash_attn
  • cache_type_k
  • cache_type_v
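
As a rough illustration of how these settings might be held and sanity-checked before model initialization, here is a minimal TypeScript sketch. It is not the code from this PR: the `ModelInitSettings` type, the `DEFAULT_SETTINGS` values, the `CacheType` list, and the `normalizeSettings` helper are all hypothetical. The two constraints it applies (keeping `n_ubatch` no larger than `n_batch`, and falling back to an unquantized V cache when `flash_attn` is off) reflect my understanding of llama.cpp behavior and should be treated as assumptions.

```typescript
// Hypothetical settings shape; field names mirror the parameters listed above.
// The set of cache types is an assumption based on common llama.cpp options.
type CacheType = 'f16' | 'f32' | 'q8_0' | 'q4_0' | 'q4_1' | 'iq4_nl' | 'q5_0' | 'q5_1';

interface ModelInitSettings {
  n_batch: number;
  n_ubatch: number;
  n_threads: number;
  flash_attn: boolean;
  cache_type_k: CacheType;
  cache_type_v: CacheType;
}

// Illustrative defaults only; the app's real defaults may differ.
const DEFAULT_SETTINGS: ModelInitSettings = {
  n_batch: 512,
  n_ubatch: 512,
  n_threads: 4,
  flash_attn: false,
  cache_type_k: 'f16',
  cache_type_v: 'f16',
};

function isQuantized(t: CacheType): boolean {
  return t !== 'f16' && t !== 'f32';
}

// Merge user input over the defaults and clamp values into a consistent state.
function normalizeSettings(input: Partial<ModelInitSettings>): ModelInitSettings {
  const s: ModelInitSettings = { ...DEFAULT_SETTINGS, ...input };
  s.n_batch = Math.max(1, Math.floor(s.n_batch));
  // Assumption: the micro-batch should not exceed the logical batch.
  s.n_ubatch = Math.min(Math.max(1, Math.floor(s.n_ubatch)), s.n_batch);
  s.n_threads = Math.max(1, Math.floor(s.n_threads));
  // Assumption: a quantized V cache is only usable with flash attention on.
  if (!s.flash_attn && isQuantized(s.cache_type_v)) {
    s.cache_type_v = 'f16';
  }
  return s;
}
```

A settings screen could call `normalizeSettings` on every change so that the values handed to the model loader are always mutually consistent.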

Additionally, the benchmark results now record these settings:

  • n_context
  • n_batch
  • n_ubatch
  • n_threads
  • flash_attn
  • cache_type_k
  • cache_type_v
  • n_gpu_layers
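
Recording the configuration alongside each result makes it possible to tell which runs are directly comparable. The sketch below shows one way that could look. The `BenchConfig` and `BenchResult` types, the `tokensPerSecond` field, and the `configKey` and `groupByConfig` helpers are hypothetical names chosen for illustration, not the app's actual data model.

```typescript
// Hypothetical record of the settings a benchmark ran with.
interface BenchConfig {
  n_context: number;
  n_batch: number;
  n_ubatch: number;
  n_threads: number;
  flash_attn: boolean;
  cache_type_k: string;
  cache_type_v: string;
  n_gpu_layers: number;
}

// Hypothetical result shape; tokensPerSecond stands in for whatever is measured.
interface BenchResult {
  config: BenchConfig;
  tokensPerSecond: number;
}

// Build a stable key from the config, with fields in a fixed order,
// so that results with identical settings share the same key.
function configKey(c: BenchConfig): string {
  return [
    c.n_context, c.n_batch, c.n_ubatch, c.n_threads,
    c.flash_attn ? 'fa1' : 'fa0',
    c.cache_type_k, c.cache_type_v, c.n_gpu_layers,
  ].join('|');
}

// Group results by configuration so only like-for-like runs are compared.
function groupByConfig(results: BenchResult[]): Map<string, BenchResult[]> {
  const groups = new Map<string, BenchResult[]>();
  for (const r of results) {
    const key = configKey(r.config);
    const bucket = groups.get(key) ?? [];
    bucket.push(r);
    groups.set(key, bucket);
  }
  return groups;
}
```

With this grouping, toggling only `flash_attn` or a cache type between two runs places them in different buckets, which keeps averages from mixing configurations.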

Fixes #79

Platform Affected

  • iOS
  • Android

Checklist

  • Necessary comments have been made.
  • I have tested this change on:
    • iOS Simulator/Device
    • Android Emulator/Device
  • Unit tests and integration tests pass locally.

@a-ghorbani a-ghorbani marked this pull request as ready for review December 26, 2024 19:46
@a-ghorbani a-ghorbani merged commit b836d07 into main Dec 26, 2024
2 of 3 checks passed
@a-ghorbani a-ghorbani deleted the feat/add-attn-batch-ubatch-etc-to-model-init-and-bench branch December 26, 2024 19:52
Development

Successfully merging this pull request may close these issues.

[Feat]: quantized KV cache and flash attention