Update README.md

segment-anything-models-java · Dec 9, 2024 · a126d48 · a126d48
1 parent fa5fb8b
commit a126d48
Showing 1 changed file with 7 additions and 1 deletion.
diff --git a/README.md b/README.md
@@ -25,4 +25,10 @@ The original SAM is a heavy and computationally expensive model. Its original ve
 * [**EfficientViTSAM**](https://github.com/mit-han-lab/efficientvit): SAM-like model that uses a special lightweight image encoder, EfficientViT.
 
 # Big images
-SAM pre-processing includes re-sizing the image to 1024x1024, thus in images bigger than that, every detail smaller than `original _size / 1024` pixel(s) will become subpixel and thus disappear. In order mantain the performance on big images,SAMJ 
+SAM pre-processing resizes the image to 1024x1024, so in larger images every detail smaller than `original_size / 1024` pixel(s) becomes subpixel and disappears. To maintain performance on big images, SAMJ adds a layer of logic on top of the SAM models.
+
+SAMJ logic depends on the size of the input image. If both sides of the input image are smaller than 512 pixels, the image is considered "small" and no extra logic is used to process it. The standard workflow first encodes the image of interest. This operation is computationally expensive and can take several seconds (up to a minute) depending on the hardware. Because of this cost, it is done only once, and the resulting encoding is reused as many times as the user wants, which provides real-time reactivity.
+
+The other operation required to create a mask is the prompt encoding. This requires the user to interact and give a prompt in the form of a point, a list of points or a bounding box (rectangle). The prompts are encoded, combined with the image encoding and then decoded into a mask. The process is fast and lightweight, so it can run in real time while the user keeps providing more prompts.
+
+
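
The rules described in the added README text can be illustrated with a minimal Java sketch. It is not taken from the SAMJ code base: the class `BigImageSketch`, its methods and the `CachedEncoding` helper are hypothetical names introduced here for illustration only. The constants 1024 and 512 come from the README text. The sketch shows how many original pixels collapse into one pixel after resizing, the "small image" check, and the encode-once idea where an expensive encoder runs a single time per image.

```java
import java.util.function.Function;

/** Hypothetical helper for illustration; not part of the SAMJ API. */
final class BigImageSketch {

    /** Side length SAM resizes its input to, as stated in the README. */
    static final int SAM_INPUT_SIDE = 1024;

    /** Side length below which the README calls an image "small". */
    static final int SMALL_IMAGE_SIDE = 512;

    /** Number of original pixels per side that map onto one pixel after resizing. */
    static double pixelsPerSamPixel(int originalSide) {
        return (double) originalSide / SAM_INPUT_SIDE;
    }

    /** True when both sides are below the "small" threshold. */
    static boolean isSmall(int width, int height) {
        return width < SMALL_IMAGE_SIDE && height < SMALL_IMAGE_SIDE;
    }

    /** Runs the (assumed expensive) encoder once per image instance and reuses the result. */
    static final class CachedEncoding<I, E> {
        private final Function<I, E> encoder;
        private I lastImage;
        private E lastEncoding;
        private int encoderCalls = 0;

        CachedEncoding(Function<I, E> encoder) {
            this.encoder = encoder;
        }

        E get(I image) {
            // Re-encode only when a different image instance is supplied.
            if (lastEncoding == null || image != lastImage) {
                lastEncoding = encoder.apply(image);
                lastImage = image;
                encoderCalls++;
            }
            return lastEncoding;
        }

        int encoderCalls() {
            return encoderCalls;
        }
    }

    public static void main(String[] args) {
        // In a 4096-pixel-wide image, details narrower than 4 pixels become subpixel.
        System.out.println(pixelsPerSamPixel(4096)); // prints 4.0
        System.out.println(isSmall(400, 300));       // prints true
        System.out.println(isSmall(400, 2048));      // prints false

        // Stand-in encoder; a real one would run the SAM image encoder.
        CachedEncoding<int[], String> cache =
                new CachedEncoding<>(img -> "encoding of " + img.length + " pixels");
        int[] image = new int[16];
        cache.get(image); // encodes
        cache.get(image); // reuses the cached encoding
        System.out.println(cache.encoderCalls());    // prints 1
    }
}
```

Caching by image identity keeps the sketch short. A real implementation would probably need a more robust way to detect that the image has changed.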