diff --git a/index.html b/index.html
index 3cc1149..18f9e57 100644
--- a/index.html
+++ b/index.html
@@ -40,7 +40,7 @@

How's GPT-4 with Vision Doing?

You can contribute your own tests, too! See the GitHub README for contributing instructions.

-

Tests are run every day at 1am PT. Last updated January 04, 2024.

+

Tests are run every day at 1am PT. Last updated January 05, 2024.

Made with ❤️ by the team at Roboflow.

@@ -58,12 +58,12 @@

How's GPT-4 with Vision Doing?

Response Time

-

Today, the average response time to receive results from our tests was 5.28 seconds per request.

+

Today, the average response time to receive results from our tests was 5.23 seconds per request.

This number only accounts for requests made by this application.

-

5.28 s

+

5.23 s

@@ -127,7 +127,61 @@

Result

- + +
+
+
+

Handwriting OCR

+

Can GPT-4V read handwriting?

+
+
+
+

Fail

+
+
+
+
+
+ Last 7-Day Performance +
+ +
+ +
+ +
+ +
+ +
+ +
+ +
+ +
+
+

Of the last 7 tests, conducted daily, this test has passed 86.0% of the time.

+

Today's request cost $0.009

+
+
+ +
+

Method

+
We send an image of a handwritten note to determine if it can correctly read the text. If it correctly gets the text, it gets a 100%. Otherwise, it gets a 0%.
+

Prompt

+
+                                            Read the text in the image. Return only the text, with punctuation.
+                                        
+

Image

+ Image of the input into GPT-4 +

Result

+
The words of songs on the album have been echoing in my head all week. "Fades into the grey of my day ok tea."
+

Test submitted by Roboflow

+
+
+
+
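The all-or-nothing scoring described in the Method text for the Handwriting OCR test can be sketched as follows. This is a minimal illustration, not the application's actual scoring code: the function name `score_exact_match`, the whitespace normalization, and the choice of `EXPECTED` (taken from the earlier passing output shown in this diff, which is assumed to match the ground truth) are all assumptions.

```python
def score_exact_match(result: str, expected: str) -> float:
    """Return 1.0 if the transcription matches the expected text, else 0.0."""

    # Assumption: runs of whitespace (including line breaks) are collapsed
    # before comparing. The real test may compare the strings strictly.
    def normalize(text: str) -> str:
        return " ".join(text.split())

    return 1.0 if normalize(result) == normalize(expected) else 0.0


# Assumed ground truth: the output from the earlier passing run in this diff.
EXPECTED = (
    "The words of songs on the album have been echoing in my head all week. "
    '"Fades into the grey of my day old tea."'
)

# Output from the failing run in this diff ("ok tea" instead of "old tea").
TODAY = (
    "The words of songs on the album have been echoing in my head all week. "
    '"Fades into the grey of my day ok tea."'
)

if __name__ == "__main__":
    print(score_exact_match(TODAY, EXPECTED))
    print(score_exact_match(EXPECTED, EXPECTED))
```

Under this scheme a single wrong word is enough to turn a pass into a fail, which is consistent with the "ok tea" versus "old tea" difference between the two runs shown here.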
@@ -176,7 +230,7 @@

Prompt

Image

Image of the input into GPT-4

Result

-
{'x': 0.28, 'y': 0.4, 'width': 0.18, 'height': 0.3}
+
{'x': 0.325, 'y': 0.3, 'width': 0.25, 'height': 0.45}

Test submitted by Roboflow

@@ -216,7 +270,7 @@

Graph Understanding

Of the last 7 tests, conducted daily, this test has passed 0% of the time.

-

Today's request cost $0.011

+

Today's request cost $0.01

@@ -232,22 +286,10 @@

Image

Result

```json
 {
-  "A": {
-    "quantity": 10,
-    "price": 15
-  },
-  "B": {
-    "quantity": 20,
-    "price": 23
-  },
-  "C": {
-    "quantity": 30,
-    "price": 33
-  },
-  "D": {
-    "quantity": 40,
-    "price": 45
-  }
+  "A": {"quantity": 20, "price": 10},
+  "B": {"quantity": 25, "price": 20},
+  "C": {"quantity": 30, "price": 30},
+  "D": {"quantity": 40, "price": 40}
 }
 ```

Test submitted by Roboflow

@@ -303,13 +345,11 @@

Prompt

Image

Image of the input into GPT-4

Result

-
```json
-{
+                                        
Failed to produce a valid JSON output: {
   "R": 128,
   "G": 0,
   "B": 128
-}
-```
+}

Test submitted by Roboflow

@@ -423,8 +463,8 @@

Image

Result

```json
 {
-  "length": 2.5,
-  "width": 2.5
+  "length": 2.0,
+  "width": 2.0
 }
 ```

Test submitted by Roboflow

@@ -548,61 +588,7 @@

Result

- -
-
-
-

Handwriting OCR

-

Can GPT-4V read handwriting?

-
-
-
-

Pass

-
-
-
-
-
- Last 7-Day Performance -
- -
- -
- -
- -
- -
- -
- -
- -
-
-

Of the last 7 tests, conducted daily, this test has passed 86.0% of the time.

-

Today's request cost $0.009

-
-
- -
-

Method

-
We send an image of a handwritten note to determine if it can correctly read the text. If it correctly gets the text, it gets a 100%. Otherwise, it gets a 0%.
-

Prompt

-
-                                            Read the text in the image. Return only the text, with punctuation.
-                                        
-

Image

- Image of the input into GPT-4 -

Result

-
The words of songs on the album have been echoing in my head all week. "Fades into the grey of my day old tea."
-

Test submitted by Roboflow

-
-
-
- +
diff --git a/results/2024-01-05.json b/results/2024-01-05.json
new file mode 100644
index 0000000..4be229a
--- /dev/null
+++ b/results/2024-01-05.json
@@ -0,0 +1,90 @@
+{
+    "zero_shot_classification": {
+        "score": 1,
+        "success": true,
+        "price": 0.00481,
+        "pass_fail": "Pass",
+        "response_time": 1.6826815605163574,
+        "result": "Toyota Camry"
+    },
+    "count_fruit": {
+        "score": 0,
+        "success": false,
+        "price": 0.007870000000000002,
+        "pass_fail": "Fail",
+        "response_time": 3.4792397022247314,
+        "result": "9"
+    },
+    "document_ocr": {
+        "score": 1,
+        "success": true,
+        "price": 0.00857,
+        "pass_fail": "Pass",
+        "response_time": 7.265234708786011,
+        "result": "I was thinking earlier today that I have gone through, to use the lingo, eras of listening to each of Swift's Eras. Meta indeed. I started listening to Ms. Swift's music after hearing the Midnights album. A few weeks after hearing the album for the first time, I found myself playing various songs on repeat. I listened to the album in order multiple times."
+    },
+    "handwriting_ocr": {
+        "score": 0,
+        "success": false,
+        "price": 0.008730000000000002,
+        "pass_fail": "Fail",
+        "response_time": 5.451120138168335,
+        "result": "The words of songs on the album have been echoing in my head all week. 
\"Fades into the grey of my day ok tea.\""
+    },
+    "extraction_ocr": {
+        "score": 1.0,
+        "success": true,
+        "price": 0.00725,
+        "pass_fail": "Pass",
+        "response_time": 2.627626419067383,
+        "result": "[{'name': 'MARY THOMAS', 'time_per_day': 1, 'medication': 'ATENOLOL', 'dosage': 100, 'rx_number': '1234567-12345'}]"
+    },
+    "math_ocr": {
+        "score": 1.0,
+        "success": true,
+        "price": 0.01528,
+        "pass_fail": "Pass",
+        "response_time": 4.44052267074585,
+        "result": "3x^2-6x+2"
+    },
+    "object_detection": {
+        "score": 0.24710519832471053,
+        "success": false,
+        "price": 0.009490000000000002,
+        "pass_fail": "Fail",
+        "response_time": 2.536571979522705,
+        "result": "{'x': 0.325, 'y': 0.3, 'width': 0.25, 'height': 0.45}"
+    },
+    "graph_understanding": {
+        "score": 0.965,
+        "success": false,
+        "price": 0.01019,
+        "pass_fail": "Fail",
+        "response_time": 2.8061134815216064,
+        "result": "```json\n{\n \"A\": {\"quantity\": 20, \"price\": 10},\n \"B\": {\"quantity\": 25, \"price\": 20},\n \"C\": {\"quantity\": 30, \"price\": 30},\n \"D\": {\"quantity\": 40, \"price\": 40}\n}\n```"
+    },
+    "color_recognition": {
+        "score": 0,
+        "success": false,
+        "price": 0.00875,
+        "pass_fail": "Fail",
+        "response_time": 5.634605884552002,
+        "result": "Failed to produce a valid JSON output: {\n \"R\": 128,\n \"G\": 0,\n \"B\": 128\n}"
+    },
+    "annotation_qa": {
+        "score": 0.33333333333333337,
+        "success": false,
+        "price": 0.015300000000000001,
+        "pass_fail": "Fail",
+        "response_time": 2.697636365890503,
+        "result": "```json\n{\n \"missing\": 1\n}\n```"
+    },
+    "measurement": {
+        "score": 0.5714285714285714,
+        "success": false,
+        "price": 0.00877,
+        "pass_fail": "Fail",
+        "response_time": 5.902654409408569,
+        "result": "```json\n{\n \"length\": 2.0,\n \"width\": 2.0\n}\n```"
+    }
+}
\ No newline at end of file
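
For readers who want to inspect a daily results file like the one added above, the following is a minimal sketch of how it might be summarized. The helper names `load_results` and `summarize` are hypothetical and not part of the repository. The field names (`pass_fail`, `price`, `response_time`) come from the file itself. The per-file mean computed here is not necessarily the same figure as the average response time shown on the page, which may be calculated differently.

```python
import json
from pathlib import Path


def load_results(path: str) -> dict:
    """Read one daily results file, e.g. results/2024-01-05.json."""
    return json.loads(Path(path).read_text(encoding="utf-8"))


def summarize(results: dict) -> dict:
    """Collect pass/fail names, total price, and mean response time."""
    passed = sorted(n for n, r in results.items() if r["pass_fail"] == "Pass")
    failed = sorted(n for n, r in results.items() if r["pass_fail"] != "Pass")
    total_price = sum(r["price"] for r in results.values())
    mean_time = sum(r["response_time"] for r in results.values()) / len(results)
    return {
        "passed": passed,
        "failed": failed,
        "total_price": total_price,
        "mean_response_time": mean_time,
    }


# Two entries copied from the file above, trimmed to the fields used here.
SAMPLE = {
    "zero_shot_classification": {
        "pass_fail": "Pass",
        "price": 0.00481,
        "response_time": 1.6826815605163574,
    },
    "count_fruit": {
        "pass_fail": "Fail",
        "price": 0.007870000000000002,
        "response_time": 3.4792397022247314,
    },
}

if __name__ == "__main__":
    print(summarize(SAMPLE))
```

With the repository checked out, `summarize(load_results("results/2024-01-05.json"))` would apply the same summary to the full file.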