Note that the model tested for drift here is gemini-1.5-flash-latest, which Google describes as 'updated regularly and might be a preview version'. Because this model changes frequently and is not intended for production applications, we detect drift in it often.
Drift status | Prompt | Test cases | Anomaly % | Deviation trend |
---|---|---|---|---|
No Drift | Math Word Problems | 54 | 1.94 % (190) | |
Drift Likely | Product Review | 25 | 18.99 % (869) | |
No Drift | Sales Email Generator | 50 | 1.62 % (148) | |
No Drift | Trivia Master | 70 | 0.83 % (105) | |
Drift Likely | Information Extraction | 25 | 8.22 % (376) | |
Drift Likely | Translator | 60 | 15.14 % (1662) | |
No Drift | Summarization | 50 | 3.49 % (319) | |
Drift Likely | Customer Intent | 58 | 13.72 % (1456) | |
No Drift | Script Doctor | 50 | 1.62 % (149) | |
No Drift | Product Name Generator | 53 | 1.49 % (144) | |
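
As a quick way to read the table, the following minimal Python sketch groups the rows by drift status and reports how many prompts fall under each status, along with the lowest and highest anomaly percentage in each group. The `ROWS` list and the `summarize` helper are hypothetical names introduced only for this illustration, with values copied from the table. The sketch does not reproduce how the drift status is computed, since the cutoff used is not stated here.

```python
# Rows copied from the table above: (drift status, prompt, test cases, anomaly %).
# ROWS and summarize() are illustrative names, not part of any tool's API.
ROWS = [
    ("No Drift", "Math Word Problems", 54, 1.94),
    ("Drift Likely", "Product Review", 25, 18.99),
    ("No Drift", "Sales Email Generator", 50, 1.62),
    ("No Drift", "Trivia Master", 70, 0.83),
    ("Drift Likely", "Information Extraction", 25, 8.22),
    ("Drift Likely", "Translator", 60, 15.14),
    ("No Drift", "Summarization", 50, 3.49),
    ("Drift Likely", "Customer Intent", 58, 13.72),
    ("No Drift", "Script Doctor", 50, 1.62),
    ("No Drift", "Product Name Generator", 53, 1.49),
]


def summarize(rows):
    """Group anomaly percentages by drift status; report count, min, and max."""
    by_status = {}
    for status, _prompt, _cases, anomaly_pct in rows:
        by_status.setdefault(status, []).append(anomaly_pct)
    return {
        status: {"prompts": len(vals), "min_pct": min(vals), "max_pct": max(vals)}
        for status, vals in by_status.items()
    }


summary = summarize(ROWS)
for status, stats in summary.items():
    print(
        f"{status}: {stats['prompts']} prompts, "
        f"anomaly % from {stats['min_pct']} to {stats['max_pct']}"
    )
```

In this snapshot, every No Drift prompt sits at or below 3.49 % and every Drift Likely prompt at or above 8.22 %, so the two groups do not overlap.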