Note that the model tested for drift here is gemini-1.5-flash-latest, which Google describes as 'updated regularly and might be a preview version'. Because this model changes often and is not intended for production applications, drift detections on it are common.
Drift status | Prompt | Test cases | Anomaly % (count)
---|---|---|---
No Drift | Math Word Problems | 54 | 1.53 % (75)
Drift Likely | Product Review | 25 | 36.14 % (841)
No Drift | Sales Email Generator | 50 | 1.31 % (61)
No Drift | Trivia Master | 70 | 0.67 % (43)
Drift Likely | Information Extraction | 25 | 11.70 % (272)
Drift Likely | Translator | 60 | 14.78 % (825)
No Drift | Summarization | 50 | 2.24 % (104)
Drift Likely | Customer Intent | 58 | 20.33 % (1097)
No Drift | Script Doctor | 50 | 1.13 % (53)
No Drift | Product Name Generator | 53 | 1.46 % (71)
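The table does not state how a prompt is labeled "Drift Likely", but the anomaly rates fall into two clearly separated groups (at most 2.24 % versus at least 11.70 %). The sketch below is purely illustrative and assumes a hypothetical fixed cutoff, `DRIFT_THRESHOLD_PCT = 5.0`, on the anomaly percentage. It is not the method used to produce these results. It only shows that a single threshold anywhere in that gap reproduces the labels above. The names `PromptResult`, `RESULTS`, and `drift_status` are illustrative.

```python
from dataclasses import dataclass

# HYPOTHETICAL cutoff: the source does not say how "Drift Likely" is decided.
# Any value above 2.24 and up to 11.70 reproduces the labels in the table.
DRIFT_THRESHOLD_PCT = 5.0


@dataclass
class PromptResult:
    prompt: str
    test_cases: int
    anomaly_pct: float  # anomaly rate as reported in the table
    anomaly_count: int  # the count shown in parentheses in the table


# Rows copied from the results table above.
RESULTS = [
    PromptResult("Math Word Problems", 54, 1.53, 75),
    PromptResult("Product Review", 25, 36.14, 841),
    PromptResult("Sales Email Generator", 50, 1.31, 61),
    PromptResult("Trivia Master", 70, 0.67, 43),
    PromptResult("Information Extraction", 25, 11.70, 272),
    PromptResult("Translator", 60, 14.78, 825),
    PromptResult("Summarization", 50, 2.24, 104),
    PromptResult("Customer Intent", 58, 20.33, 1097),
    PromptResult("Script Doctor", 50, 1.13, 53),
    PromptResult("Product Name Generator", 53, 1.46, 71),
]


def drift_status(result: PromptResult, threshold_pct: float = DRIFT_THRESHOLD_PCT) -> str:
    """Label a prompt with an assumed rule: flag drift when its anomaly rate meets the cutoff."""
    return "Drift Likely" if result.anomaly_pct >= threshold_pct else "No Drift"


if __name__ == "__main__":
    # Print each prompt's label under the hypothetical threshold.
    for r in RESULTS:
        print(f"{drift_status(r):<13} {r.prompt:<24} {r.anomaly_pct:6.2f} %")
```

A fixed cutoff is the simplest possible rule. A real drift monitor might instead compare against a baseline or use a statistical test, so treat this only as a way to read the table.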