Note that the model tested for drift here is gemini-1.5-flash-latest, which Google describes as 'updated regularly and might be a preview version'. Because this model changes frequently and is not meant for production applications, we often detect drift in it.
Drift status | Prompt | Test cases | Anomaly % | Deviation trend |
---|---|---|---|---|
No Drift | Math Word Problems | 54 | 1.94 % (182) | |
Drift Likely | Product Review | 25 | 19.72 % (868) | |
No Drift | Sales Email Generator | 50 | 1.66 % (146) | |
No Drift | Trivia Master | 70 | 0.82 % (100) | |
Drift Likely | Information Extraction | 25 | 8.36 % (368) | |
Drift Likely | Translator | 60 | 15.02 % (1586) | |
No Drift | Summarization | 50 | 3.41 % (300) | |
Drift Likely | Customer Intent | 58 | 14.02 % (1431) | |
No Drift | Script Doctor | 50 | 1.64 % (145) | |
No Drift | Product Name Generator | 53 | 1.53 % (142) | |
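
The figures in the table can be cross-checked with a little arithmetic. The sketch below is only an illustration and rests on an assumption the page does not state: that the number in parentheses is the count of anomalous observations, so that anomaly % = 100 × count / total. Under that assumption it back-computes the implied total per prompt and the implied observations per test case. The names `ROWS`, `implied_total`, and `summarize` are hypothetical helpers, not part of any tool described here.

```python
# Rows copied from the table above:
# (drift status, prompt, test cases, anomaly %, number in parentheses)
ROWS = [
    ("No Drift", "Math Word Problems", 54, 1.94, 182),
    ("Drift Likely", "Product Review", 25, 19.72, 868),
    ("No Drift", "Sales Email Generator", 50, 1.66, 146),
    ("No Drift", "Trivia Master", 70, 0.82, 100),
    ("Drift Likely", "Information Extraction", 25, 8.36, 368),
    ("Drift Likely", "Translator", 60, 15.02, 1586),
    ("No Drift", "Summarization", 50, 3.41, 300),
    ("Drift Likely", "Customer Intent", 58, 14.02, 1431),
    ("No Drift", "Script Doctor", 50, 1.64, 145),
    ("No Drift", "Product Name Generator", 53, 1.53, 142),
]


def implied_total(anomaly_pct, anomaly_count):
    """Back-compute the total observation count.

    ASSUMPTION (not stated in the source): the parenthesized number is an
    anomaly count and anomaly_pct == 100 * anomaly_count / total.
    """
    return round(anomaly_count / (anomaly_pct / 100))


def summarize(rows):
    """Return (prompt, status, implied total, implied observations per test case)."""
    summary = []
    for status, prompt, cases, pct, count in rows:
        total = implied_total(pct, count)
        summary.append((prompt, status, total, round(total / cases, 1)))
    return summary


if __name__ == "__main__":
    for prompt, status, total, per_case in summarize(ROWS):
        print(f"{prompt:<24} {status:<13} total~{total:>6}  per test case~{per_case}")

    # Range of anomaly percentages within each status group, as listed in the table.
    no_drift = [pct for status, _, _, pct, _ in ROWS if status == "No Drift"]
    drift = [pct for status, _, _, pct, _ in ROWS if status == "Drift Likely"]
    print(f"Highest 'No Drift' anomaly %: {max(no_drift)}")
    print(f"Lowest 'Drift Likely' anomaly %: {min(drift)}")
```

If the assumption holds, the implied number of observations per test case is similar across all ten prompts, which would be consistent with every test case being sampled about the same number of times. The table itself does not confirm this, and it does not state what threshold separates No Drift from Drift Likely, so the last two printed values describe only the rows shown.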