Methodology
Memory-Exam-1
The memory-exam-1 benchmark evaluates how well a memory system can recall and retrieve information from the lifelog of Sakura Tanaka, a synthetic individual we created for testing. It consists of 100 questions of varying difficulty, all focused on long-term memory recall.
The benchmark tests two things: standard question-and-answer recall, and the system's ability to locate specific image and video assets within her digital memories.
Dataset Overview: Sakura's lifelog contains 400 carefully curated assets across four modalities:
- 100 plain text conversations between Sakura and an AI chatbot
- 100 image + text conversations between Sakura and an AI chatbot
- 100 video + text conversations between Sakura and an AI chatbot
- 100 one-minute audio recordings of Sakura talking about her life
Performance Metrics: We evaluate systems using two key measurements:
- Accuracy — correct answers out of 100 total questions
- Latency — average time to retrieve answers and locate required assets
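As a rough illustration of how these two metrics could be computed from per-question results, here is a minimal sketch. The `QuestionResult` record, the `score` function, and the choice of seconds as the latency unit are assumptions made for this example, not part of the benchmark's actual harness; how an answer is judged correct is also left outside the sketch.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class QuestionResult:
    """Outcome of one benchmark question (hypothetical record layout)."""
    correct: bool           # whether the answer was judged correct
    latency_seconds: float  # assumed unit: time to answer and locate any required assets


def score(results: list[QuestionResult]) -> tuple[float, float]:
    """Return (accuracy, average latency in seconds) over all questions."""
    if not results:
        raise ValueError("no results to score")
    # Accuracy: fraction of questions answered correctly.
    accuracy = sum(r.correct for r in results) / len(results)
    # Latency: arithmetic mean of the per-question times.
    avg_latency = mean(r.latency_seconds for r in results)
    return accuracy, avg_latency


# Toy run with four questions; the real benchmark has 100.
sample = [
    QuestionResult(True, 1.0),
    QuestionResult(False, 3.0),
    QuestionResult(True, 2.0),
    QuestionResult(True, 2.0),
]
accuracy, avg_latency = score(sample)
print(f"accuracy={accuracy:.2f}, avg latency={avg_latency:.2f}s")
```

With 100 questions, the accuracy fraction multiplied by 100 gives the "correct answers out of 100" figure directly.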