Methodology

Memory-Exam-1

The memory-exam-1 benchmark evaluates how well a memory system can recall and retrieve information from the lifelog of Sakura Tanaka, a synthetic individual we created for testing. The benchmark consists of 100 questions of varying difficulty, all focused on long-term memory recall.

The benchmark tests both standard question-and-answer recall and the system's ability to locate specific image and video assets within her digital memories.

Dataset Overview: Sakura's lifelog contains 400 carefully curated assets across four modalities:

  • 100 plain text conversations between Sakura and an AI chatbot
  • 100 image + text conversations between Sakura and an AI chatbot
  • 100 video + text conversations between Sakura and an AI chatbot
  • 100 one-minute audio recordings of Sakura talking about her life
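As a rough illustration of the dataset composition above, the following sketch models the lifelog as a manifest of assets tagged by modality. The class names, the identifier format, and the `build_manifest` helper are hypothetical and are not part of the benchmark itself; only the four modalities and the count of 100 assets each come from the description above.

```python
from collections import Counter
from dataclasses import dataclass
from enum import Enum


class Modality(Enum):
    """The four asset modalities listed in the dataset overview."""
    TEXT = "text"
    IMAGE_TEXT = "image+text"
    VIDEO_TEXT = "video+text"
    AUDIO = "audio"


@dataclass(frozen=True)
class Asset:
    asset_id: str       # hypothetical identifier format, for illustration only
    modality: Modality


def build_manifest(per_modality: int = 100) -> list[Asset]:
    """Create placeholder entries: `per_modality` assets for each modality."""
    return [
        Asset(f"{m.value}-{i:03d}", m)
        for m in Modality
        for i in range(per_modality)
    ]


manifest = build_manifest()
counts = Counter(asset.modality for asset in manifest)
```

With the default of 100 assets per modality, the manifest holds 400 entries, matching the stated size of the lifelog.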

Performance Metrics: We evaluate systems using two key measurements:

  • Accuracy: number of correct answers out of the 100 questions
  • Latency: average time per question to retrieve answers and locate required assets
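The two measurements could be computed along the following lines. This is a minimal sketch, not the benchmark's actual scoring code: the `QuestionResult` record and the `summarize` function are hypothetical names, latency is assumed to be measured in seconds per question, and how an answer is judged correct is not specified here, so it is represented as a precomputed boolean.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class QuestionResult:
    """Outcome of one benchmark question (hypothetical record layout)."""
    correct: bool            # whether the answer was judged correct
    latency_seconds: float   # assumed unit: seconds to answer and locate assets


def summarize(results: list[QuestionResult]) -> dict[str, float]:
    """Return the number correct, the fraction correct, and the mean latency."""
    if not results:
        raise ValueError("no results to summarize")
    num_correct = sum(r.correct for r in results)
    return {
        "num_correct": num_correct,
        "accuracy": num_correct / len(results),
        "mean_latency_seconds": mean(r.latency_seconds for r in results),
    }
```

For a full run, `results` would contain one entry per question, so `num_correct` corresponds directly to the "correct answers out of 100" figure.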