Android 磁盘最近最少缓存实现 - DiskLruCache 源码分析

Posted 2020-03-9 Updated 2024-05- 15

By Chenhe

40~52 min read

DiskLruCache 是一个 Android 端使用广泛的磁盘 LRU （最近最少使用）缓存算法的实现库，甚至在 AOSP 中都有使用。

基础使用

为了对整体架构有个印象，方便找分析入口，先看看基本的使用。

实例化：

val cache = DiskLruCache.open(cacheFile, version, valueCount, maxSize)

写入：

val editor = cache.edit(key.md5())
val os = editor.newOutputStream(0)

读取：

val snapshot = cache.get(key.md5())
val is = snapshot?.getInputStream(0)

关于它的具体使用可以看这篇文章。

整体来看并不复杂，实际上代码量也确实不多。通过读取时的 snapshot 封装猜测后续修改不会影响已经读取到的值。

日志文件

DiskLruCache 的一个显著特征就是在缓存目录有一个 journal 文件，用于记录所有操作日志，大概长这个样子：

libcore.io.DiskLruCache
1
19
1

DIRTY c915a1313f3e413252fae2957469dcd2
CLEAN c915a1313f3e413252fae2957469dcd2 50196
READ c915a1313f3e413252fae2957469dcd2
DIRTY 8b33d00d5f9d443095e20829a1367e05
CLEAN 8b33d00d5f9d443095e20829a1367e05 159594

前五行是元数据：

一个魔法数，固定字符串，用于标识这是一个 DiskLruCache 的日志文件。
DiskLruCache 版本号，从发布以来就没变过，预留一下便于后续格式更新。
开发者声明的应用版本号。
每个 key 对应几个 value，一般都是1。一个 key 支持写入多个值，每个值都是一个文件，文件名是 key.index。
一个空行

每次初始化时都要校验这些元数据。

之后是真正的日志数据，每一行是一个记录，拥有 state, key, option 字段，用空格分隔，其中 option 字段根据 state 的不同是可选的。

DIRTY: 表示一项数据正在被写入（创建或更新）。后面必须跟着一个 CLEAN 或 REMOVE。否则这个项目将被视为临时文件，需要清除。
CLEAN: 表示一项数据被成功写入。后面跟着一个数据大小 (Byte)，如果一个 key 对应多个数据，则跟着多个大小。
REMOVE: 表示一项数据已经被删除。
READ: 表示对应项被读取了一次。其实这个不能算是状态，就是一个操作记录。

实例化

`open()`

那就直接从使用的入口开始分析吧。

  /**
   * 在指定位置打开一个 cache 实例，如果日志文件不存在就创建一个。
   *
   * @param directory 一个可写的目录
   * @param valueCount 每个缓存项的数据个数
   * @param maxSize 可占用的最大空间
   * @throws IOException if reading or writing the cache directory fails
   */
  public static DiskLruCache open(File directory, int appVersion, int valueCount, long maxSize)
      throws IOException {
    // 检查参数
    if (maxSize <= 0) {
      throw new IllegalArgumentException("maxSize <= 0");
    }
    if (valueCount <= 0) {
      throw new IllegalArgumentException("valueCount <= 0");
    }

    // 检查备份日志文件
    File backupFile = new File(directory, JOURNAL_FILE_BACKUP);
    if (backupFile.exists()) {
      File journalFile = new File(directory, JOURNAL_FILE);
      // 如果日志存在就删除备份
      if (journalFile.exists()) {
        backupFile.delete();
      } else {
        renameTo(backupFile, journalFile, false);
      }
    }

    // 日志文件存在则继续处理
    DiskLruCache cache = new DiskLruCache(directory, appVersion, valueCount, maxSize);
    if (cache.journalFile.exists()) {
      try {
        cache.readJournal();
        cache.processJournal();
        cache.journalWriter = new BufferedWriter(
            new OutputStreamWriter(new FileOutputStream(cache.journalFile, true), Util.US_ASCII));
        return cache;
      } catch (IOException journalIsCorrupt) {
        System.out
            .println("DiskLruCache "
                + directory
                + " is corrupt: "
                + journalIsCorrupt.getMessage()
                + ", removing");
        cache.delete();
      }
    }

    // 否则执行初始化
    directory.mkdirs();
    cache = new DiskLruCache(directory, appVersion, valueCount, maxSize);
    cache.rebuildJournal();
    return cache;
  }

open 函数没什么好讲的，重点都注释了。一波预处理后判断日志文件是否存在，不存在就执行初始化创建一个，存在就读取并处理。

初始化

初始化主要是调用 rebuildJournal().

  /**
   * 创建一个新日志，并覆盖已有的。
   */
  private synchronized void rebuildJournal() throws IOException {
    if (journalWriter != null) {
      journalWriter.close();
    }

    Writer writer = new BufferedWriter(
        new OutputStreamWriter(new FileOutputStream(journalFileTmp), Util.US_ASCII));
    try {
      // 写入元数据
      writer.write(MAGIC);
      writer.write("\n");
      writer.write(VERSION_1);
      writer.write("\n");
      writer.write(Integer.toString(appVersion));
      writer.write("\n");
      writer.write(Integer.toString(valueCount));
      writer.write("\n");
      writer.write("\n");

      // 将已有的数据项写入日志
      for (Entry entry : lruEntries.values()) {
        if (entry.currentEditor != null) {
          writer.write(DIRTY + ' ' + entry.key + '\n');
        } else {
          writer.write(CLEAN + ' ' + entry.key + entry.getLengths() + '\n');
        }
      }
    } finally {
      writer.close();
    }

    // 覆盖已有的
    if (journalFile.exists()) {
      renameTo(journalFile, journalFileBackup, true);
    }
    renameTo(journalFileTmp, journalFile, false);
    journalFileBackup.delete();

    journalWriter = new BufferedWriter(
        new OutputStreamWriter(new FileOutputStream(journalFile, true), Util.US_ASCII));
  }

初始化也不复杂，主体就是写入头信息和已有的缓存数据。未在编辑的直接标注 CLEAN 也就是准备就绪可以读取，正在编辑的则标注 DIRTY 等待后续操作。

稍微注意下替换过程，写入的是一个临时文件，将现有文件备份，最后才是用临时文件覆盖。这样如果出现异常还可以恢复。

读取处理日志

如日志文件存在首先进行读取，调用 readJournal() 函数。

private void readJournal() throws IOException {
    StrictLineReader reader = new StrictLineReader(new FileInputStream(journalFile), Util.US_ASCII);
    try {
      // 校验文件头
      String magic = reader.readLine();
      String version = reader.readLine();
      String appVersionString = reader.readLine();
      String valueCountString = reader.readLine();
      String blank = reader.readLine();
      if (!MAGIC.equals(magic)
          || !VERSION_1.equals(version)
          || !Integer.toString(appVersion).equals(appVersionString)
          || !Integer.toString(valueCount).equals(valueCountString)
          || !"".equals(blank)) {
        throw new IOException("unexpected journal header: [" + magic + ", " + version + ", "
            + valueCountString + ", " + blank + "]");
      }

      // 读取并顺便计算下日志记录数
      int lineCount = 0;
      while (true) {
        try {
          readJournalLine(reader.readLine());
          lineCount++;
        } catch (EOFException endOfJournal) {
          break;
        }
      }
      redundantOpCount = lineCount - lruEntries.size();
    } finally {
      Util.closeQuietly(reader);
    }
  }

读取的时候计算了一下日志的行数，减去真正有效的数据项，剩下的就是多余日志行数。后面会用来维护日志文件以免越来越臃肿。

具体日志解析在 readJournalLine() 函数里。

  private void readJournalLine(String line) throws IOException {
    int firstSpace = line.indexOf(' ');
    if (firstSpace == -1) {
      throw new IOException("unexpected journal line: " + line);
    }

    int keyBegin = firstSpace + 1;
    int secondSpace = line.indexOf(' ', keyBegin);
    final String key;
    if (secondSpace == -1) {
      // 找不到第二个空格，即：只有 state 和 key 两个字段
      key = line.substring(keyBegin);
      if (firstSpace == REMOVE.length() && line.startsWith(REMOVE)) {
        // REMOVE：删除对应的数据
        lruEntries.remove(key);
        return;
      }
    } else {
      key = line.substring(keyBegin, secondSpace);
    }

    Entry entry = lruEntries.get(key);
    if (entry == null) {
      // 如果没有加入到 lruEntries 就加进去
      entry = new Entry(key);
      lruEntries.put(key, entry);
    }

    if (secondSpace != -1 && firstSpace == CLEAN.length() && line.startsWith(CLEAN)) {
      // 设置 CLEAN （就绪）状态
      String[] parts = line.substring(secondSpace + 1).split(" ");
      entry.readable = true;
      entry.currentEditor = null;
      entry.setLengths(parts);
    } else if (secondSpace == -1 && firstSpace == DIRTY.length() && line.startsWith(DIRTY)) {
      // 设置编辑状态
      entry.currentEditor = new Editor(entry);
    } else if (secondSpace == -1 && firstSpace == READ.length() && line.startsWith(READ)) {
      // 忽略读取记录
      // This work was already done by calling lruEntries.get().
    } else {
      throw new IOException("unexpected journal line: " + line);
    }
  }

还算是比较清晰。首先判断的是 REMOVE 记录，操作也很粗暴，直接从 lruEntries 里删除掉。接下来既然没有 REMOVE 那么就是存在的数据，应当加入集合，所以判断没有加入的话就加进去。最后根据读取到的值设置一下数据项的状态就好了。

不难看出解析之后最后集合中存在的元素需要满足 ①拥有至少一个非 REMOVE 状态的记录。 ②之后没有 REMOVE 记录。注意此时 DIRTY 也在集合中。

一个比较有意思的是对 state 字段的判断，利用 startsWith 和 length 完成，相对于先 substring 再比较效率稍微高一点。

到这里读取就结束了，下面进行处理。

读取完后 open() 会继续调用 processJournal() 函数。

  /**
   * 计算缓存大小并删除 Dirty 的数据项。
   */
  private void processJournal() throws IOException {
    deleteIfExists(journalFileTmp);
    for (Iterator<Entry> i = lruEntries.values().iterator(); i.hasNext(); ) {
      Entry entry = i.next();
      if (entry.currentEditor == null) {
        // 计算有效数据总大小
        for (int t = 0; t < valueCount; t++) {
          size += entry.lengths[t];
        }
      } else {
        // 清除无效数据
        entry.currentEditor = null;
        for (int t = 0; t < valueCount; t++) {
          deleteIfExists(entry.getCleanFile(t));
          deleteIfExists(entry.getDirtyFile(t));
        }
        i.remove();
      }
    }
  }

比解析日志简单一些，主要做了两件事：计算缓存大小、删除无效数据。

无效数据指的是单独出现了 DIRTY 但后面没有 REMOVE 或 CLEAN 与之对应的。这类项目在读取的时候我们把它加入了集合并设置为编辑模式，所以现在直接判断 currentEditor 就行了。

此时集合中只剩下有效的 CLEAN 项目，实例化完成。

写入缓存

我们通过拿到一个 DiskLruCache.Editor 打开一个输出流来进行写入操作，所以从 cache.edit() 开始分析。

  private synchronized Editor edit(String key, long expectedSequenceNumber) throws IOException {
    // 校验参数、状态
    checkNotClosed();
    validateKey(key);
    Entry entry = lruEntries.get(key);
    if (expectedSequenceNumber != ANY_SEQUENCE_NUMBER && (entry == null
        || entry.sequenceNumber != expectedSequenceNumber)) {
      return null; // Snapshot is stale.
    }

    // 没有就创建，正在编辑就返回 null
    if (entry == null) {
      entry = new Entry(key);
      lruEntries.put(key, entry);
    } else if (entry.currentEditor != null) {
      return null; // Another edit is in progress.
    }

    // 设置编辑状态
    Editor editor = new Editor(entry);
    entry.currentEditor = editor;

    // 写入日志并 flush 避免文件泄露（也就是写入了缓存但是日志中没有）
    journalWriter.write(DIRTY + ' ' + key + '\n');
    journalWriter.flush();
    return editor;
  }

也是非常常规易懂的操作。最后先调用一下 flush 写入磁盘，从实例化过程我们就能看出 DiskLruCache 完全靠日志来读取缓存数据，如果有数据写到了磁盘却没有在日志中体现，就永远不会被读取或被删除，造成文件泄露。

获取 editor 后可以通过 newOutputStream 拿到一个输出流用来写入。

    /**
     * 在给定 index 除打开一个输出流。如果写入磁盘时遇到了错误那么调用 commit 
     * 时编辑操作将终止并标记为失败。返回的输出流不会抛出 IOExceptions.
     */
    public OutputStream newOutputStream(int index) throws IOException {
      synchronized (DiskLruCache.this) {
        if (entry.currentEditor != this) {
          throw new IllegalStateException();
        }
        if (!entry.readable) {
          written[index] = true;
        }
        // 创建一个临时文件
        File dirtyFile = entry.getDirtyFile(index);
        FileOutputStream outputStream;
        try {
          outputStream = new FileOutputStream(dirtyFile);
        } catch (FileNotFoundException e) {
          // Attempt to recreate the cache directory.
          directory.mkdirs();
          try {
            outputStream = new FileOutputStream(dirtyFile);
          } catch (FileNotFoundException e2) {
            // We are unable to recover. Silently eat the writes.
            return NULL_OUTPUT_STREAM;
          }
        }
        return new FaultHidingOutputStream(outputStream);
      }
    }

getDirtyFile 返回的是一个名称为 {key}.{index}.tmp 的临时文件。把这个文件的输出流通过 FaultHidingOutputStream 封装最后返回。封装很简单就是对常见操作全部捕获了异常，内部设置一个 hasErrors 来记录。

写入完成后我们需要调用 comit 方法。

    /**
     * 提交此次编辑，数据项转为就绪状态。并解除锁定允许再次编辑。
     */
    public void commit() throws IOException {
      if (hasErrors) {
        completeEdit(this, false);
        remove(entry.key); // The previous entry is stale.
      } else {
        completeEdit(this, true);
      }
      committed = true;
    }

这里进行了一下错误判断。先 completeEdit 走完流程，如果有错那么直接删除掉。接下来看看是怎么完成编辑流程的。

  private synchronized void completeEdit(Editor editor, boolean success) throws IOException {
    Entry entry = editor.entry;
    // 校验状态
    if (entry.currentEditor != editor) {
      throw new IllegalStateException();
    }

    // 如果是首次创建，那么每个索引必须有值
    if (success && !entry.readable) {
      for (int i = 0; i < valueCount; i++) {
        if (!editor.written[i]) {
          editor.abort();
          throw new IllegalStateException("Newly created entry didn't create value for index " + i);
        }
        if (!entry.getDirtyFile(i).exists()) {
          editor.abort();
          return;
        }
      }
    }

    // 操作文件，成功就转为正式数据并更新一些参数，否则就删除文件
    for (int i = 0; i < valueCount; i++) {
      File dirty = entry.getDirtyFile(i);
      if (success) {
        if (dirty.exists()) {
          File clean = entry.getCleanFile(i);
          dirty.renameTo(clean);
          long oldLength = entry.lengths[i];
          long newLength = clean.length();
          entry.lengths[i] = newLength;
          size = size - oldLength + newLength;
        }
      } else {
        deleteIfExists(dirty);
      }
    }

    // 一些收尾工作，写入日志
    redundantOpCount++;
    entry.currentEditor = null;
    if (entry.readable | success) {
      entry.readable = true;
      journalWriter.write(CLEAN + ' ' + entry.key + entry.getLengths() + '\n');
      if (success) {
        entry.sequenceNumber = nextSequenceNumber++;
      }
    } else {
      lruEntries.remove(entry.key);
      journalWriter.write(REMOVE + ' ' + entry.key + '\n');
    }
    journalWriter.flush();

    // 超过限定大小要重构一下
    if (size > maxSize || journalRebuildRequired()) {
      executorService.submit(cleanupCallable);
    }
  }

一开始 if (success && !entry.readable) 判断是否成功且是新创建的，若是编辑已存在的文件那么 readable 应该是 true. 这里面要求每一个索引都必须有数据。

接着进行文件操作。如果成功就把临时文件重命名为正式缓存数据，然后更新一下数据项大小和总大小。如果失败调用 deleteIfExists，其实就是删除临时文件。

最后进行收尾工作，成功的话设置下 readable 表示此缓存已就绪，写入一个 CLEAN 日志。否则删除这一数据项并写入一个 REMOVE 日志。然后判断一下是否需要重构，得满足任意一个条件就行：① 缓存大小超过限制。②日志条目超过阈值并且大于现有的缓存项数。这是很好理解的，即便日志数很多，但如果缓存项也很多那么这些日志就是必要的，不能删除。

除了 commit 外还有一个 abort 函数用于放弃此次编辑：

    /**
     * Aborts this edit. This releases the edit lock so another edit may be
     * started on the same key.
     */
    public void abort() throws IOException {
      completeEdit(this, false);
    }

同样是调用 completeEdit 只不过直接指明编辑失败。

读取缓存

读取缓存是调用 cache.get(kay) 得到一个 Snapshot 对象，那就先来看看这个 get 方法：

/**
   * 根据 key 返回一个快照对象，如果 key 不存在或不可读则返回 null. 
   *  一旦成功返回那么此数据项将移动到 lru 队列最前端。
   */
  public synchronized Snapshot get(String key) throws IOException {
    // 一些校验
    checkNotClosed();
    validateKey(key);
    Entry entry = lruEntries.get(key);
    if (entry == null) {
      return null;
    }
    if (!entry.readable) {
      return null;
    }

    // 直接打开输入流来达到“快照”的效果。
    // 如果返回文件那么有可能在读取前被更改。
    InputStream[] ins = new InputStream[valueCount];
    try {
      for (int i = 0; i < valueCount; i++) {
        ins[i] = new FileInputStream(entry.getCleanFile(i));
      }
    } catch (FileNotFoundException e) {
      // 文件被异常删除了
      for (int i = 0; i < valueCount; i++) {
        if (ins[i] != null) {
          Util.closeQuietly(ins[i]);
        } else {
          break;
        }
      }
      return null;
    }

    // 写入日志，根据需求重构
    redundantOpCount++;
    journalWriter.append(READ + ' ' + key + '\n');
    if (journalRebuildRequired()) {
      executorService.submit(cleanupCallable);
    }
    // 返回快照对象
    return new Snapshot(key, entry.sequenceNumber, ins, entry.lengths);
  }

读取相对来说简单多了，分为三步：检验、读取、写日志。和一开始的猜测一样，在读取过程中直接打开了一个输入流以免后续编辑的影响。

总结（LRU实现）

虽然已经分析完了主要方法，但似乎没有看到 LRU 的实现啊？其实核心的东西就是一个变量：

  private final LinkedHashMap<String, Entry> lruEntries =
      new LinkedHashMap<String, Entry>(0, 0.75f, true);

LinkedHashMap 底层实现是一个双向链表，因此可以维护顺序。构造函数第三个参数是 accessOrder 默认为 false，这里非常少见地设置为 true 了。效果就是每次取元素时，被取到的元素就会自动移动到尾部，从而实现了 LRU，即：越靠近尾部的元素就是近期被访问过的。因此当大小超出阈值时，只需要从头部开始依次删除。

举个栗子🌰：我们先依次插入 [1, 5] 这五个数据项，那么此时的链表就是：

1, 2, 3, 4, 5

接着依次读取了[1, 4, 1]，链表变为：

2, 3, 4, 5, 1 →
2, 3, 5, 1, 4 →
2, 3, 5, 4, 1

此时插入一个比较大的数据项，总空间达到阈值，那么只需从 2 开始依次删除就行了。

最后来看一下负责维持空间大小的 trimToSize 函数：

  private void trimToSize() throws IOException {
    while (size > maxSize) {
      Map.Entry<String, Entry> toEvict = lruEntries.entrySet().iterator().next();
      remove(toEvict.getKey());
    }
  }

非常简短，与刚才分析的一样，从 lruEntries 头部开始依次删除，直到大小符合要求。

说到底 DiskLruCache 这个库并没有在 LRU 上下太大功夫，核心算法由 LinkedHashMap 实现了。而我们需要做的更多的是维护磁盘缓存的一致性，避免文件泄露。仔细思考下 journal 文件的定义还是有不少收获。之前习惯性地用成熟的数据存储格式，比如 json, xml，甚至使用 sqlite 数据库，毫无疑问数据库和 xml 太重了在用在这里不合适。而 json 定义严格，容错率低，哪怕有一个标点错误都很难简单地恢复尽可能多的记录。DiskLruCache 仅仅使用5个状态关键字就轻松解决了这一问题，大道至简啊。

写代码的日子

android

License: 禁止转载到非自托管的内容平台，禁止用于 AI 训练