Kember Identity Search

The Kember Identity Hash

Elliot Kember started a quest to find a string that is it's own MD5 hash: The Kember Identity Hash.

There are 16³² (340,282,366,920,938,463,463,374,607,431,768,211,456) potential results for a hex-encoded MD5 hash. For example:

md5('abc') = '900150983cd24fb0d6963f7d28e17f72'
md5('def') = '4ed9407630eb1000c0f6b63842defa7d'

Searching through all the potential hashes is a huge problem because of the gamut. Of course, the size of that gamut is exactly the point of a hash: the number of collisions (two inputs with the same hash) is virtually zero.

Unfortunately, he oversimplified and left one ambiguity regarding case. However, based on his examples, I've assumed he's referring to an all-lowercase string.

My Progress

So far I've checked about 28,571,075,396,407 hashes to see if they match their source. That's around 0% coverage.

To get a handle on how tiny a percentage this is, visualize a progress bar for the search. It should be about 32,025,260 light years wide. The Milky Way, just for reference, is only around 100,000 light years wide, and our nearest neighbor galaxy (Andromeda) is a mere 2,500,000 light years away.

On this progress bar, the portion that I've completed would be one inch wide.

32,025,260 light years

done

1 inch

Not to Scale!

As you can see, the odds of success are fairly low. And according to analysis by ograll, the odds of a match existing at all is around 67%.

The Searchers

Part of his quest was soliciting searchers from others. I wrote one in Groovy, using the JRE-supplied MessageDigest implementation. ~~It now runs on my office workstation when I'm not using it (running 32-bit Windows XP).~~

After having inexplicable performance issues with threading in my initial searcher, I decided to rewrite the searcher in Java to see if it was a Groovy-specific issue. The Java version is slightly less than four times faster at searching, but the threading issue remains. This now runs on my office workstation when I'm not using it (now running 64-bit Java on CentOS 5).

Both version use the same algorithm: make a 32-character string of randomized hex digits, take it's MD5 hash, hex-encode it, and see if it matches the original. This requires 10-15 lines of code, and the other few hundred lines in each searcher is for scheduling, monitoring, and threading.

My Java Searcher

import java.io.IOException;
import java.io.InputStreamReader;
import java.math.BigInteger;
import java.net.HttpURLConnection;
import java.net.URL;
import java.net.URLConnection;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.text.NumberFormat;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.ListIterator;
import java.util.Random;
import java.util.Timer;
import java.util.TimerTask;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicLong;

/*
 * Don't look at this as an example of how to do things right.  It's a
 * Groovy script I ported, twisted, and just generally mangled in terms
 * of structure.
 */
public final class Search {
  
  static final String logPrefix = "[" + (int)(Math.random() * 1000000000) + "] ";


  
  public static void main(String[] args) {
    int initialThreads = 1;
    try {
      if (args.length > 0) {
        initialThreads = Integer.parseInt(args[0]);
      }
    } catch (NumberFormatException e) {
      // oh well
    }
    new Search().search(initialThreads);
  }

  public static void err(String msg) {
    emit(logPrefix + msg, "err");
  }

  public static void out(String msg) {
    emit(logPrefix + msg, "out");
  }
  
  public static void emit(String msg, String stream) {
    if (stream == "err") {
      System.err.print(logPrefix);
      System.err.println(msg);
    } else {
      System.out.print(logPrefix);
      System.out.println(msg);
    }
  }
  
  

  long start = System.currentTimeMillis();
  long lastSendToMaster = start;
  final AtomicLong totalHashesChecked = new AtomicLong();
  final AtomicLong totalHashesSent = new AtomicLong();
  final List<String> kemberIdentities = new ArrayList<String>();
  final List<SearcherThread> searcherThreads = new ArrayList<SearcherThread>();
  final NumberFormat nFormat = NumberFormat.getIntegerInstance();
  boolean keepSearching = true;
  final StatusDisplayer statusDisplayer = new StatusDisplayer();
  final MasterConnection masterConnection = new MasterConnection();
  final Timer timer = new Timer(true);
  
  
  
  class SearchRecorder {
    public void recordChecks(long count) {
      totalHashesChecked.addAndGet(count);
    }
    
    public void recordMatch(String hash) {
      out("");
      out("identity hash: " + hash);
      out("");
      synchronized (kemberIdentities) {
        kemberIdentities.add(hash);
      }
      masterConnection.sendCheckUpdate();
      masterConnection.sendIdentityUpdate();
    }
  }
  
  
  
  class StatusDisplayer {
    public void displayStatus() {
      cleanUpThreads();
      if (searcherThreads.size() == 0) {
        return; // nothing to do
      }
      long elapsed = (System.currentTimeMillis() - start) / 1000;
      String timing;
      if (elapsed < 60) { // less than a minute
        timing = elapsed + " sec";
      } else if (elapsed < 60 * 60) { // less than a hour
        timing = (Math.floor(elapsed / 60.0 * 100) / 100) + " min";
      } else if (elapsed < 60 * 60 * 24) { // less than a day
        timing = (Math.floor(elapsed / 60.0 / 60 * 100) / 100) + " hrs";
      } else {
        timing = (Math.floor(elapsed / 60.0 / 60 / 24 * 100) / 100) + " days";
      }
      double minutes = (System.currentTimeMillis() - lastSendToMaster) / 60000.0;
      double rate = (totalHashesChecked.get() - totalHashesSent.get()) / minutes;
      out(
        nFormat.format(totalHashesChecked.get()) + " checks in "
        + timing + " [threads: " + searcherThreads.size() + "] "
        + "[checks/min: " + nFormat.format(rate) + "]"
      );
    }
  }
  

  
  class MasterConnection {
    
    public void sendCheckUpdate() {
      sendCheckUpdate("err");
    }
      
    public void sendCheckUpdate(String errorStream) {
      synchronized (totalHashesSent) {
        long diff = totalHashesChecked.get() - totalHashesSent.get();
        if (diff != 0) {
          out("sending checks to master");
          try {
            makeRequest("count", "" + diff);
            totalHashesSent.addAndGet(diff);
            lastSendToMaster = System.currentTimeMillis();
            out("done sending checks (" + nFormat.format(diff) + ")");
          } catch (Exception e) {
            emit("error (" + e.getMessage() + ") (" + nFormat.format(diff) + ")]", errorStream);
          }
        }
      }
    }
    
    public void sendIdentityUpdate() {
      String hash;
      boolean hasIdentity;
      synchronized (kemberIdentities) {
        Iterator<String> itr = kemberIdentities.iterator();
        while (itr.hasNext()) {
          hash = itr.next();
          out("sending identity '" + hash + "' to master...");
          try {
            makeRequest("identity", hash);
            out("done sending identity " + hash);
          } catch (Exception e) {
            err("error (" + e.getMessage() + ") (" + hash + ")]");
            return; // we don't want to kill anything if we didn't send it successfully
          }
        }
        hasIdentity = kemberIdentities.size() > 0;
      }
      if (hasIdentity) {
        killAllSearchers();
        cleanUpThreads();
      }
    }
    
    
    
    private void makeRequest(String name, String value) {
      try {
        URLConnection conn = new URL(
          "http://www.barneyb.com/r/hash_identity.cfm"
          + "?psk=OMITTED"
          + "&" + name + "=" + value
          + "&r=" + Math.random()
        ).openConnection();
        conn.getContent();
        if (conn instanceof HttpURLConnection) {
          ((HttpURLConnection) conn).disconnect();
        }  
      } catch (Exception e) {
        throw new RuntimeException(e);
      }
    }
    
  }
  
  
  
  public void search(int initialThreads) {
    // register the shutdown hook
    Runtime.getRuntime().addShutdownHook(new Thread() {
      @Override
      public void run() {
        // this ensures that all the data cached locally is flushed
        // to the master before the VM terminates
        masterConnection.sendIdentityUpdate();
        masterConnection.sendCheckUpdate();
      }
    });

    // start the background updaters
    timer.scheduleAtFixedRate(new TimerTask() {
      @Override
      public void run() {
            statusDisplayer.displayStatus();
      }
    }, 15 * 1000, 15 * 1000);
    timer.schedule(new TimerTask() {
      @Override
      public void run() {
        // for the scheduled updates, just log failures to STDOUT
        masterConnection.sendCheckUpdate("out");
      }
    }, 0, 5 * 60 * 1000);

    try {
      // spin up the initial threads
      for (int i = 0; i < initialThreads; i++) {
        addSearcher();
      }
      
      // start the monitor
      InputStreamReader isr = new InputStreamReader(System.in);
      while (keepSearching) {
        try {
          while (isr.ready()) {
            char c = (char) isr.read();
            if (c == 'q') {
              killAllSearchers();
            } else if (c == '+') {
              addSearcher();
            } else if (c == '-') {
              killSearcher();
            }
          }
        } catch (IOException ioe) {
          // whatever
        }
        cleanUpThreads();
        try {
          Thread.sleep(500);
        } catch (InterruptedException ie) {
          // whatever
        }
      }
    } finally {
      statusDisplayer.displayStatus();
      masterConnection.sendCheckUpdate();
    }
  }
  
  
  
  private void cleanUpThreads() {
    try {
      Thread.sleep(250); // let them have a chance to exit if needed
    } catch (InterruptedException ie) {
      // whatever
    }
    ListIterator<SearcherThread> litr = searcherThreads.listIterator();
    SearcherThread st;
    while (litr.hasNext()) {
      st = litr.next();
      if (! st.isRunning()) {
        litr.remove();
      }
    }
    if (searcherThreads.size() == 0) {
      try {
        Thread.sleep(1000);
      } catch (InterruptedException ie) {
        // whatever
      }
      keepSearching = false;
    }
  }
  
  
  
  private void addSearcher() {
    SearcherThread st;
    st = new SearcherThread(new SearchRecorder());
    Thread t = new Thread(st, st.getName());
    t.setDaemon(true);
        t.setPriority(Thread.MIN_PRIORITY);
    t.start();
    out("started " + st.getName());
    searcherThreads.add(st);
  }
  
  
  
  private void killAllSearchers() {
    killSearchers(Integer.MAX_VALUE);
  }
  
  
  
  private void killSearcher() {
    killSearchers(1);
  }
  
  
  
  private void killSearchers(int count) {
    int killCount = 0;

    SearcherThread st;
    Iterator<SearcherThread> itr = searcherThreads.iterator();
    while (itr.hasNext()) {
      st = itr.next();
      if (st.isRunning() && ! st.isStopRequested()) {
        st.requestStop();
        out("requested termination of " + st.getName());
        killCount += 1;
        if (killCount >= count) {
          break; // we've done enough
        }
      }
    }
  }

}



final class SearcherThread implements Runnable {
  static AtomicInteger instanceCount = new AtomicInteger();

  private boolean stopRequested = false;
  private boolean running = false;
  private String digits = "0123456789abcdef";
  private int len = digits.length();
  private Random random = new Random();
  private MessageDigest digest;
  private String name;
  private Search.SearchRecorder searchRecorder;
  
  public SearcherThread(Search.SearchRecorder searchRecorder) {
    this.searchRecorder = searchRecorder;
    name = "searcherThread-" + instanceCount.getAndIncrement();
    try {
      digest = MessageDigest.getInstance("MD5");
    } catch (NoSuchAlgorithmException nsae) {
      throw new RuntimeException(nsae);
    }
  }
  
  public boolean isStopRequested() {
    return stopRequested;
  }

  public String getName() {
    return name;
  }
  
  public void requestStop() {
    stopRequested = true;
  }
  
  public boolean isRunning() {
    return running;
  }
  
  public void run() {
    running = true;
    try {
      char[] chars = new char[32];
      String s, hash;
      // this should yield a cycle every couple seconds
      int i, sub, subcount = 250000 + random.nextInt(100000);

      while (true) {
        for (sub = subcount; sub != 0; sub--) {
          // build a candidate string
          for (i = 0; i < chars.length; i++) {
            chars[i] = digits.charAt(random.nextInt(len));
          }
          s = new String(chars);
          // hash it
          digest.reset();
          hash = new BigInteger(1, digest.digest(s.getBytes())).toString(16);
          while (hash.length() < 32) {
            hash = "0" + hash;
          }
          // check for equality
          if (s.equals(hash)) {
            Search.out("");
            Search.out("identity hash: " + s);
            Search.out("");
            searchRecorder.recordMatch(s);
            System.exit(0);
          }
        }
        searchRecorder.recordChecks(subcount);
        if (stopRequested) {
          break;
        }
      }
    } finally {
      Search.out(getName() + " exited");
      running = false;
    }
  }
}

My Groovy Searcher

import java.security.MessageDigest
import java.text.NumberFormat
import static Math.floor

NumberFormat nFormat = NumberFormat.integerInstance
Object semaphore = new Object()
long totalHashesChecked = 0
long totalHashesSent = 0
long start = System.currentTimeMillis()
long lastSendToMaster = start
String digits = "0123456789abcdef"
int len = digits.length()
List searcherThreads = []

def sendUpdate = { sendToMaster = true ->
  long elapsed = (System.currentTimeMillis() - start) / 1000
  String timing
  if (elapsed < 60) { // less than a minute
    timing = "$elapsed sec"
  } else if (elapsed < 60 * 60) { // less than a hour
    timing = "${floor(elapsed / 60 * 100) / 100} min"
  } else if (elapsed < 60 * 60 * 24) { // less than a day
    timing = "${floor(elapsed / 60 / 60 * 100) / 100} hrs"
  } else {
    timing = "${floor(elapsed / 60 / 60 / 24 * 100) / 100} days"
  }
  def minSinceLastSend = (System.currentTimeMillis() - lastSendToMaster) / 1000 / 60
  def rate = (totalHashesChecked - totalHashesSent) / minSinceLastSend
  println(
    "${nFormat.format(totalHashesChecked)} checks in "
    + "$timing [threads: ${searcherThreads.size()}] "
    + "[checks/min: ${nFormat.format(rate)}]"
  )
  if (sendToMaster) {
    print("[sending to master...")
    synchronized (semaphore) {
      def diff = totalHashesChecked - totalHashesSent
      if (diff != 0) {
        try {
          new URL(
            "http://www.barneyb.com/r/hash_identity.cfm"
            + "?psk=OMITTED"
            + "&count=" + diff
            + "&r=" + Math.random()
          ).getText()
          totalHashesSent += diff
          lastSendToMaster = System.currentTimeMillis()
          println("done]")
        } catch (Exception e) {
          println("error ($diff)]")
        }
      }
    }
  }
}

def foundIt = { s ->
  println("$s hashes to itself!")
  sendUpdate()
  try {
    new URL(
      "http://www.barneyb.com/r/hash_identity.cfm"
      + "?psk=OMITTED"
      + "&identity=" + s
      + "&r=" + Math.random()
    ).getText()
  } catch (Exception e) {
    println("error sending identity ($s) to master")
  }
  System.exit(0)
}

def searcher = { params ->
  Random random = new Random()
  def digest = MessageDigest.getInstance("MD5")
  char[] chars = new char[32]
  String s, hash
  int i, sub, subcount = 5000 + random.nextInt(5000)
  
  println("started searcher thread (${params.id})...")
  while (true) {
    for (sub = subcount; sub > 0; sub--) {
      // build a candidate string
      for (i = 0; i < chars.length; i++) {
        chars[i] = digits.charAt(random.nextInt(len))
      }
      s = new String(chars)
      // hash it
      digest.reset()
      digest.update(s.bytes)
      hash = new BigInteger(1, digest.digest()).toString(16)
      while (hash.length() < 32) {
        hash = "0" + hash
      }
      // check for equality
      if (s.equals(hash)) {
        println(s)
        println()
        synchronized (semaphore) {
          // this will double-count, but it's well w/in my
          // margin of error, so not gonna worry about it
          totalHashesChecked += subcount - sub + 1
        }
        foundIt(s)
        break
      }
    }
    synchronized (semaphore) {
      totalHashesChecked += subcount
    }
    if (params.stopFlag) {
      println("searcher thread (${params.id}) exiting...")
      break
    }
  }
}

def addSearcher = {
  def item = [map: [stopFlag: false, id: "search-" + searcherThreads.size()]]
  item.closure = searcher.curry(item.map)
  searcherThreads.push(item)
  Thread.startDaemon(item.closure)
}
def killSearcher = { force = false ->
  if (!force && searcherThreads.size() == 1) {
    println("cowardly refusing to kill last searcher.  Use 'q' to quit")
    return
  }
  searcherThreads.pop().map.stopFlag = true
}
def killAllSearchers = {
  while (searcherThreads.size() > 0) {
    killSearcher(true)
  }
}

volatile def updateThread
updateThread = Thread.startDaemon {
    println("started background monitor...")
  for (int i = 1; true; i++) {
        Thread.sleep(5000)
        if (updateThread == null) {
          break
        }
        sendUpdate(i % 12 == 0) // every minute
    }
}

def initialThreads = 1
try {
  if (args.length > 0) {
    initialThreads = Integer.parseInt(args[0])
  }
} catch (NumberFormatException e) {
  // oh well
}
// spin up the threads
initialThreads.times {
  addSearcher()
}

def isr = new InputStreamReader(System.in)
while (true) {
  char c = isr.read();
  if (c == 'q') {
    killAllSearchers()
  } else if (c == '+') {
    addSearcher()
  } else if (c == '-') {
    killSearcher()
  }
  if (searcherThreads.size() == 0) {
    updateThread = null
    println("exiting...")
    Thread.sleep(1000)
    sendUpdate()
    break
  }
}

The Ambiguity

In a nutshell, the ambiguity is whether this would be an identity hash (same letters, different case) for a mythical bb3() hash function that is only three characters long:

bb3('5Ae') = '5ae'

The MD5 hash algorithm (like all other message digests) operates on bytes, not character strings. However, humans can deal with strings far better than raw bytes, so MD5s are usually hex encoded for human consumption. "Hex", by the way, referse to hexadecimal notation, or base-16 numbers. Since we only have 10 numeric digits, hex uses the letters a-f (or A-F) to represent the digits from 10-15.

The problem arises because the letters representing the hex digits don't have any case semantics. For example, both "4ed9..." and "4ED9..." are valid representations for md5('def'), because in this case the letters characters aren't letters, they're digits.

Most systems seem to hex-encode with lower case letters, but not all. I've not personally run across any that use a mixture of lower and upper case letters, but it would not be incorrect to do so (just sorta weird).

The input string (the string being hashed), however, is case sensitive because it's not a list of digits, it's a collection of bytes. At the byte level, upper and lower letters are stored differently. For example:

md5('def') = '4ed9407630eb1000c0f6b63842defa7d'
md5('DEF') = '822dd494b3e14a82aa76bd455e6b6f4b'

So the question is whether or not the hex-encoded has can be recased to create a match. For example, consider this result (using a made-up hash function called bb3()):

bb3('5Ae') = '5ae' (or '5Ae' or '5AE')
bb3('5ae') = 'ae7' (or 'aE7' or 'AE7')

What's interesting about the first example is that the hash value has the same letters as the input string, just in a different case. If we change the case of the input string, we'll get a different hash (the second example). However, if we change the case of the output string we won't change the meaning of the result, and we will have lexicographically equal strings (the Kember Identity Hash).

Succinctly stated, the ambiguity is whether mixed-case hashes can be used in the identity. I've made the assumption that all hashes must be lowercase, but Elliot hasn't made a statement one way or the other.