Elliot Kember started a quest to find a string that is it's own MD5 hash: The Kember Identity Hash.
There are 1632
(340,282,366,920,938,463,463,374,607,431,768,211,456
) potential results for a hex-encoded MD5 hash. For example:
md5('abc') = '900150983cd24fb0d6963f7d28e17f72' md5('def') = '4ed9407630eb1000c0f6b63842defa7d'
Searching through all the potential hashes is a huge problem because of the gamut. Of course, the size of that gamut is exactly the point of a hash: the number of collisions (two inputs with the same hash) is virtually zero.
Unfortunately, he oversimplified and left one ambiguity regarding case. However, based on his examples, I've assumed he's referring to an all-lowercase string.
So far I've checked about 28,571,075,396,407
hashes to see if they match their source. That's around 0%
coverage.
To get a handle on how tiny a percentage this is, visualize a progress bar for the search. It should be about 32,025,260
light years wide. The Milky Way, just for reference, is only around 100,000
light years wide, and our nearest neighbor galaxy (Andromeda) is a mere 2,500,000
light years away.
On this progress bar, the portion that I've completed would be one inch wide.
As you can see, the odds of success are fairly low. And according to analysis by ograll, the odds of a match existing at all is around 67%.
Part of his quest was soliciting searchers from others. I wrote one in Groovy, using the JRE-supplied MessageDigest implementation. It now runs on my office workstation when I'm not using it (running 32-bit Windows XP).
After having inexplicable performance issues with threading in my initial searcher, I decided to rewrite the searcher in Java to see if it was a Groovy-specific issue. The Java version is slightly less than four times faster at searching, but the threading issue remains. This now runs on my office workstation when I'm not using it (now running 64-bit Java on CentOS 5).
Both version use the same algorithm: make a 32-character string of randomized hex digits, take it's MD5 hash, hex-encode it, and see if it matches the original. This requires 10-15 lines of code, and the other few hundred lines in each searcher is for scheduling, monitoring, and threading.
import java.io.IOException; import java.io.InputStreamReader; import java.math.BigInteger; import java.net.HttpURLConnection; import java.net.URL; import java.net.URLConnection; import java.security.MessageDigest; import java.security.NoSuchAlgorithmException; import java.text.NumberFormat; import java.util.ArrayList; import java.util.Iterator; import java.util.List; import java.util.ListIterator; import java.util.Random; import java.util.Timer; import java.util.TimerTask; import java.util.concurrent.atomic.AtomicInteger; import java.util.concurrent.atomic.AtomicLong; /* * Don't look at this as an example of how to do things right. It's a * Groovy script I ported, twisted, and just generally mangled in terms * of structure. */ public final class Search { static final String logPrefix = "[" + (int)(Math.random() * 1000000000) + "] "; public static void main(String[] args) { int initialThreads = 1; try { if (args.length > 0) { initialThreads = Integer.parseInt(args[0]); } } catch (NumberFormatException e) { // oh well } new Search().search(initialThreads); } public static void err(String msg) { emit(logPrefix + msg, "err"); } public static void out(String msg) { emit(logPrefix + msg, "out"); } public static void emit(String msg, String stream) { if (stream == "err") { System.err.print(logPrefix); System.err.println(msg); } else { System.out.print(logPrefix); System.out.println(msg); } } long start = System.currentTimeMillis(); long lastSendToMaster = start; final AtomicLong totalHashesChecked = new AtomicLong(); final AtomicLong totalHashesSent = new AtomicLong(); final List<String> kemberIdentities = new ArrayList<String>(); final List<SearcherThread> searcherThreads = new ArrayList<SearcherThread>(); final NumberFormat nFormat = NumberFormat.getIntegerInstance(); boolean keepSearching = true; final StatusDisplayer statusDisplayer = new StatusDisplayer(); final MasterConnection masterConnection = new MasterConnection(); final Timer timer = new Timer(true); class SearchRecorder { public void recordChecks(long count) { totalHashesChecked.addAndGet(count); } public void recordMatch(String hash) { out(""); out("identity hash: " + hash); out(""); synchronized (kemberIdentities) { kemberIdentities.add(hash); } masterConnection.sendCheckUpdate(); masterConnection.sendIdentityUpdate(); } } class StatusDisplayer { public void displayStatus() { cleanUpThreads(); if (searcherThreads.size() == 0) { return; // nothing to do } long elapsed = (System.currentTimeMillis() - start) / 1000; String timing; if (elapsed < 60) { // less than a minute timing = elapsed + " sec"; } else if (elapsed < 60 * 60) { // less than a hour timing = (Math.floor(elapsed / 60.0 * 100) / 100) + " min"; } else if (elapsed < 60 * 60 * 24) { // less than a day timing = (Math.floor(elapsed / 60.0 / 60 * 100) / 100) + " hrs"; } else { timing = (Math.floor(elapsed / 60.0 / 60 / 24 * 100) / 100) + " days"; } double minutes = (System.currentTimeMillis() - lastSendToMaster) / 60000.0; double rate = (totalHashesChecked.get() - totalHashesSent.get()) / minutes; out( nFormat.format(totalHashesChecked.get()) + " checks in " + timing + " [threads: " + searcherThreads.size() + "] " + "[checks/min: " + nFormat.format(rate) + "]" ); } } class MasterConnection { public void sendCheckUpdate() { sendCheckUpdate("err"); } public void sendCheckUpdate(String errorStream) { synchronized (totalHashesSent) { long diff = totalHashesChecked.get() - totalHashesSent.get(); if (diff != 0) { out("sending checks to master"); try { makeRequest("count", "" + diff); totalHashesSent.addAndGet(diff); lastSendToMaster = System.currentTimeMillis(); out("done sending checks (" + nFormat.format(diff) + ")"); } catch (Exception e) { emit("error (" + e.getMessage() + ") (" + nFormat.format(diff) + ")]", errorStream); } } } } public void sendIdentityUpdate() { String hash; boolean hasIdentity; synchronized (kemberIdentities) { Iterator<String> itr = kemberIdentities.iterator(); while (itr.hasNext()) { hash = itr.next(); out("sending identity '" + hash + "' to master..."); try { makeRequest("identity", hash); out("done sending identity " + hash); } catch (Exception e) { err("error (" + e.getMessage() + ") (" + hash + ")]"); return; // we don't want to kill anything if we didn't send it successfully } } hasIdentity = kemberIdentities.size() > 0; } if (hasIdentity) { killAllSearchers(); cleanUpThreads(); } } private void makeRequest(String name, String value) { try { URLConnection conn = new URL( "http://www.barneyb.com/r/hash_identity.cfm" + "?psk=OMITTED" + "&" + name + "=" + value + "&r=" + Math.random() ).openConnection(); conn.getContent(); if (conn instanceof HttpURLConnection) { ((HttpURLConnection) conn).disconnect(); } } catch (Exception e) { throw new RuntimeException(e); } } } public void search(int initialThreads) { // register the shutdown hook Runtime.getRuntime().addShutdownHook(new Thread() { @Override public void run() { // this ensures that all the data cached locally is flushed // to the master before the VM terminates masterConnection.sendIdentityUpdate(); masterConnection.sendCheckUpdate(); } }); // start the background updaters timer.scheduleAtFixedRate(new TimerTask() { @Override public void run() { statusDisplayer.displayStatus(); } }, 15 * 1000, 15 * 1000); timer.schedule(new TimerTask() { @Override public void run() { // for the scheduled updates, just log failures to STDOUT masterConnection.sendCheckUpdate("out"); } }, 0, 5 * 60 * 1000); try { // spin up the initial threads for (int i = 0; i < initialThreads; i++) { addSearcher(); } // start the monitor InputStreamReader isr = new InputStreamReader(System.in); while (keepSearching) { try { while (isr.ready()) { char c = (char) isr.read(); if (c == 'q') { killAllSearchers(); } else if (c == '+') { addSearcher(); } else if (c == '-') { killSearcher(); } } } catch (IOException ioe) { // whatever } cleanUpThreads(); try { Thread.sleep(500); } catch (InterruptedException ie) { // whatever } } } finally { statusDisplayer.displayStatus(); masterConnection.sendCheckUpdate(); } } private void cleanUpThreads() { try { Thread.sleep(250); // let them have a chance to exit if needed } catch (InterruptedException ie) { // whatever } ListIterator<SearcherThread> litr = searcherThreads.listIterator(); SearcherThread st; while (litr.hasNext()) { st = litr.next(); if (! st.isRunning()) { litr.remove(); } } if (searcherThreads.size() == 0) { try { Thread.sleep(1000); } catch (InterruptedException ie) { // whatever } keepSearching = false; } } private void addSearcher() { SearcherThread st; st = new SearcherThread(new SearchRecorder()); Thread t = new Thread(st, st.getName()); t.setDaemon(true); t.setPriority(Thread.MIN_PRIORITY); t.start(); out("started " + st.getName()); searcherThreads.add(st); } private void killAllSearchers() { killSearchers(Integer.MAX_VALUE); } private void killSearcher() { killSearchers(1); } private void killSearchers(int count) { int killCount = 0; SearcherThread st; Iterator<SearcherThread> itr = searcherThreads.iterator(); while (itr.hasNext()) { st = itr.next(); if (st.isRunning() && ! st.isStopRequested()) { st.requestStop(); out("requested termination of " + st.getName()); killCount += 1; if (killCount >= count) { break; // we've done enough } } } } } final class SearcherThread implements Runnable { static AtomicInteger instanceCount = new AtomicInteger(); private boolean stopRequested = false; private boolean running = false; private String digits = "0123456789abcdef"; private int len = digits.length(); private Random random = new Random(); private MessageDigest digest; private String name; private Search.SearchRecorder searchRecorder; public SearcherThread(Search.SearchRecorder searchRecorder) { this.searchRecorder = searchRecorder; name = "searcherThread-" + instanceCount.getAndIncrement(); try { digest = MessageDigest.getInstance("MD5"); } catch (NoSuchAlgorithmException nsae) { throw new RuntimeException(nsae); } } public boolean isStopRequested() { return stopRequested; } public String getName() { return name; } public void requestStop() { stopRequested = true; } public boolean isRunning() { return running; } public void run() { running = true; try { char[] chars = new char[32]; String s, hash; // this should yield a cycle every couple seconds int i, sub, subcount = 250000 + random.nextInt(100000); while (true) { for (sub = subcount; sub != 0; sub--) { // build a candidate string for (i = 0; i < chars.length; i++) { chars[i] = digits.charAt(random.nextInt(len)); } s = new String(chars); // hash it digest.reset(); hash = new BigInteger(1, digest.digest(s.getBytes())).toString(16); while (hash.length() < 32) { hash = "0" + hash; } // check for equality if (s.equals(hash)) { Search.out(""); Search.out("identity hash: " + s); Search.out(""); searchRecorder.recordMatch(s); System.exit(0); } } searchRecorder.recordChecks(subcount); if (stopRequested) { break; } } } finally { Search.out(getName() + " exited"); running = false; } } }
import java.security.MessageDigest import java.text.NumberFormat import static Math.floor NumberFormat nFormat = NumberFormat.integerInstance Object semaphore = new Object() long totalHashesChecked = 0 long totalHashesSent = 0 long start = System.currentTimeMillis() long lastSendToMaster = start String digits = "0123456789abcdef" int len = digits.length() List searcherThreads = [] def sendUpdate = { sendToMaster = true -> long elapsed = (System.currentTimeMillis() - start) / 1000 String timing if (elapsed < 60) { // less than a minute timing = "$elapsed sec" } else if (elapsed < 60 * 60) { // less than a hour timing = "${floor(elapsed / 60 * 100) / 100} min" } else if (elapsed < 60 * 60 * 24) { // less than a day timing = "${floor(elapsed / 60 / 60 * 100) / 100} hrs" } else { timing = "${floor(elapsed / 60 / 60 / 24 * 100) / 100} days" } def minSinceLastSend = (System.currentTimeMillis() - lastSendToMaster) / 1000 / 60 def rate = (totalHashesChecked - totalHashesSent) / minSinceLastSend println( "${nFormat.format(totalHashesChecked)} checks in " + "$timing [threads: ${searcherThreads.size()}] " + "[checks/min: ${nFormat.format(rate)}]" ) if (sendToMaster) { print("[sending to master...") synchronized (semaphore) { def diff = totalHashesChecked - totalHashesSent if (diff != 0) { try { new URL( "http://www.barneyb.com/r/hash_identity.cfm" + "?psk=OMITTED" + "&count=" + diff + "&r=" + Math.random() ).getText() totalHashesSent += diff lastSendToMaster = System.currentTimeMillis() println("done]") } catch (Exception e) { println("error ($diff)]") } } } } } def foundIt = { s -> println("$s hashes to itself!") sendUpdate() try { new URL( "http://www.barneyb.com/r/hash_identity.cfm" + "?psk=OMITTED" + "&identity=" + s + "&r=" + Math.random() ).getText() } catch (Exception e) { println("error sending identity ($s) to master") } System.exit(0) } def searcher = { params -> Random random = new Random() def digest = MessageDigest.getInstance("MD5") char[] chars = new char[32] String s, hash int i, sub, subcount = 5000 + random.nextInt(5000) println("started searcher thread (${params.id})...") while (true) { for (sub = subcount; sub > 0; sub--) { // build a candidate string for (i = 0; i < chars.length; i++) { chars[i] = digits.charAt(random.nextInt(len)) } s = new String(chars) // hash it digest.reset() digest.update(s.bytes) hash = new BigInteger(1, digest.digest()).toString(16) while (hash.length() < 32) { hash = "0" + hash } // check for equality if (s.equals(hash)) { println(s) println() synchronized (semaphore) { // this will double-count, but it's well w/in my // margin of error, so not gonna worry about it totalHashesChecked += subcount - sub + 1 } foundIt(s) break } } synchronized (semaphore) { totalHashesChecked += subcount } if (params.stopFlag) { println("searcher thread (${params.id}) exiting...") break } } } def addSearcher = { def item = [map: [stopFlag: false, id: "search-" + searcherThreads.size()]] item.closure = searcher.curry(item.map) searcherThreads.push(item) Thread.startDaemon(item.closure) } def killSearcher = { force = false -> if (!force && searcherThreads.size() == 1) { println("cowardly refusing to kill last searcher. Use 'q' to quit") return } searcherThreads.pop().map.stopFlag = true } def killAllSearchers = { while (searcherThreads.size() > 0) { killSearcher(true) } } volatile def updateThread updateThread = Thread.startDaemon { println("started background monitor...") for (int i = 1; true; i++) { Thread.sleep(5000) if (updateThread == null) { break } sendUpdate(i % 12 == 0) // every minute } } def initialThreads = 1 try { if (args.length > 0) { initialThreads = Integer.parseInt(args[0]) } } catch (NumberFormatException e) { // oh well } // spin up the threads initialThreads.times { addSearcher() } def isr = new InputStreamReader(System.in) while (true) { char c = isr.read(); if (c == 'q') { killAllSearchers() } else if (c == '+') { addSearcher() } else if (c == '-') { killSearcher() } if (searcherThreads.size() == 0) { updateThread = null println("exiting...") Thread.sleep(1000) sendUpdate() break } }
In a nutshell, the ambiguity is whether this would be an identity hash (same letters, different case) for a mythical bb3()
hash function that is only three characters long:
bb3('5Ae') = '5ae'
The MD5 hash algorithm (like all other message digests) operates on bytes, not character strings. However, humans can deal with strings far better than raw bytes, so MD5s are usually hex encoded for human consumption. "Hex", by the way, referse to hexadecimal notation, or base-16
numbers. Since we only have 10
numeric digits, hex uses the letters a-f
(or A-F
) to represent the digits from 10-15
.
The problem arises because the letters representing the hex digits don't have any case semantics. For example, both "4ed9...
" and "4ED9...
" are valid representations for md5('def')
, because in this case the letters characters aren't letters, they're digits.
Most systems seem to hex-encode with lower case letters, but not all. I've not personally run across any that use a mixture of lower and upper case letters, but it would not be incorrect to do so (just sorta weird).
The input string (the string being hashed), however, is case sensitive because it's not a list of digits, it's a collection of bytes. At the byte level, upper and lower letters are stored differently. For example:
md5('def') = '4ed9407630eb1000c0f6b63842defa7d' md5('DEF') = '822dd494b3e14a82aa76bd455e6b6f4b'
So the question is whether or not the hex-encoded has can be recased to create a match. For example, consider this result (using a made-up hash function called bb3()
):
bb3('5Ae') = '5ae' (or '5Ae' or '5AE') bb3('5ae') = 'ae7' (or 'aE7' or 'AE7')
What's interesting about the first example is that the hash value has the same letters as the input string, just in a different case. If we change the case of the input string, we'll get a different hash (the second example). However, if we change the case of the output string we won't change the meaning of the result, and we will have lexicographically equal strings (the Kember Identity Hash).
Succinctly stated, the ambiguity is whether mixed-case hashes can be used in the identity. I've made the assumption that all hashes must be lowercase, but Elliot hasn't made a statement one way or the other.