Prime Factorization

In my ongoing rampage through the Project Euler problems I've needed a prime factorization routine several times, and figured I'd share it as a complement to the sieve I shared last week.  Implemented in Groovy, of course.  Typing is a little strange on first glimpse because Groovy uses Integer by default which only supports values up to 2.1 billion.  So there are a number of explicit BigInteger types/casts.

def factor(BigInteger num) {
  def factors = []
  while (num % 2 == 0) {
    factors.push(2)
    num = (BigInteger)(num / 2)
  }
  BigInteger end = floor(sqrt(num))
  for (def currFactor = 3; num > 1 && currFactor <= end; currFactor += 2) {
    if (num % currFactor == 0) {
      while (num % currFactor == 0) {
        factors.push(currFactor)
        num = (BigInteger)(num /currFactor)
      }
      end = (BigInteger) floor(sqrt(num))
    }
  }
  if (num != 1 || factors.size() == 0) {
    factors.push(num)
  }
  factors
}

The algorithm is pretty simple: loop through all odd numbers and see if they're a factor of the target number.  There's a special case at the top for two (the only even prime), which then lets the loop skip all the even numbers (effectively cutting it's work in half).  There's also a special case at the bottom for ones.  If you attempt to factor one or a prime number, you'll get that number back as the only item in the list, otherwise you'll get only prime factors (and no, one is not prime).  Note that the list can contain duplicates; the method returns all prime factors, not just distinct ones.

Why does this work?  Because by the time the loop gets to each number all of it's factors have been processed, so it can't be both a factor of the target number and composite (non-prime).  So if it's a factor, it has to be prime as well.

For those of you who follow me on Twitter (@barneyb), this is not the crappy algorithm I first implemented.  This is the good one.  My first attempt was an "inversion" of the sieve algorithm which resulted in huge amounts of unneeded work and memory consumption because of the data structures involved.  The second algorithm is virtually identical, just the data structures are different (enormously simpler).  I wasn't really that far off with the first attempt, I just didn't do a good job of distilling the problem down to it's essence, so I was doing a lot of extra work (three-orders-of-magnitude extra).

Comments are closed.