Conway's Game of Life
I've designed quite a fast implementation of the core algorithm for Conway's
Game of Life.
Features:
- no LUT
- very compact implementation
- operates directly on bit-mapped cells (no transformation)
- uses only one conditional
- easily adaptable to use SIMD instruction sets such as from MMX and SSE
Performance:
| Language |
Environment |
Cell Generations per Second |
| Java |
Bea JRockit 7.0 (2) |
790123000 |
| Java |
Sun JRE 1.4.1 |
434082000 |
| Java |
Sun JRE 1.4.1 -server |
541798000 |
| C |
MS VC++ 6.0 full opt. (1) |
620000000 |
| C |
Intel C++ full opt. (PGO) (1) |
1800000000 |
Performance data measured on a P4, 2000 Mhz.
(1) Recalled from memory
(2) Interestingly, this shows that JRockit 7.0 is faster than MS VC++ 6.0
How it works:
The algorithm counts the number of living cells in a 3x3 field to determine
if the center cell should live or be dead. There are always 32 counters (sizeof
int) working in parallel. Each counter is represented by one bit it the int
machine word. Three machine words are used to represent the state of 32 counters.
The algorithm also exploits the fact, that overlapping 3x3 fields share a number
of cells to be counted.
The Code:
This is an implementation in Java. The interesting class is
rwi.ca23_3.CA23_3. Its main method executes a benchmark. The method
computeNextGeneration and the inner class CellCounter are key.
Download the code
Revision History:
- updated gameoflife.zip 2003-03-25
- Changed signature of CA23_3.computeNextGeneration
- Fixed serious bug in CA23_3 (before it did not compute the next generation
according to Conway's rules)
- Added Universe and Tile classes performing GOL on a relatively sparse
universe
- Added Excel charts comparing my implementation with Life32 for various
patterns
- Added some patterns stolen from Life32
How it compares to Life32:
I've tried to make a comparison to Life32. However, such a comparison is
somewhat problemeatic. This is why:
- Life32 displays cells, my implementation does not. In order to minimize
the effect, I've set framedropping to the max in Life32.
-
Life32 has optimizations for stable and empty regions, my implementation
has not (yet). Depending on the choice of life pattern, this may speed up
Life32 (when optimizations are possible) or slow it down (overhead when no
optimizations possible). My implementation only depends on the size of the
universe. Since the optimizations would make for an arbitrary difference, I
chose a pattern packed full with period-4 oscillators. Life32 will not
benefit from its optimizations this way, so the speed of the core algorithm
will be put under test.
-
Life32 is implemented in Delphi, my implementation is Java. One could
theoretically compare my implementation to Alan Hensel's Life Applet, which
is in Java too and AFAIK Life32 uses a 1:1 Delphi port of Alan's algorithm.
Ok, now for the results:
Life32 version: 2.15
Universe size: 268x212
Number of living cells: 13824
Cell generations per second: 331,200,279
My implementation
Universe size: 256x256
Number of living cells: doesn't matter
Cell generations per second: 759,397,000
Thus my implementation is about 2.3 times faster than Life32. I'd
extrapolate that using the Intel C++ compiler for my implementation a factor
of 5 could be reached and using SSE2 maybe a factor of 7.
Home