Wiki2Zaurus - Wikipedia for the Zaurus

The most portable encyclopedia on this planet (except for a few Windows competitors ;-)

[Screenshot: the Wikipedia entry on the Sharp Zaurus, displayed on a Zaurus]

This page describes how to get a (somewhat) static version of Wikipedia running on the Zaurus (and probably any other UNIX-based PDA/PC). The version is static in that it is not editable and the pages are pregenerated. You will nevertheless need an HTTP server and a web browser on the Zaurus, since the encyclopedia is stored highly compressed and is decompressed on demand by CGI scripts. Compression reduces the storage demands significantly, and only this makes Wikipedia usable on a PDA: the German static text version of December '03, generated with Alfio Puglisi's original wiki2static.pl, was ca. 210 MB, and wiki2zaurus's version reduced this to ca. 38 MB. See How does it work? for more on the inner workings.

Storage use of wikipedia generated from April 2005 dumps (in KB):

307647  de
64520   math
641394  media

799133  en        (March version)
94861   math
1417522 media

So the German version will just fit on a 1GB microdrive. For the English version you need a 4GB drive.
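
As a quick check of the figures above, the following sketch adds up the three entries per language. It assumes that the first line of each group is the language directory (texts and search indices) and that all numbers are KB of 1024 bytes; the function name total_kb is made up for this example.

```shell
#!/bin/sh
# Sum a list of sizes given in KB and print the total.
total_kb() {
    sum=0
    for kb in "$@"; do
        sum=$((sum + kb))
    done
    echo "$sum"
}

de_total=$(total_kb 307647 64520 641394)    # de + math + media
en_total=$(total_kb 799133 94861 1417522)   # en + math + media

# Integer division, so the MB figures are rounded down.
echo "de: ${de_total} KB (about $((de_total / 1024)) MB)"
echo "en: ${en_total} KB (about $((en_total / 1024)) MB)"
```

This gives 1013561 KB for the German set and 2311516 KB for the English one. How much of that really fits on a given card also depends on the file system's cluster size and on how the manufacturer counts a gigabyte.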

Alternatives

As far as I know, wiki2zaurus's scripts are used by two other projects to generate a static version of the wikipedia:

wikijserver - Offers a specialized Java web server for serving the Wikipedia pages. Setup should be a bit easier. Compression uses zip instead of tar.gz, which generally means faster access to individual pages but worse compression.

wikipda.de - Online access to a version of the wikipedia optimized for viewing on PDAs. Generated on a Windows system.

Requirements

You will need some programs on your desktop to convert a wikipedia database dump into the compressed static pages as well as some programs on the Zaurus to decompress them for viewing. Get most files here or individual ones below.

On the desktop:

These programs/resources are needed if you want to create the compressed Wikipedia files yourself. If you download the ready-made files below, you don't need them.

  • Linux/UNIX. Really! I will answer no questions on how to get wiki2staticz.pl running on Windows. If you really want to, Illya from wikipda.de (see above) has provided a short note on how he was able to run the scripts (without image generation) on Windows.

  • Perl (should be installed already)

  • texvc – to create small images as replacements for the TeX-like math notation of Wikipedia. Get the binary from the left if you trust me (and have an i586 Linux with Suse 9.0) or get the MediaWiki source and compile the texvc part (requires OCaml!).

  • ImageMagick - to resize the images. Only needed if you also want images and not a text-only version. Most probably the ImageMagick version provided by your distribution has GIF compression disabled, so it is better to compile it yourself. Or bug the Wikipedians to only use PNG instead of GIF.

  • optipng (optional) – If you want to remove every last fat bit from your PNG images.

  • wiki2staticz.pl – A variation of the wiki2static.pl perl script of Alfio Puglisi, which I modified heavily. It will create the static pages, the search index, download images and create TeX-graphics. Included are helper scripts like img_compressor (scaling down images) and wiki-tarX (putting everything into tgz files in the end), as well as the language data files required by wiki2staticz.pl. (Do not use the latest language files from wikimedia since their format has changed.)

  • If wiki2staticz complains about missing Perl packages (most probably LWP/UserAgent), you have to install them with su -c "perl -MCPAN -e shell". On Gentoo installing dev-perl/libnet should help.

  • Lots of RAM/swap. Converting the English wikipedia requires at least 512MB memory (in addition to what your OS uses). So make sure your swap is big enough.

  • Even more HD space. Remember that the static pages are created uncompressed at first. If you cache images (recommended to massively speed up regenerating a version), expect to need >10GB.

  • A way to get this all to the Zaurus (e.g. scp over USB. Or a flash card reader).
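
Before starting a long run it can help to check which of the tools above are actually installed. The following is only a sketch: it assumes that ImageMagick provides a command called convert and that the other tools are on the PATH under the names used above. The helper names have_cmd and have_perl_module are made up for this example.

```shell
#!/bin/sh
# Preflight sketch: report which of the desktop tools named above are present.

# Succeeds if the given command is found on the PATH.
have_cmd() {
    command -v "$1" >/dev/null 2>&1
}

# Succeeds if perl can load the given module.
have_perl_module() {
    perl -M"$1" -e 1 >/dev/null 2>&1
}

for tool in perl texvc convert optipng; do
    if have_cmd "$tool"; then
        echo "found:   $tool"
    else
        echo "missing: $tool"
    fi
done

# Only ask perl about the module if perl itself exists.
if have_cmd perl; then
    if have_perl_module LWP::UserAgent; then
        echo "found:   LWP::UserAgent"
    else
        echo "missing: LWP::UserAgent"
    fi
fi
```

A "missing" line for texvc or optipng is harmless if you generate a version without TeX images or skip the optional PNG optimization.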

On the Zaurus:

You will always need these programs/resources on the Zaurus. ("The Zaurus" of course refers to the SL5500, what else? It will most probably run on others as well, though.)

  • Enough storage - I bought myself an IBM/Hitachi 1GB Microdrive.

  • A web server – I suggest Apache. I tried thttpd first, but it is too secure (and too unconfigurable): it does not accept symbolic links and strictly adheres to execute file permissions (which is difficult if you want to keep your flash card formatted as VFAT).

  • A browser – Opera comes with the Zaurus. If you use any other browser, it should be able to understand CSS files and JavaScript.

  • wpg cgi files – which will decompress and untar texts and search indices on the fly. They are generated by wiki2staticz.pl nowadays, you only have to copy them.
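
To give an idea of what "decompress and untar on the fly" means, here is a minimal sketch of a CGI-style function that writes a single page from a tgz archive to standard output. It is not the wpg script generated by wiki2staticz.pl; the function name serve_page and the archive layout are assumptions made for this example, and it relies on a tar that supports the -O (extract to stdout) option.

```shell
#!/bin/sh
# Hypothetical sketch, NOT the generated wpg script.
# Usage: serve_page <archive.tgz> <path of the page inside the archive>
serve_page() {
    archive=$1
    member=$2
    # A CGI response starts with a header block followed by an empty line.
    printf 'Content-Type: text/html\n\n'
    # -x extract, -z gunzip, -O write the member to stdout, -f archive file.
    tar -xzOf "$archive" "$member"
}
```

A call such as serve_page pages.tgz pages/Zaurus.html (both names invented) would print the header followed by the page, without ever writing the uncompressed file to the card.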



What to do

Skip to step 10 if you downloaded the ready-made data files below.

  1. Download the database dump you want to convert. You don't have to decompress it.

  2. Edit the configuration of wiki2staticz.pl to fit your needs. Most important are:

    • $server_dir – Where you want all files to end up (maybe within your HTTP server's directory so you can test it)

    • $wiki_language – What language do you convert (Has to match the dump you are converting!)

    • $other_languages – If other languages are converted later/previously, insert them here to generate cross-references to them.

    • $include_media and $include_tex – see comments in file

    • $xsize, $ysize – Maximum image sizes in pixels, horizontally and vertically. Note: If a picture increases in disk size when decreasing in screen size (happens surprisingly often), the version with the old screen size will be copied. img_compressor is a script to minimize storage requirements, not the amount of scrolling needed to view an image on the Zaurus.

    • $jpgquality – Higher values => better pictures & more storage. Useful values between ca. 60 and 90.

    • $crushlimit – Minimum file size from which PNG files are run through pngcrush. Lower values might save additional space but lengthen processing time.

    • $jpegbetterpng and $pngbetterjpeg – The image compressor will convert an image from JPEG to PNG (or the other way around) if this saves more bytes than specified in these variables. Since JPEG is a lossy format, converting from PNG to JPEG will lose quality. It should therefore only be done if the gain is high enough. Also note that img_compressor will not change the file name extension (so a .png file converted to JPEG will still be called xxxx.png; browsers don't seem to care).

  3. I disabled all parameter parsing (since Getopt::Long is not installed on Suse by default) so giving command line parameters (except the database) will not work.

  4. Run perl wiki2staticz.pl database_dump_name.gz. Interrupting and restarting (with the same command line) is possible but not recommended.

  5. Drink some coffee, cook a meal, raise a few children ... and make sure your swap is big enough (see man mkswap and man swapon if necessary). The image download is the most time-consuming part. If you generate the English Wikipedia including images for the first time, expect several days of run-time (on 768kBit/s ADSL).

  6. (If you used a previous version of wiki2zaurus please note that ./wiki-tar is now automatically run at the end, so no need to start it by hand.)

  7. If you want more languages, repeat from step 2 for them. Each run should create a new language directory in your wiki directory for HTML and search indices, and put its images and TeX files in the general media and math directories.

  8. If you want to test: Copy the cgi scripts into the cgi-bin directory of your local web server. You need a wpg-<languageshortcut> and a wpg-<languageshortcut>-s file for each wikipedia language you want to access. You should find those in the htdocs/wiki/<languageshortcut> directory.

  9. If it works, congratulate yourself: You are now ready to move to the Zaurus.
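
The threshold rule behind $jpegbetterpng and $pngbetterjpeg from step 2 can be pictured with a small sketch. This is not the actual img_compressor code; the function name worth_converting and the byte counts are invented for illustration.

```shell
#!/bin/sh
# Sketch of the threshold rule; all sizes are in bytes.
# Succeeds (exit status 0) if converting saves more than min_saving bytes.
worth_converting() {
    current_size=$1
    converted_size=$2
    min_saving=$3
    [ $((current_size - converted_size)) -gt "$min_saving" ]
}

# Made-up numbers: a 40000-byte PNG that would shrink to 25000 bytes as JPEG,
# with a required saving of more than 10000 bytes.
if worth_converting 40000 25000 10000; then
    echo "convert"
else
    echo "keep original format"
fi
```

With these numbers the saving is 15000 bytes, so the sketch prints "convert". A higher threshold for the PNG to JPEG direction reflects the quality loss mentioned in step 2.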



  10. If you use a flash card with 512 MB or more, I recommend reformatting it with FAT32 instead of FAT16, since you will otherwise lose huge amounts of storage to the math graphic files, which are all around 1 KB. Be warned that a lot of cameras will not be able to read a FAT32 card (e.g. not even former high-end models like the Canon G2).

    For example, run mkdosfs -s 4 -F 32 /dev/hda1 on the Zaurus to format an unmounted(!) compact flash or microdrive. If you followed my description up to here, you should be experienced enough to know that formatting will erase all content of a storage medium!

  11. I assume apache installed itself in /home/www. You will need a few additional directories. The following commands create the setup I use:

    mkdir /mnt/cf/www
    mkdir /mnt/cf/www/wiki
    ln -s /mnt/cf/www/wiki /home/www  ##probably not necessary
    ln -s /mnt/cf/www/wiki /home/www/htdocs
            
  12. Copy everything to the wiki directory of the Zaurus. I really recommend a flash card reader. Otherwise do something like scp -r <local wiki directory> root@192.168.129.201:/mnt/cf/www/wiki if you use the above directory setup. '-r' will recurse through all directories. I had problems with connection losses due to burps on the USB line. You might have to restart from somewhere in the middle.

      [Screenshot: main page of the English Wikipedia on the Zaurus]

  13. Copy the cgi-scripts to /home/www/cgi-bin. You will probably have to edit their paths now (unless you used /home/www/htdocs/wiki as target directory also on the desktop).

  14. Start the web server and the browser on the Zaurus.

  15. Hope that your busybox is new enough to correctly execute the commands in the scripts (I use OpenZaurus 3.5.2). But I didn't do anything too fancy in the scripts.

  16. Go to e.g. http://localhost/wiki/en.

  17. Voilà! Now you will know everything everywhere. (Note that searching in the English encyclopedia takes about 10 seconds to decompress the appropriate file, so it might take some time till you know.)
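
The FAT32 recommendation in step 10 comes down to cluster arithmetic: a file always occupies whole clusters, so a 1 KB math image wastes most of a large cluster. The sketch below illustrates this. The 16 KB FAT16 cluster size is an assumption for comparison (the real value depends on the partition size), and 512-byte sectors are assumed for mkdosfs -s 4.

```shell
#!/bin/sh
# Space a file occupies when the file system allocates whole clusters.
# Arguments: file size in bytes, cluster size in bytes. Prints bytes on disk.
on_disk_bytes() {
    size=$1
    cluster=$2
    # Round up to the next multiple of the cluster size.
    echo $(( (size + cluster - 1) / cluster * cluster ))
}

# mkdosfs -s 4 means 4 sectors per cluster; with 512-byte sectors: 2048 bytes.
fat32_cluster=$((4 * 512))
# Assumed FAT16 cluster size, for comparison only.
fat16_cluster=16384

echo "1 KB file on FAT32 (-s 4): $(on_disk_bytes 1024 "$fat32_cluster") bytes"
echo "1 KB file on FAT16 (16 KB clusters): $(on_disk_bytes 1024 "$fat16_cluster") bytes"
```

Under these assumptions each small math image takes 2048 bytes instead of 16384 bytes, which adds up quickly over many thousands of files.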

'Binaries'

Generating things yourself sounds too difficult? I have put up some data files for you to download via the BitTorrent protocol. See the downloads page.

ToDos

  • Get Categories working (currently the static versions of them are broken and therefore are removed).

  • Find someone to update the CSS files.

  • Add “What links here” links to the bottom of a page (might increase storage requirements too much).

  • Collect “Other languages” cross links and put them at the top of the page, as Wikipedia does.

  • Searching for words with umlauts doesn't work (the search CGI script somehow has to convert back the %E4 notation).

  • Find a working browser for the SL5500 under OpenZaurus which does not crash (as Opera 7.x does) and displays png files (which Opera 6.x doesn't).

  • Get the Zaurus to guess what I want to know beforehand and whisper it automatically in my ear.



last edit: 18.04.2005 by M. Baumeister