This page describes how to get a (somewhat) static version of Wikipedia running on the Zaurus (and probably any other UNIX-based PDA/PC). The version is static in that it is not editable and the pages are pregenerated. You will nevertheless need an HTTP server and a web browser on the Zaurus, because the encyclopedia is stored highly compressed and is decompressed on the fly by CGI scripts. Compression reduces storage demands significantly, and only this makes Wikipedia usable on a PDA. For example, the German static text version of December '03 generated with Alfio Puglisi's original wiki2static.pl was ca. 210 MB, and wiki2zaurus's version reduced this to ca. 38 MB. See How does it work? for more on the inner workings.
Storage use of Wikipedia generated from April 2005 dumps (in KB; the English version is from March):
  German:   307647 de,   64520 math,   641394 media
  English:  799133 en,   94861 math,  1417522 media
So the German version (roughly 1 GB in total) will just fit on a 1 GB Microdrive. For the English version (over 2 GB) you need a 4 GB drive.
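To check whether your own generated set will fit on a card, you can add up the directory sizes the same way as in the figures above. This is a minimal sketch that assumes the language, math and media directories sit side by side in one wiki directory. The helper names `total_kb` and `fits_on` are hypothetical.

```sh
#!/bin/sh
# total_kb DIR...
# Sum the disk usage (in KB, as reported by du -k) of the given directories,
# e.g. total_kb de math media   (run inside your wiki directory)
total_kb() {
    du -sk "$@" | awk '{ sum += $1 } END { print sum + 0 }'
}

# fits_on CAPACITY_KB DIR...
# Succeed if the directories together need no more than CAPACITY_KB.
fits_on() {
    capacity_kb=$1
    shift
    [ "$(total_kb "$@")" -le "$capacity_kb" ]
}
```

For CAPACITY_KB, pass the free space that `df -k` reports for the freshly formatted card. This accounts for file system overhead better than the nominal capacity printed on the card.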
Alternatives
As far as I know, wiki2zaurus's scripts are used by two other projects to generate a static version of Wikipedia:
wikijserver
- Offers a specialized Java web server for serving the Wikipedia pages. Setup should be a bit easier. Compression uses zip instead of tar.gz, which generally means faster access to individual pages but worse compression.
wikipda.de
- Online access to a version of Wikipedia optimized for viewing on PDAs. Generated on a Windows system.
Requirements
You will need some programs on your desktop to convert a Wikipedia database dump into the compressed static pages, as well as some programs on the Zaurus to decompress them for viewing. Get most files here or individual ones below.
On the desktop:
These programs/resources are needed if you want to create the compressed Wikipedia files yourself. If you download the ready-made files below, you don't need them.
Linux/UNIX. Really! I will answer no questions on how to get wiki2staticz.pl running on Windows. If you really want to, Illya from wikipda.de (see above) has provided a short note on how he was able to run the scripts (without image generation) on Windows.
Perl (should be installed already)
texvc – to create small images that replace Wikipedia's TeX-like math notation. Get the binary from the left if you trust me (and have an i586 Linux with SuSE 9.0), or get the MediaWiki source and compile the texvc part (requires OCaml!).
ImageMagick - to resize the images. Only needed if you also want images and not a text-only version. Most probably the ImageMagick version provided by your distribution has GIF compression disabled, so you had better compile it yourself. Or bug the Wikipedians to use only PNG instead of GIF.
optipng (optional) – if you want to remove every last fat bit from your PNG images.
wiki2staticz.pl – A heavily modified variation of Alfio Puglisi's wiki2static.pl Perl script. It creates the static pages and the search index, downloads images and creates the TeX graphics. It includes helper scripts like img_compressor (scaling down images) and wiki-tarX (packing everything into tgz files at the end), as well as the language data files required by wiki2staticz.pl. (Do not use the latest language files from Wikimedia, since their format has changed.)
If wiki2staticz.pl complains about missing Perl packages (most probably LWP::UserAgent), you have to install them with su -c "perl -MCPAN -e shell". On Gentoo, installing dev-perl/libnet should help.
Lots of RAM/swap. Converting the English Wikipedia requires at least 512 MB of memory (in addition to what your OS uses), so make sure your swap is big enough.
Even more disk space. Remember that the static pages are first created uncompressed. If you cache images (recommended, since it massively speeds up regenerating a version), expect to need more than 10 GB.
A way to get all this onto the Zaurus (e.g. scp over USB, or a flash card reader).
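A long conversion can fail hours in. It can save time to check the main desktop requirements listed above in one go before starting. This is a minimal sketch that assumes a Linux desktop with /proc/meminfo and a POSIX df. The helper names are hypothetical, and the thresholds (10 GB of disk, 512 MB of memory plus swap) come from the list. The memory test is only a lower bound, since the 512 MB are needed on top of what your OS uses.

```sh
#!/bin/sh
# Succeed if Perl can load the named module (e.g. LWP::UserAgent).
have_perl_module() {
    perl -M"$1" -e 1 2>/dev/null
}

# Free space in KB on the file system holding the given directory.
free_kb() {
    df -Pk "$1" | awk 'NR == 2 { print $4 }'
}

# Physical memory plus swap in KB, read from Linux's /proc/meminfo.
mem_and_swap_kb() {
    awk '/^(MemTotal|SwapTotal):/ { sum += $2 } END { print sum + 0 }' /proc/meminfo
}

# preflight TARGET_DIR
# Print every unmet requirement to stderr; fail if there was any.
preflight() {
    target_dir=$1
    status=0
    for mod in LWP::UserAgent; do
        if ! have_perl_module "$mod"; then
            echo "missing Perl module: $mod" >&2
            status=1
        fi
    done
    # More than 10 GB if you cache images
    if [ "$(free_kb "$target_dir")" -lt $((10 * 1024 * 1024)) ]; then
        echo "less than 10 GB free in $target_dir" >&2
        status=1
    fi
    # 512 MB for the English conversion (necessary, not sufficient)
    if [ "$(mem_and_swap_kb)" -lt $((512 * 1024)) ]; then
        echo "less than 512 MB of RAM plus swap in total" >&2
        status=1
    fi
    return $status
}
```

Run it as `preflight <dir>` with the directory you intend to use as $server_dir.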
On the Zaurus:
You will always need these programs/resources on the Zaurus ("the Zaurus" of course refers to the SL5500, what else? It will most probably run on other models as well.)
Enough storage - I bought myself an
IBM/Hitachi 1GB
Microdrive.
A web server – I suggest Apache. I tried thttpd first, but it is too secure (and too unconfigurable). It does not accept symbolic links and strictly enforces execute file permissions, which is difficult if you want to keep your flash card formatted VFAT.
A browser – Opera comes with the Zaurus. Any other browser you use must understand CSS files and JavaScript.
wpg cgi files – these decompress and untar texts and search indices on the fly. Nowadays they are generated by wiki2staticz.pl, so you only have to copy them.
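The real wpg scripts are generated by wiki2staticz.pl, and their exact contents are not reproduced here. To illustrate the idea of decompressing on the fly, here is a minimal sketch of a CGI script that extracts a single page from a gzipped tar archive and streams it to the browser. The archive name pages.tgz, the use of QUERY_STRING as the page name and the helper `serve_page` are all hypothetical. The sketch also assumes a tar that supports -O (extract to standard output); GNU tar does, but not every busybox build may.

```sh
#!/bin/sh
# serve_page ARCHIVE MEMBER
# Print a CGI header, then the named member of a .tar.gz archive,
# decompressed on the fly without writing anything to the card.
serve_page() {
    archive=$1
    member=$2
    # The CGI header block must end with an empty line.
    printf 'Content-Type: text/html\r\n\r\n'
    tar -xzOf "$archive" "$member"
}

# Hypothetical CGI entry point: only runs when started by a web server.
if [ -n "${GATEWAY_INTERFACE:-}" ]; then
    # Reject empty names and anything that could escape the archive.
    case ${QUERY_STRING:-} in
        *..*|*/*|'')
            printf 'Status: 400 Bad Request\r\n\r\n'
            exit 0
            ;;
    esac
    serve_page /home/www/htdocs/wiki/en/pages.tgz "$QUERY_STRING.html"
fi
```

Reading a member from a .tar.gz means decompressing the archive up to that member. This is why many small archives give faster page access than one big one, at some cost in compression.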
What to do
Skip to step 10 if you downloaded the
ready-made data files below.
Download the database
dump you want to convert. You don't have to decompress it.
Edit the configuration of wiki2staticz.pl to fit your needs. The most important settings are:
$server_dir – where all files should end up (perhaps within your HTTP server's directory so you can test it)
$wiki_language – which language you are converting (has to match the dump you are converting!)
$other_languages – If other
languages are converted later/previously, insert them here to
generate cross-references to them.
$include_media and $include_tex –
see comments in file
$xsize, $ysize – maximum image sizes in pixels in the horizontal and vertical direction. Note: if scaling a picture down makes its file bigger (which happens surprisingly often), the version with the original dimensions is copied instead. img_compressor is meant to minimize storage requirements, not the scrolling needed to view an image on the Zaurus.
$jpgquality – higher values => better pictures & more storage. Useful values are between ca. 60 and 90.
$crushlimit – the minimum size at which PNG files are run through pngcrush. Lower values might save additional space but lengthen processing time.
$jpegbetterpng and $pngbetterjpeg – the image compressor converts an image from JPEG to PNG (or the other way around) if this saves more bytes than specified in these variables. Since JPEG is a lossy format, converting from PNG to JPEG loses quality, so it should only be done if the gain is high enough. Also note that img_compressor does not change the file name extension. A .png file converted to JPEG will still be called xxxx.png (browsers don't seem to care).
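The two size rules above are (1) keep the original if scaling down does not save bytes, and (2) switch format only if the saving exceeds a threshold. They can be sketched as follows. This is not img_compressor's actual code. The function names are hypothetical, and sizes are compared in bytes as reported by `wc -c`.

```sh
#!/bin/sh
# Size of a file in bytes (tr strips the padding some wc versions add).
size_of() {
    wc -c < "$1" | tr -d ' '
}

# keep_smaller ORIGINAL SCALED
# Replace ORIGINAL with SCALED only if SCALED is actually smaller on disk;
# otherwise keep the original and discard the scaled copy.
keep_smaller() {
    if [ "$(size_of "$2")" -lt "$(size_of "$1")" ]; then
        mv "$2" "$1"
    else
        rm -f "$2"
    fi
}

# worth_converting CURRENT CANDIDATE MIN_SAVING
# Succeed if switching to CANDIDATE (e.g. PNG -> JPEG) saves more than
# MIN_SAVING bytes, the role of $jpegbetterpng / $pngbetterjpeg.
# Use a higher MIN_SAVING for PNG -> JPEG, since JPEG is lossy.
worth_converting() {
    [ $(( $(size_of "$1") - $(size_of "$2") )) -gt "$3" ]
}
```

As in the original, a converted file keeps its old name, so only the bytes change and every link to the image stays valid.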
I disabled all parameter parsing (since Getopt::Long is not installed on SuSE by default), so command line parameters other than the database dump will not work.
Run perl wiki2staticz.pl database_dump_name.gz. Interrupting and restarting (with the same command line) is possible but not recommended.
Drink some coffee, cook a meal, raise a few children ... and make sure your swap is big enough (see man mkswap and man swapon if necessary). The most time-consuming part is downloading the images. If you generate the English Wikipedia including images for the first time, expect several days of run time (on 768 kbit/s ADSL).
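Since a first run can take days, it helps to start it detached from the terminal and keep a log you can inspect later. This is a sketch using the standard nohup utility; `run_logged` and the log file name are hypothetical. Swap setup is left to the mkswap/swapon man pages mentioned above.

```sh
#!/bin/sh
# run_logged LOGFILE COMMAND [ARGS...]
# Start COMMAND in the background, immune to hangups (so closing the
# terminal does not interrupt it), with stdout and stderr appended to
# LOGFILE. Prints the background process ID.
run_logged() {
    log=$1
    shift
    nohup "$@" >> "$log" 2>&1 &
    echo $!
}

# Example:
#   run_logged wiki2staticz.log perl wiki2staticz.pl database_dump_name.gz
#   tail -f wiki2staticz.log
```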
(If you used a previous version of wiki2zaurus, please note that ./wiki-tar is now run automatically at the end, so there is no need to start it by hand.)
If you want more languages, repeat from step 2 for each of them. Each run should create a new language directory in your wiki directory for the HTML pages and search indices, and put its images and TeX files into the shared media and math directories.
If you want to test: copy the cgi scripts into the cgi-bin directory of your local web server. You need a wpg-<languageshortcut> and a wpg-<languageshortcut>-s file for each Wikipedia language you want to access. You should find them in the htdocs/wiki/<languageshortcut> directory.
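With several languages, copying the two scripts per language by hand gets tedious. This is a sketch of a loop that copies them and sets the execute bit, which the web server needs to run them. The helper `install_cgi` and the example paths are assumptions, so adjust them to your server.

```sh
#!/bin/sh
# install_cgi WIKI_DIR CGI_DIR LANG...
# For every language given, copy wpg-<lang> and wpg-<lang>-s from
# WIKI_DIR/<lang>/ into CGI_DIR and make them executable.
install_cgi() {
    wiki_dir=$1
    cgi_dir=$2
    shift 2
    for lang in "$@"; do
        for script in "wpg-$lang" "wpg-$lang-s"; do
            cp "$wiki_dir/$lang/$script" "$cgi_dir/" || return 1
            chmod 755 "$cgi_dir/$script" || return 1
        done
    done
}

# Example (paths are assumptions):
#   install_cgi /srv/www/htdocs/wiki /srv/www/cgi-bin de en
```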
If it works, congratulate yourself: You
are now ready to move to
the Zaurus.
If you use a flash card with 512 MB or more, I recommend reformatting it with FAT32 instead of FAT16. FAT16 has to use large clusters on cards of that size, and every file occupies at least one whole cluster, so you would otherwise lose huge amounts of storage to the math graphics files, which are all around 1 KB. Be warned that a lot of cameras will not be able to read a FAT32 card (e.g. not even former high-end models like the Canon G2).
For example, run mkdosfs -s 4 -F 32 /dev/hda1 on the Zaurus to format an unmounted(!) CompactFlash card or Microdrive. If you followed my description up to here, you should be experienced enough to know that formatting erases all content of a storage medium!
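Why cluster size matters: every file occupies whole clusters, so its on-disk size is its file size rounded up to a multiple of the cluster size. This small sketch shows that arithmetic. The 16 KB FAT16 cluster is an assumed example, because the actual value depends on the card size and the formatting tool. With 512-byte sectors, -s 4 (four sectors per cluster) gives 2 KB clusters. `on_disk_bytes` is a hypothetical helper.

```sh
#!/bin/sh
# on_disk_bytes FILE_SIZE CLUSTER_SIZE
# Space a file really occupies: its size rounded up to whole clusters.
on_disk_bytes() {
    echo $(( ($1 + $2 - 1) / $2 * $2 ))
}

# Assumed example: a 1000-byte math image
#   on_disk_bytes 1000 16384   -> 16384  (assumed 16 KB FAT16 clusters)
#   on_disk_bytes 1000 2048    -> 2048   (mkdosfs -s 4: 4 x 512-byte sectors)
```

Multiplied over the tens of thousands of small math images, the difference between the two lines is what the FAT32 recommendation is about.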
I assume Apache installed itself in /home/www. You will need a few additional directories. The following commands create the setup I use:
mkdir /mnt/cf/www
mkdir /mnt/cf/www/wiki
ln -s /mnt/cf/www/wiki /home/www ##probably not necessary
ln -s /mnt/cf/www/wiki /home/www/htdocs
Copy everything to the wiki directory of the Zaurus. I really recommend a flash card reader. Otherwise, if you use the above directory setup, do something like scp -r <local wiki directory> root@192.168.129.201:/mnt/cf/www/wiki ('-r' recurses through all directories). I had problems with connection losses due to glitches on the USB line, so you might have to restart from somewhere in the middle.
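Because scp -r starts over from the beginning after a dropped connection, one workaround is to copy the top-level entries one at a time and retry each a few times. That way a failure only repeats the current entry. This is a sketch under those assumptions: `copy_each` and the default of three tries are hypothetical. The copy command comes from $COPY and defaults to scp -r.

```sh
#!/bin/sh
# copy_each SRC_DIR DEST [TRIES]
# Copy each top-level entry of SRC_DIR to DEST separately, retrying an
# entry up to TRIES times (default 3) before giving up.
# Hidden entries (names starting with '.') are not matched by the glob.
copy_each() {
    src=$1
    dest=$2
    tries=${3:-3}
    for entry in "$src"/*; do
        [ -e "$entry" ] || continue
        n=1
        # $COPY is left unquoted on purpose so "scp -r" splits into words.
        until ${COPY:-scp -r} "$entry" "$dest"; do
            if [ "$n" -ge "$tries" ]; then
                echo "giving up on $entry" >&2
                return 1
            fi
            n=$((n + 1))
            echo "retrying $entry ($n/$tries)" >&2
        done
    done
}

# Example:
#   copy_each <local wiki directory> root@192.168.129.201:/mnt/cf/www/wiki
```

A large directory such as media is still copied as one unit, so a failure there repeats the whole directory.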
Copy the cgi scripts to /home/www/cgi-bin. You will probably have to edit the paths in them now (unless you also used /home/www/htdocs/wiki as the target directory on the desktop).
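The paths can be edited with sed. This sketch writes each edited script to a temporary file and moves it back, to avoid relying on sed -i, which not every sed supports. Because that replaces the file, it restores the execute bit afterwards. `fix_paths` is a hypothetical helper, and the old path is whatever $server_dir pointed to on your desktop. It assumes neither path contains '|', '&' or a backslash.

```sh
#!/bin/sh
# fix_paths OLD_PATH NEW_PATH FILE...
# Replace every occurrence of OLD_PATH by NEW_PATH in each FILE.
fix_paths() {
    # Escape characters that are special in a sed regular expression
    # (e.g. '.'), so OLD_PATH is matched literally.
    old=$(printf '%s\n' "$1" | sed 's/[]$*.^[]/\\&/g')
    new=$2
    shift 2
    for f in "$@"; do
        sed "s|$old|$new|g" "$f" > "$f.tmp" && mv "$f.tmp" "$f" || return 1
        # The new file has default permissions; CGI scripts must be executable.
        chmod 755 "$f"
    done
}

# Example (the old path is an assumption):
#   fix_paths /srv/www/htdocs/wiki /home/www/htdocs/wiki /home/www/cgi-bin/wpg-*
```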
Start the web server and the browser on
the Zaurus.
Hope that your busybox is new enough to
correctly execute the commands in the scripts (I use OpenZaurus
3.5.2). But I didn't do anything too fancy in the scripts.
Go to e.g. http://localhost/wiki/en.
Voilà! Now you will know everything everywhere. (Note that searching the English encyclopedia takes about 10 seconds to decompress the appropriate file, so it might take some time until you know.)
'Binaries'
Generating things yourself sounds too difficult? I have put up some data files for you to download via the BitTorrent protocol. See the downloads page.
ToDos
Get categories working (currently their static versions are broken and have therefore been removed).
Find someone to update the CSS files.
Add “What links here” links to
the bottom of a page (might increase storage requirements too
much).
Collect “Other languages” cross-links and put them at the top of the page, as Wikipedia does.
Searching for words with umlauts doesn't work (the search cgi script somehow has to convert the URL-encoded %E4 notation back into the original characters).
Find a working browser for the SL5500 under OpenZaurus that does not crash (as Opera 7.x does) and displays PNG files (which Opera 6.x doesn't).
Get the Zaurus to guess what I want to
know beforehand and whisper it automatically in my ear.
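For the umlaut ToDo: %E4 is the URL percent-encoding of the byte 0xE4, which is ä in ISO-8859-1 (a UTF-8 browser would send %C3%A4 instead). Here is a minimal POSIX-shell sketch of decoding such a query string back into raw bytes. `urldecode` is a hypothetical helper, not part of the existing search script, and the decoded bytes still have to match the character encoding of the search index.

```sh
#!/bin/sh
# urldecode STRING
# Decode URL percent-encoding: "+" becomes a space and "%XX" becomes the
# byte with hexadecimal value XX. Other characters are copied unchanged.
urldecode() {
    s=$1
    while [ -n "$s" ]; do
        case $s in
            %[0-9A-Fa-f][0-9A-Fa-f]*)
                rest=${s#???}              # everything after "%XX"
                hex=${s#%}
                hex=${hex%"$rest"}         # just the two hex digits
                # Emit the byte through an octal escape in the printf format.
                printf "\\$(printf '%03o' "$((0x$hex))")"
                s=$rest
                ;;
            +*)
                printf ' '
                s=${s#+}
                ;;
            *)
                rest=${s#?}
                printf '%s' "${s%"$rest"}"
                s=$rest
                ;;
        esac
    done
}
```

For example, `urldecode 'a+b%41'` prints `a bA`, and `urldecode 'M%FCnchen'` prints München as ISO-8859-1 bytes.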