Part 4. Having a Mint


Following on from part 3, you should now have your Rails application nicely hosted and deployed using Capistrano from your SVN repository. All good, but what happens when you want to serve another public web folder alongside it, e.g. public/myfolder, but you DON’T want it under version control? Why, you ask?

Well, let’s take this example further: what if this folder contained a copy of the Mint stats package in public/mint, i.e. PHP code that you wanted to execute and run? Since new versions of Mint are released frequently, and I’m often adding and removing ‘Peppers’ (read: Mint plugins), I see no need to put Mint under version control. Purists might argue differently, but I don’t collect stats on my local dev box either, so it makes no sense to have it there after a checkout.

Assuming you’re convinced, I’ll start off by getting Mint up and running on the server, with PHP running through Lighttpd and stats being recorded. Then I’ll add some extra tasks to the Capistrano deployment, allowing us to deploy ‘around’ this folder on the public server.

Configuring Lighttpd with PHP and FastCGI

First ensure your server already has PHP installed and is configured with the FastCGI module;

$> php -v
PHP 5.1.4 (cgi-fcgi) (built: Aug  2 2006 23:53:20)
Copyright (c) 1997-2006 The PHP Group
Zend Engine v2.1.0, Copyright (c) 1998-2006 Zend Technologies

If you don’t see the (cgi-fcgi) part, then you’ll need to rebuild PHP with FastCGI support enabled. Instructions for installing PHP and configuring modules are available on the web; Google is your friend.

Now edit your config/lighttpd.conf file. Either do this locally on your dev machine, check in to SVN and redeploy, or edit it on the server to get things working and then apply the same changes locally, check in and deploy:

server.modules = ( "mod_rewrite", "mod_accesslog", "mod_fastcgi", "mod_compress", "mod_expire", "mod_proxy" )

fastcgi.server = ( ".php" => ( "" =>
  ( "socket" => CWD + "/tmp/sockets/php.socket",
    "bin-path" => "/usr/local/bin/php",
    "bin-environment" => ( "PHP_FCGI_CHILDREN" => "1", "PHP_FCGI_MAX_REQUESTS" => "5000" )
  )
) )

# lighttpd proxy server (pound) - no proxy for mint please
$HTTP["url"] !~ "^/mint/" {
  proxy.balance = "fair"
  proxy.server  = ( "/" => ( ( "host" => "", "port" => 6000 ) ) )
}

Things to note here are;

  • Add mod_fastcgi to your server.modules if it’s not there already.
  • fastcgi.server is configured with your server’s PHP bin path and a tmp socket (which can be anywhere).
  • We DON’T want to proxy requests for Mint through Pound to Mongrel. Since this is a PHP app, we just want Lighttpd to deal with it using FastCGI, hence the conditional around the Pound proxy configuration.
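To make that last point concrete, here is a rough Ruby sketch of the decision the conditional makes. It is only an illustration of the pattern match, not anything Lighttpd runs, and the handler_for helper and its return symbols are made up for the example.

```ruby
# Illustrative only: mirrors the lighttpd condition $HTTP["url"] !~ "^/mint/".
MINT_PREFIX = %r{^/mint/}

# Returns :proxy when the request would fall inside the proxy block,
# :lighttpd when Lighttpd keeps the request for itself (FastCGI or static file).
def handler_for(path)
  path !~ MINT_PREFIX ? :proxy : :lighttpd
end

%w[/ /articles/some-post /mint/ /mint/index.php /mint].each do |path|
  puts "#{path.ljust(20)} -> #{handler_for(path)}"
end
```

Note that a request for /mint without the trailing slash does not match the pattern, so it would still be proxied.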

Setting up Mint (e.g. in Typo)

First grab your licensed copy of Mint for your domain, and drop the mint/ folder into current/public/mint under the main Rails directory on your server. Next add the following url.rewrite to your lighttpd configuration:

  # configure for mint access
  url.rewrite = ("^/mint/$" => "/mint/index.php", "^/mint/\?(.*)" => "/mint/index.php?$1")

This url.rewrite is necessary to ensure that Lighttpd doesn’t treat files or folders under /mint/ as Rails-specific. Again, this should also be added to your local copy and checked in to SVN. DON’T deploy with Capistrano just yet, because doing so will check out a new release and archive the existing current/ folder (and hence remove our current/public/mint). You’ll also want to add the following JavaScript to the header of any page where stats should be reported:

<script src="/mint/?js" type="text/javascript"></script>
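That script tag requests /mint/?js, which is exactly the kind of URL the second rewrite rule is there for. As a rough sketch of what the two patterns do, here is a Ruby stand-in (the rewrite helper is hypothetical, and it assumes Lighttpd stops at the first matching rule and matches against the path plus query string):

```ruby
# Hypothetical stand-in for the two url.rewrite rules above.
# lighttpd writes back-references as $1; Ruby's String#sub uses \1.
REWRITE_RULES = [
  [%r{^/mint/$},      "/mint/index.php"],
  [%r{^/mint/\?(.*)}, '/mint/index.php?\1']
].freeze

def rewrite(url)
  REWRITE_RULES.each do |pattern, target|
    return url.sub(pattern, target) if url =~ pattern
  end
  url # no rule matched: leave the URL untouched
end

puts rewrite("/mint/")               # /mint/index.php
puts rewrite("/mint/?js")            # /mint/index.php?js
puts rewrite("/mint/styles/app.css") # /mint/styles/app.css
```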

Setting up Capistrano

In order to keep using Capistrano to deploy to your server with Mint in the current/public/mint folder, we need to deploy around it. There are probably better ways to do this, but in the Capistrano deployment recipe file (config/deploy.rb) I added two tasks: one runs before deployment starts, the other after. The tasks basically move the mint/ folder out of the way (to the top-level shared folder) while Capistrano does its stuff. So in config/deploy.rb:

  # executed before deployment
  task :before_deploy, :roles => [:web, :app] do
    # move the mint/ and files/ folders to a holding area in shared/
    # (shared_dir is relative, so it is prefixed with deploy_to like current_dir)
    puts "before deploy ---> move mint and files to shared from current"
    run "sudo mv #{deploy_to}/#{current_dir}/public/mint #{deploy_to}/#{shared_dir}/mint"
    run "sudo mv #{deploy_to}/#{current_dir}/public/files #{deploy_to}/#{shared_dir}/files"
  end

  # executed after deployment
  task :after_deploy, :roles => [:web, :app] do
    # move the mint/ and files/ folders back from the holding area in shared/
    puts "after deploy ---> move mint and files from shared to current"
    run "sudo mv #{deploy_to}/#{shared_dir}/mint #{deploy_to}/#{current_dir}/public/mint"
    run "sudo mv #{deploy_to}/#{shared_dir}/files #{deploy_to}/#{current_dir}/public/files"
  end

You’ll also see I’m doing the same thing for a current/public/files folder. This folder is used by Typo for files uploaded to blog entries. Without these tasks in place, each Capistrano deploy would clear out the files/ folder on your server.
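If you want to convince yourself of the idea before pointing it at a real server, the move-out, deploy, move-back sequence can be simulated locally. This is only a sketch in plain Ruby using a temporary directory; the move_folders and deploy_around helpers are made up, and deleting and recreating current/ is a crude stand-in for whatever Capistrano actually does when it switches releases.

```ruby
require "fileutils"
require "tmpdir"

# Folders that live in current/public but are not under version control.
PRESERVED = %w[mint files].freeze

# Move each preserved folder from one parent directory to another, skipping
# any folder that does not exist (e.g. files/ on a server with no uploads yet).
def move_folders(from, to)
  PRESERVED.each do |name|
    src = File.join(from, name)
    FileUtils.mv(src, File.join(to, name)) if File.directory?(src)
  end
end

# Simulates one deploy: stash, replace current/ with a fresh tree, restore.
def deploy_around(root)
  public_dir = File.join(root, "current", "public")
  shared_dir = File.join(root, "shared")

  move_folders(public_dir, shared_dir)        # what before_deploy does
  FileUtils.rm_rf(File.join(root, "current")) # stand-in for the new release
  FileUtils.mkdir_p(public_dir)
  move_folders(shared_dir, public_dir)        # what after_deploy does
end

Dir.mktmpdir do |root|
  mint_dir = File.join(root, "current", "public", "mint")
  FileUtils.mkdir_p(mint_dir)
  FileUtils.mkdir_p(File.join(root, "shared"))
  File.write(File.join(mint_dir, "index.php"), "<?php // stats ?>")

  deploy_around(root)
  puts File.exist?(File.join(mint_dir, "index.php")) # true
end
```

The existence check in move_folders is an addition of mine: the tasks above will fail on the mv if either folder is missing.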

Trying it out

Check the changes in, make sure the mint folder is on your server (and correctly configured) and run a new Capistrano deploy. During this you should see the before and after tasks running (you may be asked for a password for sudo). You should then see your copy of Mint up and running, like so; /mint/

Thus ends the mini-guide. Any suggestions, comments or questions are appreciated. Normal useless entries will resume here as of today.


Minty Fresh

I’ve recently added Shaun Inman’s excellent (PHP/MySQL based) stats analyser Mint, providing me with a very fresh look at my site. By adding the GeoMint Pepper (plugin) I can now see where all you visitors are actually coming from (roughly).

It was all incredibly easy to install: within 25 minutes of actually buying the thing I had it up and running with a bunch of Peppers.

If you’re interested in hearing how the development of Mint encourages good API development, listen to Shaun’s presentation at the Carson Workshops Conference earlier this year.

April 15, 2006 06:21 by

PHP pushups

I recently had the rather unpleasant task of writing some PHP to compare a CSV file (with some 22,000+ entries) with a MySQL database. With the CSV file holding the master copy of the data, the script would update, insert and delete rows in MySQL. It needed to run as a daily cron job on my (shared) Dreamhost box.

This would normally be simple enough, using a status field in the CSV file to indicate rows that had been updated. Unfortunately there was no status field, and none could be added. In fact the CSV file could not be modified at all. The only way to check if a row had been modified was to do a field-by-field comparison on every row.

I started off with a single script that imported the CSV into an array, and also extracted all rows from the db table. With some looping to search through all rows and all fields in each row, I got the script to work. Great! (I thought.)

But with 22,000 rows each looping over ~22,000 other rows (22,000 × 22,000 = 484 million iterations), the script took minutes to execute, and if left long enough it ate up 100% CPU (through PHP). Even using an exponential back-off search took too long.

On Dreamhost, if any script you run nears 100% CPU usage, it’s killed automatically. A major rethink was required, so I decided to split the script in two.

  • script 1 – would create a temporary table in the database and simply import the CSV file into it, row by row.
  • script 2 – running a few minutes later, would then compare the two tables using MySQL queries (rather than a PHP loop search). After performing all updates, inserts and deletes, the temporary table would be destroyed.

The comparison script (2) works by looking for id matches between the two tables and marking any rows found. For each match, both rows are fetched and a field-by-field comparison is made to check whether an UPDATE statement is needed.

Finally, any rows from the master CSV file not marked as found were inserted into the DB, and any rows in the DB not marked as found were deleted.

Using two tables for the comparison, rather than looping and searching in PHP, meant that the strain was now on MySQL (rather than PHP). Dreamhost seems to tolerate this, and the PHP script execution time was reduced from minutes to seconds.
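For the record, the comparison logic boils down to something like the sketch below. It is written in Ruby with in-memory hashes keyed by id standing in for the temporary table and the live table, so it is not the PHP/MySQL I actually ran, and the field names and the diff_rows helper are made up for illustration.

```ruby
# Illustrative only: hashes keyed by id stand in for the temporary (CSV) table
# and the live table.
def diff_rows(csv_rows, db_rows)
  csv_by_id = csv_rows.to_h { |row| [row[:id], row] }
  db_by_id  = db_rows.to_h  { |row| [row[:id], row] }

  {
    # id found in both tables, but at least one field differs
    update: csv_by_id.select { |id, row| db_by_id.key?(id) && db_by_id[id] != row }.values,
    # id only in the CSV: a new row to insert
    insert: csv_by_id.reject { |id, _| db_by_id.key?(id) }.values,
    # id only in the database: the row has gone from the CSV
    delete: db_by_id.keys - csv_by_id.keys
  }
end

csv = [
  { id: 1, name: "Ann", city: "Leeds" },
  { id: 2, name: "Bob", city: "York" }, # city changed
  { id: 4, name: "Dee", city: "Bath" }  # new
]
db = [
  { id: 1, name: "Ann", city: "Leeds" },
  { id: 2, name: "Bob", city: "Hull" },
  { id: 3, name: "Cal", city: "Ely" }   # gone from the CSV
]

changes = diff_rows(csv, db)
puts changes[:update].map { |r| r[:id] }.inspect # [2]
puts changes[:insert].map { |r| r[:id] }.inspect # [4]
puts changes[:delete].inspect                    # [3]
```

Each id costs one lookup rather than a scan of the other table, so the work grows with the number of rows in the two inputs rather than with their product.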

And why am I explaining all this, you ask?

1. So I can remember what on earth I did; and
2. I’m curious to know if anyone can think of a better way to do this, bearing in mind the limiting factors: the CSV file CANNOT be altered in any way, the script has to execute in seconds, and the CPU usage cannot approach 100%.

December 08, 2005 03:18 by
