
Joe Dog Software

Proudly serving the Internets since 1999

Linux Marketshare

I’m responsible for around 150 GNU/Linux servers. Not one of them actually shipped with Linux. They were all bare metal installs at the point of delivery. That’s generally how Linux works. You buy hardware from one vendor and OS entitlements from another. If my experience isn’t unusual, then the latest server tracker numbers from IDC are quite extraordinary.

IDC tracks servers shipped by OEMs to customers and reports on hardware and OS marketshare. It doesn’t track bare metal installations, hardware re-provisions and VM guest installs. Last quarter, according to IDC, factory revenue for Linux grew while it shrank for Windows and UNIX. On top of the provisioning methods I mentioned above, customers are increasingly asking IBM, HP and Dell to ship servers with Linux installed.

According to IDC, demand was driven by the need for high performance and cloud computing. Linux has also earned a reputation as more reliable and more secure than that other Intel OS. And if you want to ruin hardware performance, just add virus protection, which is a necessity in the Windows operating environment. Weighing the risk against the reward of increased performance, many Linux administrators simply eschew virus protection. That gives Linux a real-world performance boost over its rival from Microsoft.

Once Linux conquers the datacenter, it’s only a matter of time until millions of open source developers really start to focus on the desktop. Proprietary software’s best days are behind it.

 



Concurrency and the Single Siege

We’re frequently asked about concurrency. When a siege is finished, one of the statistics it reports is “Concurrency,” which is expressed as a decimal number. This stat is known to make brows furrow. People want to know, “What the hell does that mean?”

In computer science, concurrency is a trait of systems that handle two or more simultaneous processes. Those processes may be executed by multiple cores, processors or threads. From siege’s perspective, they may even be handled by separate nodes in a server cluster.

When the run is over, we try to infer how many processes, on average, were executed simultaneously by the web server. The calculation is simple: total transactions divided by elapsed time. If we did 100 transactions in 10 seconds, then our concurrency was 10.00.
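Here is a minimal sketch of that arithmetic in Python. It follows the calculation exactly as this post describes it; the function name is made up for illustration, and siege's own source may compute the figure differently.

```python
def concurrency(transactions: int, elapsed_seconds: float) -> float:
    """Concurrency as described in this post: transactions / elapsed time.

    Hypothetical helper for illustration, not siege's actual code.
    """
    if elapsed_seconds <= 0:
        raise ValueError("elapsed time must be positive")
    return transactions / elapsed_seconds


# The example from the text: 100 transactions in 10 seconds.
print(f"{concurrency(100, 10):.2f}")

# The same 100 transactions finished in half the time.
print(f"{concurrency(100, 5):.2f}")
```

The same number of transactions in less time yields a higher figure, which is why the stat has to be read alongside elapsed time.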

Bigger is not always better

Generally, web servers are prized for their ability to handle simultaneous connections. Maybe your benchmark run was 100 transactions in 10 seconds. Then you tuned your server and your final run was 100 transactions in five seconds. That is good. Concurrency rose as the elapsed time fell.

But sometimes high concurrency is a trait of a poorly functioning website. The longer it takes to process a transaction, the more likely transactions are to queue. When the queue swells, concurrency rises. The reasons for this rise can vary. An obvious cause is load. If a server has more connections than thread handlers, requests are going to queue. Another is competence: poorly written apps can take longer to complete than well-written ones.

We can illustrate this point with an obvious example. I ran siege against a two-node clustered website. My concurrency was 6.97. Then I took a node away and ran the same run against the same page. My concurrency rose to 18.33. At the same time, my elapsed time was extended by 65%.

Sweeping conclusions

Concurrency must be evaluated in context. If it rises while the elapsed time falls, then that’s a Good Thing™. But if it rises while the elapsed time increases, then Not So Much™. When you reach the point where concurrency rises and elapsed time is extended, then it might be time to consider more capacity.

 



HTTP Authentication

Some of you seem to confuse Basic authentication with form-based authentication. They’re not the same and the differences are important. If you don’t configure siege for the appropriate authentication method, it will be on the outside looking in at an HTTP-401.

Basic authentication occurs at the protocol level. It was originally described in HTTP/1.0 and later moved to RFC 2617. Basic authentication is a challenge/response framework. When the server receives a request for a protected resource, it challenges the user to authenticate himself. It will make the item available only after the user is authenticated.

Here’s an example exchange using basic.php from the html directory inside the siege source code:

GET /siege/basic.php HTTP/1.0
Host: http://www.joedog.org
Accept: */*
Accept-Encoding: gzip
User-Agent: JoeDog/1.00 [en] (X11; I; Siege 2.71b6)
Connection: close

HTTP/1.1 401 Authorization Required
Date: Thu, 16 Feb 2012 13:09:53 GMT
Server: CERN/1.0A
X-Powered-By: PHP/5.2.5
WWW-Authenticate: Basic realm="siege_basic_auth"
Status: 401 Unauthorized
Content-Length: 178
Connection: close
Content-Type: text/html; charset=WINDOWS-1251

GET /siege/basic.php HTTP/1.0
Host: http://www.joedog.org
Authorization: Basic c2llZ2U6aGFoYQ==
Accept: */*
Accept-Encoding: gzip
User-Agent: JoeDog/1.00 [en] (X11; I; Siege 2.71b6)
Connection: close

HTTP/1.1 200 OK
Date: Thu, 16 Feb 2012 13:09:53 GMT
Server: CERN/1.0A
X-Powered-By: PHP/5.2.5
Content-Length: 278
Connection: close
Content-Type: text/html; charset=WINDOWS-1251

See what happened? Siege requested /siege/basic.php and the server was all “Whoa! I don’t know who you are.” It issued an HTTP 401 challenge to siege, which responded by sending its username and password in Base64 encoding: c2llZ2U6aGFoYQ==
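Keep in mind that Base64 is an encoding, not encryption: anyone who sees the header can reverse it. The sketch below uses Python's standard base64 module to build the Authorization header from the exchange above and to decode the token again. The helper name is hypothetical; this is an illustration of the scheme, not siege's code.

```python
import base64


def basic_auth_header(username: str, password: str) -> str:
    """Build a Basic Authorization header line from 'username:password'."""
    token = base64.b64encode(f"{username}:{password}".encode("utf-8"))
    return f"Authorization: Basic {token.decode('ascii')}"


# Encode the credentials used by basic.php.
print(basic_auth_header("siege", "haha"))

# Decode the token from the captured request to recover the credentials.
print(base64.b64decode("c2llZ2U6aGFoYQ==").decode("utf-8"))
```

Because the token is trivially reversible, Basic auth is only as private as the connection it travels over.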

In this example, I emulated HTTP Basic authentication with a php program. Typically, Basic auth is set up at the server level. Here’s an example in apache:

<Location "/siege">
   AuthType basic
   AuthName "siege_basic_auth"
   AuthBasicProvider file
   AuthUserFile /var/www/etc/passwd
   AuthGroupFile /var/www/etc/group
   Require valid-user
   Require group siege
   Satisfy All
 </Location>

To configure siege to use basic authentication, you need to add a login to your .siegerc file. Search the file for WWW-Authenticate. The directive is login and it takes three values separated by colons: username:password:realm. Our basic.php username and password are ‘siege’ and ‘haha’. So our login looks like this:

login = siege:haha:siege_basic_auth

The third argument (realm) is optional. If you don’t specify a realm, siege will send ‘siege:haha’ every time it faces an HTTP basic challenge. By setting a realm, you can configure it to use multiple logins:

login = admin:secret:Administration
login = siege:haha:siege_basic_auth
login = root:d41ly:high_level
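To make the realm lookup concrete, here is a rough Python sketch of how a client could pick credentials based on the realm named in the server's challenge. The function names are invented for illustration and this is not siege's implementation; it also assumes passwords contain no colons.

```python
def parse_logins(values):
    """Map realm -> (username, password) from 'user:password[:realm]' strings.

    A login without a realm is stored under None and acts as the default.
    """
    logins = {}
    for value in values:
        user, password, *rest = value.split(":", 2)
        realm = rest[0] if rest else None
        logins[realm] = (user, password)
    return logins


def credentials_for(logins, realm):
    """Return the login for this realm, else the realm-less default (or None)."""
    return logins.get(realm, logins.get(None))


logins = parse_logins([
    "admin:secret:Administration",
    "siege:haha:siege_basic_auth",
    "root:d41ly:high_level",
])
print(credentials_for(logins, "siege_basic_auth"))
```

With a single realm-less entry such as siege:haha, every challenge would get the same credentials, which matches the behavior described above.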

Now you can also restrict access programmatically. This is referred to as form-based authentication. In order to configure siege to log in this way, you’ll need to reproduce a browser’s action.

To illustrate this, we’ve included login.php in the html directory of the siege source code. That page accepts both GET and POST requests. It produces an HTML form that looks like this:

<td>Username: </td><td>
<input type='text' name='username' value='' size='30'></td>
<td>Password: </td><td>
<input type='password' name='password' value='' size='30'></td>

To log in through this form, you’ll need to provide field values that match the form. Your parameters must match the form input names. In this case they’re ‘username’ and ‘password’.

http://my.server.com/login.php?username=siege&password=haha
http://my.server.com/login.php POST username=siege&password=haha
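For comparison, this is roughly what those two lines ask a client to send, sketched with Python's standard urllib. The requests are only constructed, never sent, and the helper names are hypothetical; the host and field values come from the example above.

```python
from urllib.parse import urlencode
from urllib.request import Request

LOGIN_URL = "http://my.server.com/login.php"
# Keys must match the form's input names.
FIELDS = {"username": "siege", "password": "haha"}


def build_get(url, fields):
    """GET: the form fields travel in the query string."""
    return Request(f"{url}?{urlencode(fields)}")


def build_post(url, fields):
    """POST: the same fields travel url-encoded in the request body."""
    return Request(url, data=urlencode(fields).encode("ascii"))


get_req = build_get(LOGIN_URL, FIELDS)
post_req = build_post(LOGIN_URL, FIELDS)
print(get_req.get_method(), get_req.full_url)
print(post_req.get_method(), post_req.full_url, post_req.data.decode("ascii"))
```

Either way the server sees the same name=value pairs; only their location in the request differs.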

If your entire site requires authentication you can add a login URL to your .siegerc file. If this value is set, siege will access that URL before it does anything. Search your .siegerc file for ‘login-url’. Here’s an example using one of the URLs we constructed above:

login-url = http://my.server.com/login.php POST username=siege&password=haha

After it hits that URL, siege will start running through the list of URLs you created.

Happy hacking.



Garbled Apostrophes And Other Things

Do you have man pages with garbled type? I’m working on a multi-threaded file watcher that searches for patterns in files and executes commands on a match. In order to release it into the wild, I need documentation. That means man pages. So I’m viewing my man pages and I see crap like this: ’-f /path/file’

Those are supposed to be single-quotes, i.e., apostrophes.

For this project, I’m building my man pages from perl PODs with Pod::Man. In case you’d like to do the same, here’s a handy utility for making man pages from perl pods. It converts POD data to *roff.

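#!/usr/bin/perl
# A Pod::Man example script
#
use strict;
use warnings;
use Pod::Man;

# Placeholder release string; set this to your project's version
my $VERSION = '1.0';

my $input  = $ARGV[0] or barf();
my $output = $ARGV[1] or barf();

# Convert the POD file to *roff, tagged as a section 8 man page
my $parser = Pod::Man->new (release => $VERSION, section => 8);
$parser->parse_from_file ($input, $output);

sub barf {
  print "usage: $0 <file.pod> <file.1>\n";
  exit(1);
}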

When I saw the garbled text above, I suspected a problem with my method. It turns out that wasn’t the case at all. The culprit was my character set. My language was set to en_US.UTF-8 but my terminal didn’t support that character set. If you’re having a similar problem, you can check your character set with this command:

$ set | grep -i lang
LANG=en_US.UTF-8

The fix is easy:

export LANG=en_US

Add that to your .profile to make it permanent.
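To see why a character-set mismatch turns one apostrophe into several junk characters, here is a small Python sketch. It takes the UTF-8 bytes of a typographic apostrophe and decodes them with a single-byte character set. The choice of cp1252 is an assumption for illustration; the post doesn't say which character set the terminal actually used.

```python
# A typographic apostrophe (U+2019), as Pod::Man output may contain under UTF-8.
quote = "\u2019"

# In UTF-8 that one character is three bytes.
raw = quote.encode("utf-8")
print(raw)

# A terminal assuming a single-byte character set (cp1252 here, as an
# assumption) shows each byte as its own character.
garbled = raw.decode("cp1252")
print(len(garbled), garbled)
```

One character in, three characters out: that is the kind of garbage that shows up around ’-f /path/file’ when the locale and the terminal disagree.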



Counting Downloads With Fido

I wanted to illustrate how to use fido with an example. Today we’re going to use it to count software downloads on this site. Exciting! This will be simple since we only have one data source. A few years ago, I moved my software from an FTP repository onto this web server. To quantify software downloads, we can simply monitor the http access log.

Here’s our fido configuration for the log file:

/var/log/httpd/access_log {
 rules  = downloads.conf
 action = /usr/local/bin/tally
 log    = syslog
}

This tells fido to monitor the access_log in real time. Its pattern match rules are in a file called downloads.conf. When fido finds a match, it will execute a program called tally. Finally, the last directive tells fido to use syslog to log its activity.

In order to understand what we’re looking for, you should take a look at the software repository. It contains multiple versions and helpful links to the latest releases and betas. We want to match them all.

Let’s take a look at our downloads.conf file. Since we didn’t specify a full path to the file, fido knows to look for it under $sysconfdir/fido/rules. If you configured it to use /etc, then the rules are found in /etc/fido/rules/downloads.conf. Here’s the file:

#
# Track and count downloads from the website
SIEGE:  .*siege-.*tar.gz.*
FIDO:   .*fido-.*(rpm|tar\.gz).*
WACKY:  .*wackyd-.*tar.gz.*
DICK:   .*dick.*tar.gz.*
SPROXY: .*sproxy-.*tar.gz.*
CONFIG: .*JoeDog-Config.*
STATS:  .*JoeDog-Stats.*
GETOPT: .*php-getopt.*
WACKY:  .*JoeDog-Wacky.*
PBAR:   .*JoeDog-ProgressBar.*

Each line begins with an optional label. If a label is present, fido will pass it (minus the colon) to the action program. In the example above, if the JoeDog-Config perl module is downloaded, then fido will run /usr/local/bin/tally CONFIG. For more on labels, see the fido user’s manual.
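The label mechanism can be sketched in a few lines of Python. This is only an illustration of the idea, not how fido is implemented: the rule list is a subset of the file above with the dots escaped, and the sample log lines, paths and version numbers are made up.

```python
import re
from collections import Counter

# (label, pattern) pairs modeled on downloads.conf; a subset, dots escaped.
RULES = [
    ("SIEGE", r".*siege-.*tar\.gz.*"),
    ("FIDO", r".*fido-.*(rpm|tar\.gz).*"),
    ("CONFIG", r".*JoeDog-Config.*"),
]


def label_for(line, rules=RULES):
    """Return the label of the first rule that matches the line, else None."""
    for label, pattern in rules:
        if re.search(pattern, line):
            return label
    return None


def tally(lines, rules=RULES):
    """Count matches per label, the job handed to the action program."""
    counts = Counter()
    for line in lines:
        label = label_for(line, rules)
        if label is not None:
            counts[label] += 1
    return counts


# Hypothetical access log fragments.
sample = [
    "GET /pub/siege-2.71.tar.gz HTTP/1.1",
    "GET /pub/fido-1.0.0.rpm HTTP/1.1",
    "GET /pub/JoeDog-Config-1.2.tar.gz HTTP/1.1",
    "GET /index.html HTTP/1.1",
]
print(dict(tally(sample)))
```

In the real setup fido does the matching and hands only the label to tally, so the action program never has to parse the log line itself.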




Invalid command ‘TypesConfig’

Ah but the joys of trying to match the missing module with its obtuse apache error. In this case, we tried to use the TypesConfig directive but the module wasn’t loaded at runtime. Here’s the error:

# service httpd configtest
Syntax error on line 107 of /etc/httpd/conf/httpd.conf:
Invalid command 'TypesConfig', perhaps misspelled or defined by a module
not included in the server configuration

In this case, we were missing the mime module. You can add that module in your httpd.conf file with the following directive:

LoadModule mime_module modules/mod_mime.so

Happy apaching!



Newlines In WordPress

Did you ever want to add a new line to a WordPress entry but it gives you a new paragraph? Instead of this:

– haha
– papa
– mama

You get this:

– haha

– papa

– mama

I hate that. It adds extra space between each line. Fortunately, there’s an easy fix. In order to produce the first list without spaces between each line, just hold the shift key while you hit return.



Invalid command ‘order’

It would be nice if apache told you which module you were missing. Fortunately, there’s the Internets! Hey, this site is on the Internets, so let’s see if we can help. I just ran ‘service httpd configtest’ and received the following error:

# service httpd configtest
Syntax error on line 92 of /etc/httpd/conf/httpd.conf:
Invalid command 'Order', perhaps misspelled or defined by a module 
not included in the server configuration

After a brute force attempt at adding modules, it became clear that I was missing the following module: authz_host_module. I added that in httpd.conf with the following directive:

LoadModule authz_host_module modules/mod_authz_host.so

You can also compile that module into the binary with the following flag: --enable-authz-host (in most cases that’s compiled by default but I’m using RedHate’s binary so it was necessary to add it at run time).
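Since apache won't name the missing module, a small lookup table can. The Python sketch below records the two directive-to-module pairs covered on this site (TypesConfig and Order) and checks a config's text for directives whose module has no LoadModule line. The function names and the sample config are hypothetical, and the check ignores Include files and modules compiled into the binary.

```python
import re

# Directive -> module, from the two errors discussed in these posts.
DIRECTIVE_MODULES = {
    "TypesConfig": "mime_module",
    "Order": "authz_host_module",
}


def loaded_modules(conf_text):
    """Collect module names from LoadModule lines."""
    return set(re.findall(r"^\s*LoadModule\s+(\S+)\s+\S+", conf_text, flags=re.MULTILINE))


def missing_modules(conf_text):
    """Return {directive: module} for directives used without their module."""
    loaded = loaded_modules(conf_text)
    missing = {}
    for directive, module in DIRECTIVE_MODULES.items():
        used = re.search(rf"^\s*{directive}\b", conf_text, flags=re.MULTILINE | re.IGNORECASE)
        if used and module not in loaded:
            missing[directive] = module
    return missing


# Made-up httpd.conf fragment: mime_module is loaded, authz_host_module is not.
sample_conf = """\
LoadModule mime_module modules/mod_mime.so
TypesConfig /etc/mime.types
<Directory "/var/www/html">
    Order allow,deny
</Directory>
"""
print(missing_modules(sample_conf))
```

Extend the table as you run into more of these errors; it beats the brute force approach.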



Helpful Perl Functions

The following pair of functions are ones that I use often. As far as I’m concerned, they should be included in perl. This post serves as both a personal place holder and an opportunity to share with the Internets. Chances are you found them at the sweet end of a Google search.

Method: trim
Params: $string
Return:  $str
Usage:   $str = trim($str);

# This function strips trailing comments and trims
# white space from the front and back of parameter $string.
sub trim {
  my $thing = shift;
  $thing =~ s/#.*$//;  # trim trailing comments
  $thing =~ s/^\s+//;  # trim leading whitespace
  $thing =~ s/\s+$//;  # trim trailing whitespace
  return $thing;
}
# Or use this function instead for perl 5.10 and higher
# (whitespace only; define one trim or the other, not both)
sub trim {
  (my $s = $_[0]) =~ s/^\s+|\s+$//g;
  return $s;
}

 

PHP offers a useful utility function called ‘empty’ which determines whether or not a variable is, well, empty. Here’s the equivalent function in perl:

Method: empty
Params: $string
Returns: boolean
Usage:    if (!empty($string)) { print "Whoo hoo!"; }

sub empty { ! defined $_[0] || ! length $_[0] }

 

I often use timestamps as unique identifiers or in transaction logging. The Internets are full of perl modules that provide timestamp functionality but I generally prefer to roll my own. Why? Mainly for portability. If a script relies on the basic perl library, then it runs on any server with perl installed.

Method: timestamp
Params: none
Returns: $string
Usage:    print timestamp() . "\n";

# returns a string in the following format:
# YYYYMMDDHHMMSS
sub timestamp() {
  my $now   = time;
  my @date  = localtime $now;
  $date[5] += 1900;
  $date[4] += 1;
  my $stamp = sprintf(
    "%04d%02d%02d%02d%02d%02d",
     $date[5],$date[4],$date[3], $date[2], $date[1], $date[0]
  );
  return $stamp;
}

NOTE: The above function was corrected to include seconds.