


Monday, October 29, 2012

Generating a file of a specific size

Every once in a while someone is looking for a file of a specific size. Occasionally, it must be real data. If you are transmitting the file and the data will be compressed then the type of data will make a difference.

However, if you just need a file to fill some space or there will be no compression then Windows has a neat little utility called FSUTIL.

The FSUTIL tool can be used for a number of things, but its nicest feature is creating a new file filled with zero bytes. First, you need to know how many bytes. If you want a file which is 38 gigabytes then you need to figure out how many bytes that is. Technically, it is 38*1024*1024*1024. If you just want a rough idea you can use 38,000,000,000, but a 38 gigabyte file is really 40,802,189,312 bytes. Next, decide which file you want to hold the data. Let's say you want to create C:\DELETEME.TXT; the full FSUTIL command is then:

fsutil file createnew C:\DELETEME.TXT 40802189312

This will create a 38G file in a matter of seconds.

For UNIX, Mac OS X or Linux you can use DD. The DD command is for converting and copying files. If we wanted to create the 38G file with DD the command would be:

dd of=deleteme.txt oseek=79691776

The oseek option is the magic: it seeks n blocks into the output file, and the standard block size is 512 bytes. So we take 79691776*512, which is 38G. Even easier would be:

dd of=deleteme.txt oseek=38m obs=1024

This will generate a file of 38M * 1024 bytes, or 38G. It is much easier to work out values like these in your head. That is, no need for a calculator.

IMPORTANT: after you press enter on the DD command it will read from standard input as its input, so you need to press CONTROL-D to signal end-of-file.
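If you want to skip the interactive step entirely, a sketch that should work with GNU dd (where the option is spelled seek= rather than oseek=) is to read from /dev/null, which returns end-of-file immediately; dd then truncates the output to the seek position, so no CONTROL-D is needed:

dd if=/dev/null of=deleteme.txt seek=38M obs=1024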

The other option for UNIX/Linux/Mac OS X is to build the file from a file of a set size. The nice thing about this is you can take a file with real data and copy it enough times to make a single file of the correct size. For example, if I have a text file with 512 bytes of real data I can make multiple copies of that one file into the output file.
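Here is a minimal sketch of that idea, assuming a 512-byte file named seed.txt (both file names are hypothetical):

# append 2048 copies of the 512-byte seed file to build a 1M file of real data
i=0
while [ $i -lt 2048 ]; do
    cat seed.txt >> deleteme.txt
    i=`expr $i + 1`
done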

Tuesday, January 17, 2012

Keeping data and code in one shell script file

In the past I have written scripts which required a lot of data. I could have the script read a second file with all the data in it but occasionally I'd lose the data file and still have the script. Or I'd have dozens of scripts and data files but no idea which data went with which script.

Solution: put the script and the data into one file. The script could then read the data from itself. Here is an example script:

#!/bin/sh

for entry in $(sed -n -e '/^exit/,$ p' $0 | grep -v exit | sed -e '/^$/d' | sed -e '/^#.*/d' | sed -e 's/ /_/g'); do
        entry=$(echo $entry | sed -e 's/_/ /g')
        firstname=$(echo $entry | awk -F, '{print $1}')
        lastname=$(echo $entry | awk -F, '{print $2}')
        number=$(echo $entry | awk -F, '{print $3}')
        echo "$firstname $lastname's phone number is $number"
done
exit

#FirstName,LastName,Phone
Darrell,Grainger,(416) 555-1212
John,Doe,(323) 555-1234
Jessica,Alba,(909) 555-9999

In this example, the data is a list of comma separated fields. Let's examine the list in the for statement. The $0 is the file currently executing, i.e. the file with this script and the data.

The sed command prints everything from the line which starts with exit to the end of file. The grep command gets rid of the line which starts with exit. The next sed command discards all blank lines. The third sed command discards all lines which start with #. This allows us to start a line with # if we want to add comments or comment out a line of data.

The final sed command on the for statement replaces all spaces with underscores. The reason for this is because if I have a line with a space, the for statement will process it as two separate records. I don't want that. I want to read one line as one record.

Inside the body of the for loop, the first line converts all the underscores back to spaces. If you want to have underscore in your data, this will not work. The solution is to pick a character which is not part of your data set. You can pick anything so long as the character you pick is the same in the for statement and the first line of the for loop body. The g in the sed statement is important in case there is more than one space.

The next three lines show how to break the line apart by commas. If you need to use commas in your data then pick a different character to separate the fields. The -F switch in the awk statement sets the field separator. So if you use exclamation mark as a field separator, you need to change the awk statement to -F!.

The echo statement is just an example of using the data.
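If I run this script, the output is:

Darrell Grainger's phone number is (416) 555-1212
John Doe's phone number is (323) 555-1234
Jessica Alba's phone number is (909) 555-9999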

Wednesday, December 14, 2011

Using the right tool for the job

From time to time I see people asking questions about how to use an automation tool to do something the tool was never meant to do. For example, how do I use Selenium to get the web page for a site without loading the javascript or CSS?

Selenium is designed to simulate a user browsing a website. When I open a web page with a browser, the website sends me javascript and CSS files. The browser just naturally processes those. If I don't want that, I shouldn't use a browser. If I am not using a browser, why would I use Selenium to send the HTTP request?

That is all the get() method in Selenium does. It opens a connection to the website and sends an HTTP request using the web browser. The website sends back an HTTP response and the browser processes it.

If all I want to do is send the request and get the response back, unprocessed, I don't need a web browser.

So how can I send an HTTP request and get the HTTP response back? There are a number of tools to do this.

Fiddler2: http://www.fiddler2.com/fiddler2/

The way Fiddler works is you add a proxy to your web browser (actually Fiddler does it automatically). Now when you use the web browser, if Fiddler is running, the web browser sends the HTTP request to Fiddler and Fiddler records the request and passes it on to the intended website. The website sends the response back to Fiddler and Fiddler passes it back to the web browser.

You can save the request/response pair and play them back. Before you play the request back you can edit it: you can edit the website address, the context root of the URL and, if there is POST data, the POST data as well.

Charles: http://www.charlesproxy.com/

Charles is much like Fiddler2 but there are two main differences. The first is that Charles is not free. You can get an evaluation copy of Charles but ultimately, you need to pay for it. So why would you use Charles? With purchase comes support. If there are things not working (SSL decrypting for example) you can get help with that. Additionally, Fiddler is only available on Windows. Charles works on Mac OS X and Linux as well.

curl: http://curl.haxx.se/

Fiddler and Charles are GUI applications with menus and dialogs. They are intended for interacting with humans. If you are more of a script writer or want something you can add to an automated test, you want something you can run from the command line. That would be curl. Because it is lightweight and command line driven, I can run curl commands over and over again. I can even use it for crude load testing.
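As a rough sketch of that idea (the URL is a placeholder and 100 requests is an arbitrary count):

i=0
while [ $i -lt 100 ]; do
    curl -s -o /dev/null http://your.website.com/some/context/root
    i=`expr $i + 1`
done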

The most common use of curl is checking the contents of a web page or confirming that a website is up and running. There are many command line options (-d to pass POST data, -k to ignore certificate errors, etc.) but the general use is curl -o output.txt http://your.website.com/some/context/root. This will send the HTTP request for /some/context/root to the website your.website.com. A more realistic example would be:

curl -o output.txt http://www.google.ca/search?q=curl

I could then use another command line tool to parse the output.txt file. Or I could use piping to pipe the output to another program.
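For example, a quick sketch that checks whether the word curl appears in the results (the pattern is just an illustration):

curl -s "http://www.google.ca/search?q=curl" | grep -i curl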


Another nice command line tool is wget. The wget command, like curl, will let you send an HTTP request. The nice thing about wget is that you can use it to crawl an entire website. One of my favourite wget commands is:

wget -t 1 -nc -S --ignore-case -x -r -l 999 -k -p http://your.website.com

The options are worth walking through:

  • -t sets the number of tries. I always figure if they don't send it to me on the first try they probably won't send it to me ever.
  • -nc is for 'no clobber'. If two files are sent with the same name, it will write the first file using the full name and the second file with a .1 on the end. You might wonder, how could it have the same file twice in the same directory? The answer is UNIX versus Windows. On a UNIX system there might be index.html and INDEX.html. To UNIX these are different files, but when downloading to Windows I need to treat them as the same file.
  • -S prints the server response header to stderr. It doesn't get saved to the files but lets me see that things are still going and something is being sent back.
  • --ignore-case is because Windows ignores case so we should as well.
  • -x forces the creation of directories. This will create a directory structure similar to the original website. This is important because two different directories on the server might have the same file name and we want to preserve that.
  • -r is for recursive: keep going down into subdirectories.
  • -l sets the number of levels to recurse. If you don't specify it, the default is 5.
  • -k is for converting links. If there are links in the pages being downloaded, they get converted. Relative links like src="../../index.html" will be fine, but if they hard coded something like src="http://your.website.com/foo.html" we want to convert this to a file:// link rather than go back to the original website.
  • -p says to get entire pages. If the HTML page we retrieve needs other things like CSS files, javascript, images, etc. then the -p option will retrieve them as well.

These are just some of the tools I use when Selenium is not the right tool for the job.

Thursday, July 12, 2007

Bourne shell scripting made easy

Someone was having trouble writing a shell script. A common activity for Bourne shell scripting is to take the output from various commands and use it as the input for other commands. Case in point, we have a server that monitors clients. Whenever we get new monitoring software we have to use the server command line tool to install the cartridge on the server, create the agents for each client, deploy the agents, configure them and activate them.

The general steps are:

1) get a list of the cartridges (ls)
2) using a tool, install them (tool.sh)
3) using the same tool, get a list of agents
4) using the tool, get a list of clients
5) using the tool, for each client create an instance of each agent
6) using the tool, for each agent created deploy to the client
7) using the tool, configure the agents
8) using the tool, activate the agents

Just looking at the first two steps, if I was doing this by hand I would use ls to get a list of all the cartridges. I would then cut and paste the cartridge names into a command to install them.

So a Bourne shell script should just cut the same things out of the ls list.

If the cartridge files all end with the extension .cart I can use:
ls -1 *.cart

If the command to install a cartridge was:
./tool.sh --install_cart [cartridge_name]

I could use:
for c in `ls -1 *.cart`; do
./tool.sh --install_cart $c
done

This is pretty easy and straightforward. What if the command was not as clean as ls? What if the list of agents was something like:
./tool.sh --list_agents
OS: Linux, Level: 2.4, Version: 3.8, Name: Disk
OS: Linux, Level: 2.4, Version: 3.8, Name: Kernel
OS: Windows, Level: 5.1, Version: 3.8, Name: System

To install the agent I only need the Name. If I only wanted the Linux agents, how would I get just the Name? First, you want to narrow it down to the lines you want:
./tool.sh --list_agents | grep "OS: Linux"

This will remove all the other agents from the list and give me:
OS: Linux, Level: 2.4, Version: 3.8, Name: Disk
OS: Linux, Level: 2.4, Version: 3.8, Name: Kernel

Now I need to parse each line. If I use the above command in a for loop I can start with:
for a in `./tool.sh --list_agents | grep "OS: Linux"`; do
echo $a
done

Now I can try adding to the backtick command to narrow things down. The two ways I like to parse a line is using awk or cut. For cut I could use:
for a in `./tool.sh --list_agents | grep "OS: Linux" | cut -d: -f5`; do
echo $a
done

This will break the line at the colon. The cut on the first line would give the fields:
  1. OS
  2. Linux, Level
  3. 2.4, Version
  4. 3.8, Name
  5. Disk

The problem is there is a space in front of Disk. I can add a cut -b2-, which will give me from character 2 to the end, i.e. cut off the first character. What if there is more than one space? This is why I like to use awk. For awk it would be:
for a in `./tool.sh --list_agents | grep "OS: Linux" | awk '{print $8}'`; do
echo $a
done

For awk the fields would become:
  1. OS:
  2. Linux,
  3. Level:
  4. 2.4,
  5. Version:
  6. 3.8,
  7. Name:
  8. Disk

The spaces would not be an issue.

So by using backticks, piping and grep I can narrow things down to just the lines I want, then pipe the result of grep to cut or awk to break each line apart and keep just the bits I want.

The only other command I like to use for parsing output like this is sed. I can use sed for things like:
cat file | sed -e '/^$/d'

The // is a regex pattern. The ^ means beginning of line. The $ means end of line. So ^$ would be a blank line. The d is for delete. This will delete blank lines.

Actually, let's give an example usage. I want to list all files in a given directory plus all subdirectories, with the file size for each file. The ls -lR will give me a listing like:
.:
total 4
drwxrwxrwx+ 2 Darrell None   0 Apr 19 14:56 ListCarFiles
drwxr-xr-x+ 2 Darrell None   0 May  7 21:58 bin
-rw-rw-rw-  1 Darrell None 631 Oct 17  2006 cvsroots

./ListCarFiles:
total 8
-rwxrwxrwx 1 Darrell None 2158 Mar 30 22:37 ListCarFiles.class
-rwxrwxrwx 1 Darrell None 1929 Mar 31 09:09 ListCarFiles.java

./bin:
total 4
-rwxr-xr-x 1 Darrell None 823 May  7 21:58 ps-p.sh

To get rid of the blank lines I can use the sed -e '/^$/d'. To get rid of the path information I can use grep -v ":", assuming there are no colons in the filenames. To get rid of the directories I can use sed -e '/^d/d' because all directory lines start with a 'd'. So the whole thing looks like:
ls -lR | sed -e '/^$/d' -e '/^d/d' | grep -v ":"

But there is actually an easier answer. Rather than cutting out what I don't want, I can use sed to keep what I do want. The sed -n command will output nothing BUT if the script has a 'p' command it will print that. So I want to sed -n with the right 'p' commands. Here is the solution:
ls -lR | sed -n -e '/^-/p'

This is because all the files have '-' at the start of the line. This will output:
-rw-rw-rw-  1 Darrell None 631 Oct 17  2006 cvsroots
-rwxrwxrwx 1 Darrell None 2158 Mar 30 22:37 ListCarFiles.class
-rwxrwxrwx 1 Darrell None 1929 Mar 31 09:09 ListCarFiles.java
-rwxr-xr-x 1 Darrell None 823 May  7 21:58 ps-p.sh

I can now use awk to cut the file size out, i.e. awk '{print $5}'. So the whole command becomes:
ls -lR | sed -n -e '/^-/p' | awk '{print $5}'

If I want to add all the file sizes for a total I can use:
TOTAL=0
for fs in `ls -lR | sed -n -e '/^-/p' | awk '{print $5}'`; do
TOTAL=`expr $TOTAL + $fs`
done
echo $TOTAL

The expr command will let me do simple integer math with the output.
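For example:

expr 3 + 4

prints 7. Note that expr needs spaces around the operator and its arguments; expr 3+4 would just echo back 3+4.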


NOTE: you can use man to learn more about the various commands I've shown here:

  • man grep
  • man cut
  • man awk
  • man sed
  • man regex
  • man expr

The sed and awk commands are actually powerful enough to have entire chapters written on them. But the man page will get you started.

While you are at it, do a man man.

Enjoy!

Wednesday, April 25, 2007

Identifying UNIX versions

I work in an environment with numerous different versions of UNIX and Linux. Sometimes I'll be accessing multiple machines from my workstation. Occasionally, I need to confirm the OS for the current terminal. The way to determine which version of UNIX you are using is with:
uname -a

For Solaris you would get something like:
SunOS rd-r220-01 5.8 Generic_117350-26 sun4u sparc SUNW,Ultra-60

For HP-UX you would get something like:
HP-UX l2000-cs B.11.11 U 9000/800 158901567 unlimited-user license
or
HP-UX rdhpux04 B.11.23 U ia64 0216397005 unlimited-user license

For AIX you would get something like:
AIX rd-aix09 2 5 00017F8A4C00

From this it is a little harder to see the version. It is actually AIX 5.2. If you check the man page for uname it will help you decode the hexadecimal number at the end. This will tell you things like 4C is the model ID and the 00 is the submodel ID. Additionally, AIX uses other switches to tell you about things the -a normally gives you on other platforms. For example,
uname -p # the processor architecture
uname -M # the model

For Linux things are a little trickier. The uname -a will tell you it is Linux but it will not tell you if it is SuSE Linux Enterprise Server (SLES) 10.0, Redhat AS 5.0, et cetera. To figure this out, look for a text file in /etc/ whose name ends in 'release', i.e.
cat /etc/*release

This text file will tell you which distribution of Linux you are using.
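For example, on a Red Hat machine the output might look something like:

Red Hat Enterprise Linux Server release 5 (Tikanga)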

Friday, March 23, 2007

Named pipes

If you are familiar with UNIX, you are familiar with pipes. For example, I can do:
ps -ef | sort | more

The ps command will output a list of all processes to stdout. Normally, this would be to the console window. The pipe (|) will tell UNIX to take the output of ps and make it the input to sort. Then the output from sort will become the input to more.

Without using pipes I could do:
ps -ef > temp_file1
sort < temp_file1 > temp_file2
rm temp_file1
more temp_file2
rm temp_file2

This is like using the pipe but instead we put the output of ps into temp_file1. Then we use temp_file1 as the input to sort and send the output to temp_file2. Finally, we use temp_file2 as the input to more. You should be able to see how this is a lot like the first example using pipes.

Now here is a third way using Named Pipes. To create a named pipe use:
mkfifo temp_file1

If you list this entry using ls -l you will see something like:
prw-r--r--   1 dgrainge staff          0 Mar 23 08:13 temp_file1

Notice the first letter is not - for a file or even d for a directory. It is p for a named pipe. Also, the size of the 'file' is 0. To repeat the earlier example with named pipes we will need two shells:
# shell 1
mkfifo temp_pipe1
mkfifo temp_pipe2
ps -ef > temp_pipe1            # this will block so switch to shell 2

# shell 2
sort < temp_pipe1 > temp_pipe2 # this will block so switch back to shell 1

# shell 1
more temp_pipe2
rm temp_pipe1
rm temp_pipe2

The interesting thing about this example is that we needed two shells to do this. At first this might seem like a downside but the truth is, this is a positive. I can do something like:
mkfifo stdout
mkfifo stderr

# shell 2
more stdout

# shell 3
more stderr

# shell 1
sh -x some_script.sh 1> stdout 2> stderr

The -x will turn on trace. Debug information will be output to stderr. By using the named pipes, I can redirect the regular output to shell 2 and the debug information to shell 3.

Thursday, March 8, 2007

Extracting part of a log using Bourne shell

Someone recently asked me how to select a range of text from a log file. Because it was a log file, each line started with the date and time for each log entry.

She wanted to extract all the log entries from a start time to an end time. For example, all log entries from 08:07 to 08:16 on March 8th, 2007. The format for the timestamp would be:
2007-03-08 08:07:ss.sss [log message]

where ss.sss was the seconds and [log message] was the actual text message written to the log.

My solution, using Bourne shell, was to determine the first occurrence of "2007-03-08 08:07" using grep. The GNU grep command would be:
START=`grep -n -m1 "2007-03-08 08:07" logfile.log | cut -d: -f1`

The -n will prefix the results with the line number. The -m1 tells it to quit after the first match. The output is going to be something like:
237:2007-03-08 08:07:ss.sss [log message]

where 237 is the line number. So the cut -d: will break the line at the colons and the -f1 will take the first field, i.e. 237.

Next you want to find the last occurrence of 08:16. I would suggest looking for 08:17 using the same grep command, e.g.
END=`grep -n -m1 "2007-03-08 08:17" logfile.log | cut -d: -f1`


The reason you want to look for the value after the real END time is because a log might have many entries for 08:16. By looking for 08:17 we know we have captured all the entries for 08:16 rather than just the first entry.

This will give us the line AFTER the line we want, so we do the following to decrement it by one:
END=`expr $END - 1`

Now we want to extract everything from START to END in the log. We start by extracting everything from 1 to the END using the head command:
head -n $END logfile.log

Now we want to trim off the lines before START. For that we can use the tail command, but tail wants to know how many lines to keep, not how many to discard. We are keeping lines START through END inclusive, which is $END - $START + 1 lines. So:
LINES=`expr $END - $START + 1`

Finally we would have:
head -n $END logfile.log | tail -n $LINES

and this will display only the lines from 08:07 to 08:16 on March 8, 2007.
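Putting the pieces together, the whole extraction is just (assuming GNU grep for the -m1 switch):

START=`grep -n -m1 "2007-03-08 08:07" logfile.log | cut -d: -f1`
END=`grep -n -m1 "2007-03-08 08:17" logfile.log | cut -d: -f1`
END=`expr $END - 1`
LINES=`expr $END - $START + 1`
head -n $END logfile.log | tail -n $LINES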

Tuesday, January 9, 2007

Making shell scripts atomic

What do you do if you have a shell script that cannot be run twice at the same time, i.e. you have to wait until the script finishes before you can run it a second time.

The solution is to make the script check to see if it is running. If it is not running then let it start. The problem with this is it is possible to have the following scenario:

- start the script
- it checks to see if it is running
- start the script running a second time
- the second script checks to see if it is running
- flag that the first script is running and enter the critical section
- flag that the second script is running and enter the critical section

In other words, the check and the setting of the flag have to be atomic, i.e. you cannot have them be two steps in the script.

The solution is to use ln to create a hard link. Creating the link is both the setting of the flag and the check in one atomic operation: if the link already exists, ln fails. Then you can check the exit status of the ln command (the flag).

So here is the code to do it:
#!/bin/sh
# Bourne shell

# create a filename for the ln
LOCK=`echo $0 | awk -F/ '{print $NF}' | awk -F. '{print $1}'`.LOCK

# create the link
ln $0 ${LOCK} 2> /dev/null

# see if we are already running
if [ $? -eq 0 ]; then
    echo "running atomic script here"
    echo "it just sleeps for 5 seconds"
    sleep 5
    /bin/rm ${LOCK}
else
    echo "script is already running"
fi
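If you save this as, say, atomic.sh (the name is arbitrary) and start it twice within the five second window, the second copy refuses to run:

./atomic.sh &
./atomic.sh
script is already running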


Or if you prefer the C shell:
#!/bin/csh
# C shell

# create a filename for the ln
set LOCK=${0:r}.LOCK

# create the link
ln $0 ${LOCK} >& /dev/null

# see if we are already running
if ( ! $status ) then
    echo "running atomic script here"
    echo "it just sleeps for 5 seconds"
    sleep 5
    /bin/rm ${LOCK}
else
    echo "script is already running"
endif

Tuesday, December 26, 2006

System V versus BSD UNIX

Nowadays you will most likely be exposed to AT&T System V UNIX. Years ago there were dozens of different 'versions' of UNIX. At some point things started to converge to either AT&T System V (pronounced A T and T System Five) or BSD UNIX.

If you go for a job interview they might ask you things like, "How do you list all processes?" If you are using System V then the answer is "ps -ef" but it is "ps -aux" if you are using BSD UNIX.

Additionally, a lot of System V installations have the BSD implementations of commands as well. The idea is that I might have a script that assumes BSD format commands. If I were to parse the output of ps to do a mass kill of a process and all its children, then the format of the ps output matters.

If the script was written assuming the BSD version of ps and my computer was configured for System V, the script will fail. The solution: the default path might be /usr/bin, but the BSD versions of the commands are stored in /usr/ucb. So I could just edit the script so it finds /usr/ucb/ps rather than the default /usr/bin/ps.

Finally, if you are using MacOS X you are using a GUI front end to an implementation of UNIX. The UNIX is BSD UNIX. So things like "ps -ef" will not work.

So if you want to impress an interviewer who asks how to list all the processes on a UNIX machine, the answer is "If your computer is configured for System V UNIX then ps -ef will do the trick, but if /usr/ucb exists in the path before /usr/bin then ps -aux will do the trick. Additionally, if you are using MacOS X, which is BSD UNIX, then the default path will use ps -aux."

There are some obvious wrinkles to this. I'm assuming the default setup and configuration for the system. You'd impress them more if you asked which version of UNIX and then noted, "if they are using the common configuration..."

If you want to read more of the details behind the history of UNIX, give this web site a try. Additionally, this site talks about creation of different versions of UNIX. Click on the picture at the top to see how nuts it got.

Sunday, December 24, 2006

Debugging Bourne shell scripts

When most people start programming Bourne shell scripts, the scripts are VERY small. But if you take over someone else's work, or you have been programming shell scripts for any significant length of time, you will eventually find yourself hunting for a bug in a script of thousands of lines.

Originally you might have put a few echo statements to see what was going wrong with a script. With thousands of scripts, scripts calling scripts, aliases set to scripts, system commands written as scripts, etc. it can get quite difficult to find a bug using echo statements.

The solution is to run the script with the -x flag. If you have the script starting with:

#!/bin/sh

then you can edit it to be:

#!/bin/sh -x

Or better yet, if the script is run using:

./myscript.sh

change it by using:

sh -x ./myscript.sh

Running the script with this option will let you see each line as it is executed. You will also see variables getting expanded. With this output the echo statements should be unnecessary. Try running a small script with this option and see what it outputs.
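For example, take this trivial script (hello.sh is a hypothetical name):

#!/bin/sh
NAME=world
echo hello $NAME

Running sh -x ./hello.sh prints each command, with variables expanded, before its output:

+ NAME=world
+ echo hello world
hello world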