Fun with netcat

Ever heard of nc? It’s a simple utility that can connect to a remote host over a TCP or UDP socket and send data. Basically, it’s a command-line interface to the BSD socket API. The manual page says:

The nc (or netcat) utility is used for just about anything under the sun involving TCP, UDP, or UNIX-domain sockets.  It can open TCP connections, send UDP packets, listen on arbitrary TCP and UDP ports, do port scanning, and deal with both IPv4 and IPv6.

And why am I writing a whole post about this? Because it’s an insanely powerful tool for developing network apps. You can use it to try out what you need to pass to a socket in order to get some data back. RFCs are good, but for a basic understanding of some protocol they’re pretty useless. I mean, who wants to read a 60+ page RFC full of technical implementation details? They’re also pretty hard to understand sometimes. With netcat, you can try contacting the server yourself first, with your keyboard! This is the most amazing part: you simply type the messages that your program needs to send. For example, if you’re programming an HTTP client, you can write your own request like this:

astro@desktop:~$ nc www.google.cz 80
GET /
HTTP/1.0 302 Found
Location: http://www.google.cz/
Cache-Control: private
Content-Type: text/html; charset=UTF-8
Date: Sun, 18 Sep 2011 21:32:24 GMT
Server: gws
Content-Length: 218
X-XSS-Protection: 1; mode=block

The only thing you need to type is GET / and the server will respond to your HTTP GET request. This is much more convenient than reading hundreds of RFCs before you even start thinking about programming an HTTP client. Other than that, it works exactly the same way as cat (and no, not the furry one). Everything you pass to it, it will pass along to the remote server.

This way you can test almost any text protocol that works over TCP or UDP. Just recently I wrote a POP3 client, and netcat helped me track down some bugs as well.
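Once you know what bytes the protocol expects, turning the manual nc session into code is straightforward. Here’s a minimal sketch using Python’s socket API (the toy server just stands in for a real host so the example is self-contained; all the names here are made up):

```python
import socket
import threading

# A toy HTTP-ish server standing in for the real host
def serve_once(srv):
    conn, _ = srv.accept()
    request = conn.recv(1024)                  # e.g. b"GET /\r\n"
    if request.startswith(b"GET"):
        conn.sendall(b"HTTP/1.0 200 OK\r\n\r\nhello")
    conn.close()

srv = socket.socket()
srv.bind(("127.0.0.1", 0))                     # pick any free port
srv.listen(1)
threading.Thread(target=serve_once, args=(srv,), daemon=True).start()

# The client side: this is all nc really does, write bytes, read bytes
with socket.create_connection(srv.getsockname()) as s:
    s.sendall(b"GET /\r\n")                    # the same request typed into nc
    reply = s.recv(1024)

print(reply.decode().splitlines()[0])          # HTTP/1.0 200 OK
```

The client half is exactly the nc session from above, just automated; swap in a real address and you have the skeleton of an HTTP (or POP3) client.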

Sources

Test Driven Development

Another book from my huge TOREAD pile is Test Driven Development: By Example by Kent Beck. I learned about this method of development from the Extreme Programming book (also by Kent Beck) and I tried to take advantage of it during the coding phase of my bachelor’s thesis. It’s a great way to develop software! With your software covered by unit tests, you are way more confident about it. Along with this comes the assurance that you didn’t break some part of your software when you add or change something. Without proper testing (either regression or unit), you just try stuff and see what happens. And that’s usually accompanied by the sound of shattering glass and echoes of screaming people.

There is a metaphor (mentioned by Steve McConnell in Code Complete) that describes software development as dinosaurs drowning in tar pits. I was a bit skeptical towards this metaphor at first, but it’s damn accurate when you code but don’t test.

TDD Book Cover

Test Driven Development: By Example book cover

What exactly can you find in the book? In the first hundred pages, Mr. Beck explains test driven development on a case study of WyCash — some software that handles money. It’s a step-by-step (and by step I mean really small steps) guide through the whole process. To be honest, I didn’t like this part of the book. It explains how exactly TDD should be done, but it’s sooo annoying to read about copying methods from one place to another and replacing return 5; with return x+y;.

The second part gets a little more interesting. It’s about xUnit — the family of widely used frameworks for unit testing (SUnit for Smalltalk, JUnit for Java, CppUnit for C++ etc.). In this part, you will learn how the framework works with test cases, test suites and fixtures, the setUp() and tearDown() methods etc. Kent Beck is actually the original author of SUnit, the first framework of this family, so all the information you get here comes straight from the source. He actually explains how to implement such a framework using the TDD method.

And the last part covers the TDD method in general: it answers some questions that might spring to mind, shows how to use design patterns together with TDD, and explains some situations you might find yourself in while using the test driven development method.

Red-Green-Refactor

I’d like to point out one last principle — the Red-Green-Refactor. It’s a sort of mantra that will guide you through the whole book. It sums up pretty much the whole routine of TDD in three steps (but you have to read the book to understand it properly!):

  1. Write a test — a test for some new functionality that will obviously fail (hence the red)
  2. Make it work — write as little code as possible to make the test pass (copy some code, fake the implementation, whatever, just make it work and turn the red to green)
  3. Refactor — at this point the functionality is already done, so focus purely on the quality of the design and implementation

It’s surprisingly easy, but extremely powerful if you think about it in broader terms. I definitely recommend this book, maybe along with Extreme Programming from the same author.
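To make the mantra concrete, here is my own tiny sketch in Python’s unittest (a hypothetical Money class loosely inspired by the book’s running example, not Beck’s actual code). The comments mark where each of the three steps happens:

```python
import unittest

# Red:      test_multiplication is written first, before Money exists, so it fails.
# Green:    the quickest fake would be `return Money(10)` inside times().
# Refactor: the fake is replaced by the real, general implementation below.

class Money:
    def __init__(self, amount):
        self.amount = amount

    def times(self, multiplier):
        return Money(self.amount * multiplier)

class TestMoney(unittest.TestCase):
    def test_multiplication(self):
        five = Money(5)
        self.assertEqual(five.times(2).amount, 10)
        self.assertEqual(five.times(3).amount, 15)
```

Run it with python -m unittest and the green bar tells you it’s safe to keep refactoring.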

Brief History of Time

I finished all my exams a little early this term (and thank god for that!), so I could dive right into the huge pile of books that always emerges on my desk during the semester. The first one was A Brief History of Time by Stephen Hawking, and it’s a very interesting book.

Mr. Hawking covers some interesting topics ranging from the very foundations of physics, its history and how our understanding of the universe evolved over time. There was Aristotle, who thought that everything consists of five elements (earth, water, fire, air and aether – a divine substance that makes up the stars and planets). Then came Sir Isaac Newton with his laws of motion and gravitation, and Albert Einstein with the theory of general relativity. Now we have quantum mechanics and string theory, which build on the wave-particle duality.

Physicists went through all this in search of a grand unified theory, a theory of everything, one concept that would define the whole universe. The thing is, we’re not there yet and it might take some time. There are still things that cannot be accurately described by the current principles of quantum mechanics or string theory.

The book also explores black holes and their role in the big bang (as the beginning of the universe). I must admit that I couldn’t understand it all when I read the book for the first time. Shame on me, I’ll have to give it another shot sometime.

(In)finite Space-time

Another interesting concept introduced in the book is that of a finite space with no beginning or end, and how to imagine such a thing. You see, in life everything has a beginning and everything, at some point, eventually comes to an end. So it’s very hard for us to think of a space that is not infinitely large, yet has no beginning or end either. Well, the example is right there! We live on it! Take for instance the surface of planet Earth. It’s definitely not infinitely large, but have you seen an end of the world anywhere? The 2D surface of a 3D sphere is an example of a space of finite size, but with no distinguished points of beginning or end. But how do we extend this by one coordinate to 3D/4D? Can we apply this principle to the space-time continuum, or to time itself? Does it mean that our universe doesn’t necessarily have a beginning or an end, that it just is?

Entropy

Another interesting principle concerns the entropy of closed systems. According to the second law of thermodynamics, the entropy of an isolated system increases over time; it never decreases. Sounds familiar? Well, in case you’re a software engineer, I’m sure it does :-). By this law, code degrades and we cannot do anything about it — it’s physics! Unfortunately for all the spaghetti-code producers out there, entropy can also be reduced, by increasing entropy somewhere else, e.g. (surprisingly) by putting some work into the system ;-).

It’s good!

There are a lot more interesting theorems, thoughts and principles from physics, philosophy and many other areas of human knowledge in there. I think it’s definitely worth reading. Give it a shot!

A Brief History of Time book cover

DRY Principle

I read a couple of books on software development lately and I stumbled upon some more principles of software design that I want to talk about. And the first and probably the most important one is this:

Don’t repeat yourself.

Well, this is new … I mean, as soon as any programmer learns about functions and procedures, he knows that it’s way better to split things up into smaller reusable pieces. The thing is, this principle should be applied in much broader terms. As in NEVER EVER EVER repeat any piece of information in a software project.

The long version of the DRY principle, authored by Andy Hunt and David Thomas, states:

Every piece of knowledge must have a single, unambiguous, authoritative representation within a system.

The key is that it doesn’t apply only to code. Every single time you have something stored in two places simultaneously, you can be almost certain that it will cause pain at some point in the future of your project. It will sneak up on you from behind and hit you with a baseball bat. And then keep kicking you while you’re down. This is one of those cases where foretelling the future works damn well.

The authors of The Pragmatic Programmer show this on a great example with database schemas. At some point in your project you make up a database schema, usually on paper. Then you store it with your project, in plain text or something similar. The people responsible for writing code will look into the file, create database scripts for creating the database and start putting various queries into the code.

What just happened? You have two definitions of the database schema in your system: one in the text file and another in the form of a database script. After a while, the customer shows up and demands some additional functionality that requires altering the schema. Well, that shouldn’t be much of a problem. You simply change the database script, alter the class that handles the queries and go get some lunch. Everything works fine, but after a year or two you might want to change the schema a bit further. By then you won’t remember a thing about the project, so you’ll probably want to look at the design first, to catch up. Or you hire someone new, who will look at the schema definition. And it will be wrong.

Storing something multiple times is painful for a number of reasons. First, you have to sync changes between the representations. When you change the schema, you have to change the design document too, and vice versa. That’s extra work, right? And as soon as you forget to alter both, you have a problem.

The solution to this particular problem is code generation. You keep a single database definition and a very simple script that turns it into the database script. Here’s a wonderful illustration (by Bruno Oliveira) of how that works :).

Repetitive tasks figure
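A toy version of such a generator might look like this in Python (the schema format here is made up for illustration; the book doesn’t prescribe one). The point is that the dictionary is the single authoritative representation, and the SQL is always derived from it:

```python
# Single authoritative definition of the database schema
schema = {
    "users": {
        "id": "INTEGER PRIMARY KEY",
        "name": "TEXT NOT NULL",
        "email": "TEXT",
    },
}

def create_table_sql(schema):
    """Derive the CREATE TABLE script from the schema definition."""
    statements = []
    for table, columns in schema.items():
        cols = ",\n  ".join(f"{name} {ctype}" for name, ctype in columns.items())
        statements.append(f"CREATE TABLE {table} (\n  {cols}\n);")
    return "\n".join(statements)

print(create_table_sql(schema))
```

Change the schema dictionary and regenerate; there is no second copy to fall out of sync.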

Sources

New Domain linuxwell.com!

I like having this little website and I enjoy writing new posts. So when I noticed that linuxwell.com was still available, I thought, “Hey, let’s get it!” The transfer went through this morning, so the blog is now available primarily at http://www.linuxwell.com. The old third-level domain on wordpress.com will still work, but it’s better to use the new one from now on :-).

nVidia CUDA on Linux (Fedora 15)

I recently updated my desktop with a new graphics card from nVidia — a pretty low-end GT440. It’s actually my first nVidia card; I owned three ATI Radeon cards (Sapphire 9550, 9600Pro and HD2600XT) before. The main reason was that I needed a CUDA-enabled card to be able to work on my school project (I study computer graphics). On top of that, Gnome 3 has some issues with the ATI drivers, and the HD2600XT is like 5 years old today :-P. Anyway, I recently installed the nVidia CUDA SDK on Fedora 15 and made a little howto along the way.

Installing gcc

One of the biggest issues with CUDA is that it officially supports only gcc version 4.4 and earlier. That’s a bummer, especially on Fedora, where gcc-4.6 is the default and you need to compile the older version yourself. I posted the series of steps to compile gcc from sources in a separate article, Multiple Versions of gcc on Fedora 15.

Drivers

There is something called Developer Drivers for Linux available from the nVidia website. I don’t know what they are or how they differ from the regular drivers, but I couldn’t make them work. The installation kept dying with some error about kernel headers.

The good news is that you don’t need them. I was able to compile and run the CUDA SDK with just the standard nVidia proprietary driver (version 285.05.09). If you’re running that, you’re fine. In case you have the nouveau open-source drivers, you need to get rid of them and install the nVidia proprietary driver.

There is one little thing you need to do if you chose not to install the developer drivers: make a symlink for libcuda.so like this:

su
cd /usr/lib
ln -s nvidia/libcuda.so libcuda.so

CUDA Toolkit

This is pretty straightforward. Download and run the shell script from the nVidia download page. Choose whatever is closest to your distro; in this case it’s Fedora 13. Fedora 15 is officially unsupported due to the lack of the old gcc, but we sorted that out earlier, so it’s fine to use the F13 package.

wget http://www.nvidia.com/object/thankyou.html?url=/compute/cuda/4_0/toolkit/cudatoolkit_4.0.17_linux_32_fedora13.run
sudo sh cudatoolkit_4.0.17_linux_32_fedora13.run

In case you’re prompted with a question about where to install the toolkit, just hit Enter to keep the defaults.

GPU Computing SDK

Installing the SDK is as easy as the toolkit. Download and run the script from nVidia homepage. This time you won’t need root permissions, because it will install into your home folder.

Setting System Path

Here comes the tricky part: you need to make sure all the libraries and tools are present in the correct environment variables. Edit your ~/.bashrc file and add the following:

# CUDA
export PATH="$PATH:/usr/local/cuda/bin"
export LD_LIBRARY_PATH="$LD_LIBRARY_PATH:/usr/local/cuda/lib"

Reopen your shell and you should be able to run the nVidia CUDA compiler. Make sure by trying this:

 nvcc --version

The last step is to try compiling all the examples in the SDK. Change into that directory and simply run make.

 cd ~/NVIDIA_GPU_Computing_SDK
 make

Everything should be working. There will be a lot of warnings during the compilation, but we just have to ignore that …

Sources

Multiple Versions of gcc on Fedora 15

This post describes a way of building and installing additional versions of the GNU gcc compiler on Fedora 15. The version shipped with Lovelock is gcc 4.6.1, which can sometimes be too new. Some apps, such as the nVidia CUDA SDK, require a specific version of the compiler for some reason. In this howto, I’ll be installing gcc-4.4.6.

Notice: There is an official installation guide that will help you if something doesn’t work (or I forgot to mention something here).

1. Getting the Source

I guess nobody will be surprised when I say that the first step is to obtain the sources. Go to http://gcc.gnu.org, pick a mirror and download the version you need. The best option is usually the package with all languages included (e.g. gcc-4.4.6.tar.gz). We will unpack the archive and rename the directory to source/. It is also recommended to build the binaries in a directory separate from the sources, so we’ll create a build/ folder too:

 $ tar xzf gcc-4.4.6.tar.gz
 $ mv gcc-4.4.6/ source/
 $ mkdir build/

2. Configure

Now we have everything in place, we need to configure the build. Change into the build/ directory and call the configure script:

 $ cd build/
 $ ../source/configure --prefix=/opt/gcc-4.4.6 \
                       --program-suffix=-4.4

As you can see, there are some parameters. The --prefix option says where gcc should be installed after the build. We will install it separately into the /opt directory, so it will be easily removable in the future just by deleting the respective directory. --program-suffix will be appended to the name of the executable (in this case gcc-4.4), so it doesn’t collide with other versions of gcc installed on your system.

Warning: Be careful with the --program-suffix option. You might have to make some symlinks to use the compiler with some vendor Makefiles, as explained later in this post.

Optionally you can add --enable-languages=c,c++ to choose which language support will be compiled. For more options, refer to the official guide mentioned above.

In case the configure script fails, you probably don’t have all the packages necessary to perform the build. See what’s missing and install it with yum. For me, the following was enough:

 $ su
 # yum groupinstall "Development tools"
 # yum install mpfr-devel

The full list of prerequisites can be found here.

3. Build

The build is fairly simple; if you see some warnings, feel free to ignore them. To build gcc, write the following inside the build/ directory:

 $ make

Now is the time to get some coffee, because it will take a while to build.

4. Installation

Installation is also quite easy. You can install gcc by writing

 $ su
 # make install

Then check whether the installation was correct and everything is in place

 $ /opt/gcc-4.4.6/bin/gcc-4.4 --version

5. Using alternate compiler

Using the alternate compiler is a little tricky. You can add it to your system path by appending this line to your ~/.bashrc file:

export PATH="/opt/gcc-4.4.6/bin:$PATH"

The alternate compiler is now available under the name gcc-4.4 (as we defined in the configuration phase). That is fine for your own Makefiles, but some vendors hardcode the compiler name into their Makefiles, which doesn’t help at all. If this is your case, the best way to get things working is to make symlinks to the alternate gcc and prepend their directory to the $PATH string. Here are example symlinks for gcc and g++:

su
cd /opt/gcc-4.4.6/bin
ln -s gcc-4.4 gcc
ln -s g++-4.4 g++

Note that the system look-up through the $PATH directories is sequential, so if you put a directory at the front of the variable, it will have higher priority. If you need to use the newer compiler again, simply remove the directory from the path by commenting out the line in ~/.bashrc. Remember, you can always find out the version of your current gcc by writing

 $ gcc --version

Sources

Learning Ruby

I always wanted to learn Ruby. It has become so popular over the last couple of years, and I hear people praise the language everywhere I go. Well, the time has come and I cannot postpone this any more (not with a clear conscience, anyway). So I’m finally learning Ruby. I went quickly over the net and also our campus library today to see what resources are available for Ruby newbies.

Resources

There is a load of good resources on the internet, so before you run into a bookstore to buy 3000 pages about Ruby, consider starting with the online books and tutorials. You can always buy something on paper later (I personally like stuff more the old-fashioned way, on paper). Here is a list of what I found:

That would be some of the online sources for Ruby beginners. And now something on paper:

  • The Ruby Way by Hal Fulton — great, but a sort-of-a big-ass book. Just don’t go for the Czech translation, it’s horrifying
  • Learning Ruby by Michael Fitzgerald — a little more lightweight, recommended for bus-ride reading!

I personally read The Ruby Way at home and the Learning Ruby when I’m out somewhere. Both of them are good. These are the books that I read (because I could get them in the library). There is a pile of other titles like:

Just pick your own and you can start learning :-).

Installation

Ruby is an interpreted language, so we will need to install the Ruby interpreter. On Linux it’s fairly simple; most distributions have Ruby packed in their software repositories. On Fedora 15, write sudo yum install ruby. On Debian-based distros, sudo apt-get install ruby. If you are a Windows user, please, do yourself a favor and try Ubuntu!

To check whether the Ruby interpreter is available, go to the terminal and type

 $ ruby --version

Hello Matz!

The only thing that’s missing is the famous Hello World :-)! In Ruby the code looks something like this:

#!/usr/bin/env ruby

puts "Hello Matz!"

Summary

From what I saw during yesterday’s quick tour through Ruby, I can say that it’s a very interesting language. I’d recommend anyone to give it a shot! I definitely will.

Update: Stay tuned for more! I’m working on a Ruby language cheat sheet right now.

Sources

The Pragmatic Programmer

Another great piece of computer literature I found in our campus’ library! I’m talking about The Pragmatic Programmer by Andy Hunt and David Thomas. And yes, it’s gooood :)!

The Pragmatic Programmer

Figure 1: The Pragmatic Programmer cover

The title of the book (in its Czech version) states: “How to become a better programmer and create high-quality software.” Right? I want that!

It’s a sort-of-a compilation of advice on software development from a practical point of view, based on the experience of the authors. A lot of books come with a load of theory, which is good too, but when you’re digging through the mounds of formal methods it’s very easy to forget about the practical side of software development.

The very first chapter talks about the career of a programmer or software developer. The authors say to take your career choices as investments in your future. A pragmatic programmer should invest often and into a wide range of technologies. I don’t like the investment metaphor, but I like the thought. The computer train is moving fast, and it will run you over at some point if you don’t jump in.

What I liked most about this book is the emphasis on automating routine tasks through scripting, and on the DRY principle. Having good knowledge of the environment and tools you work with is key in any profession. But programmers (including myself) often tend to focus on what we are doing and on the final results, rather than on how we do it. And frankly, every time I stop and think about what I could do better or automatically, I find some weak spot.

The process of programming, as in actually writing the code, should not be dismissed as trivial. You can save yourself a lot of stress by being creative in this area. The DRY principle is somewhat connected to this: if you repeat yourself, you not only work inefficiently (you’re doing stuff twice), but you also set a trap for yourself, one which you’re bound to step into later in the project.

Bear trap

Figure 2: Set up for lazy programmers

Overall, the book is great and I can definitely recommend it. It’s something over 200 pages, so it shouldn’t take a year to read. It’s also very well written and full of jokes, which makes it fun to read!

Sources

Myhill-Nerode Theorem in Practice

When it comes to the practical side of computer science, we often work with regular and context-free languages. Regular languages are most commonly used for expressing syntax, through the widely used regular expressions. The somewhat stronger context-free grammars dominate the field of compilers and various language processors.

As we know from the Chomsky hierarchy of formal languages, the class of regular languages is a subset (or a sort-of special case) of the class of context-free languages. This means that every regular language can be described not only by finite automata, regular expressions or regular grammars, but also by context-free grammars, pushdown automata and (for the hard-core folks) even a Turing machine.

In theory this is fine, but it’s not very good in practice. I mean, if you were developing some app that requires syntax checking, would you rather develop some sort of linear bounded automaton or use a library for regular expressions?

Sometimes we want to prove that we don’t need any magic to check the correct syntax of, say, an email address. There are some ways of doing so, for example the pumping lemma, which is fairly straightforward. But the pumping lemma can only be used to prove that a language is not regular; it doesn’t provide a sufficient condition for a language to be regular. That’s where the folks Myhill and Nerode come in!

The Myhill-Nerode theorem, on the contrary, provides a necessary and sufficient condition for a language to be regular! Yay! But as you can imagine, it’s not that simple :-P.

Formal definition

Definition 1. Let Σ be an alphabet and ~ an equivalence relation on Σ*. Then the relation ~ is right invariant if for all u, v, w ∈ Σ*:

  • u ~ v <=> uw ~ vw

Definition 2. Let L be a language (not necessarily regular) over Σ. We define the relation ~L, called prefix equivalence, on Σ* as follows:

  • u ~L v <=> ∀w ∈ Σ*: uw ∈ L <=> vw ∈ L

Theorem 1. (Myhill-Nerode) Let L be a language over Σ. Then these three statements are equivalent:

  1. L is accepted by some deterministic finite automaton.
  2. L is the union of some of the equivalence classes of a right invariant equivalence relation of finite index.
  3. Relation ~L is of finite index.

Informal description

The heart of this theorem is the fact that a finite automaton has only a finite number of states. Consequently, any language that can be described by a finite automaton must consist of only a finite number of string patterns. This is what the right invariant relation with finite index expresses.

The prefix equivalence is a form of right invariant equivalence, but stronger. It’s tied to the respective language and says that if two strings are ~L-equivalent, then they are either both included in or both excluded from the language after we extend them by the same suffix. The Myhill-Nerode theorem says that for a regular language, this relation always has a finite number of equivalence classes, i.e. there is only a finite number of word patterns that can be repeated through the strings.

Example proof

Now for a little example of how to show that a language is not regular by using this theorem. Let’s take the most classic context-free language of all time, L = { a^n b^n | n ≥ 0 }.

We’ll be interested in the third part of the Myhill-Nerode theorem, which states that the relation ~L is of finite index. We need to find a way of showing that it’s not, and that therefore the language is not regular (since the Myhill-Nerode theorem is an equivalence).

In this case, the most elegant way is a proof by contradiction. We will show that L = { a^n b^n | n ≥ 0 } is not regular.

Proof 1. Suppose that L is a regular language. Then, by the theorem, the relation ~L is of finite index. Consider the infinite sequence of words a^1, a^2, a^3, … over Σ*.

Now take any two natural numbers i ≠ j and assign

u = a^i, v = a^j, w = b^i.

According to the definition of prefix equivalence (definition 2), u ~L v would require

a^i b^i ∈ L ⇔ a^j b^i ∈ L,

but a^i b^i ∈ L while a^j b^i ∉ L. So no two distinct words of the sequence are ~L-equivalent. The sequence is infinite, so the index of the relation must be infinite as well. But that’s a contradiction with the Myhill-Nerode theorem. Therefore, the language is not regular.
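The argument can even be played with mechanically. This little Python sketch (a brute-force illustration of the idea, definitely not a proof) searches for a suffix that distinguishes two prefixes with respect to L = { a^n b^n | n ≥ 0 }:

```python
def in_L(word):
    """Membership test for L = { a^n b^n | n >= 0 }."""
    n = len(word) // 2
    return len(word) % 2 == 0 and word == "a" * n + "b" * n

def distinguishable(u, v, max_n=5):
    """Return a suffix w with (uw in L) != (vw in L), or None if none found."""
    for i in range(max_n + 1):
        for w in ("a" * i, "b" * i, "a" * i + "b" * i):
            if in_L(u + w) != in_L(v + w):
                return w
    return None

# Every pair a^i, a^j with i != j lands in a different class of ~L,
# witnessed by the suffix b^i, so the index cannot be finite.
for i in range(4):
    for j in range(i + 1, 4):
        assert distinguishable("a" * i, "a" * j) is not None

print(distinguishable("aa", "aaa"))  # bb
```

Of course the program can only check finitely many suffixes; it is the theorem that turns this observation into a proof.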

Sources