Tagged: string

Basic Computer Science

There is a couple of very basic definitions and axioms in computer science. I consider them to be very important, because everything that comes later is based on them. And if you don’t fully comprehend the basic stuff, it will be very hard to understand anything further. That’s why I decided to write a whole post on these trivial definitions.

Alphabet

Yeah, I’m not kidding. There’s a formal definition for alphabet. And what’s worse: it actually makes sense. I just wonder what would a kinder-gardener say on this :-D. Anyway, here it goes:

Definition 1.1. An alphabet is a finite non-empty set of symbols.

Alphabets are usually denoted by Greek big sigma Σ. A set Σ can be referred to as alphabet when it’s not empty, but also not infinitely big.

  • Σ = {a, b, c} is an alphabet
  • Σ = {0, 1} is an alphabet
  • Σ = {} is NOT an alphabet
  • Σ = all integers; is NOT an alphabet

Content of an alphabet are symbols. A symbol is some indivisible or atomic unit that can have some meaning (not necessarily). For example, if Σ = {righteous, dude} were alphabet, righteous and dude would be considered to be atomic indivisible elements (even though they contain multiple characters). Some folks also choose to omit the finite word from the definition. But that’s for some very advanced stuff and I’m guessing that finite alphabets will be more than sufficient for us.

String

Another fundamental thing is a string. String or a word is a sequence of some symbols (the order is important). Strings usually contain symbols from only one alphabet.  In that case we say string x over \Sigma is a finite sequence of symbols from \Sigma. The formal definition is this:

Definition 1.2. Let \Sigma be an alphabet.

  1. empty string \epsilon is string over \Sigma
  2. if x is a string over \Sigma and a \in \Sigma is a symbol from \Sigma, then xa is a string over \Sigma
  3. y is string over \Sigma only when we can derive it by applying rules 1 and 2.

There are some more basic definitions like string length, concatenation, reversation that are useful and important, but they might be a little intimidating at first, so let’s skip ahead to languages.

Language

What is a formal language? We need to define \Sigma^* first. \Sigma^* is a set of all existing strings over alphabet \Sigma. For example, let \Sigma = \{0, 1\}, then \Sigma^* = \{\epsilon\} \cup \{ all positive binary numbers \}.

Then a formal language L over alphabet \Sigma is a subset of \Sigma^*. Formally said, L \subseteq \Sigma^*. Basically, language is a set of strings. In context of programming languages, a language would be a set of all possible programs in that programming language. You see, that in most practical cases the set will be infinite (as it is in the programming language case). So, it’s not very practical to describe languages by enumerating all possible sentences of the language. One way of describing a language is with things called formal grammars. But we’ll get on to that later.

Sources

Advertisements

Brute-Force String Generation in C

Earlier this week, I posted an article about string generation for brute-force attacks and a couple of example solutions. I emphasized, that the key aspect of brute-force is speed. We want to try as many combinations of input data as possible in the minimum amount of time. And part of this is also efficient algorithm that will generate input combinations. But the examples I posted were written in Python, which is kind of a high level scripting language, not nearly as fast as C.

Hence, here is my implementation of the same in C:

/*
 * Basic string generation for brute-force attacks
 * Copyright (C) 2011 Radek Pazdera
 */

#include <string.h>
#include <stdio.h>
#include <stdlib.h>

/* I chose to use an one way linked list data structure
 * to avoid restrictions on the generated string length.
 * The thing is, the list must be converted to string so
 * it could be used. This conversion have to happen in
 * each cycle and causes unnecessary slowdown.
 * 
 * Faster solution would be to implement the generation
 * directly on some staticaly allocated string with fixed
 * size (20 characters are more than enough).
 */
typedef struct charlist charlist_t;
struct charlist
{
    unsigned char character;
    charlist_t* next;
};

/* Return new initialized charlist_t element.
 *
 * Elements are initialized
 * @return charlist_t
 */
charlist_t* new_charlist_element()
{
    charlist_t* element;

    if ((element = malloc(sizeof(charlist_t))) != 0)
    {
        element->character = 0;
        element->next = NULL;
    }
    else
    {
        perror("malloc() failed.");
    }

    return element;
}

/* Free memory allocated by charlist.
 *
 * @param list Pointer at the first element.
 * @return void
 */
void free_charlist(charlist_t* list)
{
    charlist_t* current = list;
    charlist_t* next;

    while (current != NULL)
    {
        next = current->next;
        free(current);
        current = next;
    }
}

/* Print the charlist_t data structure.
 *
 * Iterates through the whole list and prints all characters
 * in the list including any '\0'.
 * 
 * @param list Input list of characters.
 * @return void
 */
void print_charlist(charlist_t* list)
{
    charlist_t* next = list;
    while (next != NULL)
    {
        printf("%d ", next->character);
        next = next->next;
    }
    printf("\n");
}

/* Get next character sequence.
 *
 * It treats characters as numbers (0-255). Function tries to
 * increment character in the first position. If it fails,
 * new character is added to the back of the list.
 *
 * It's basicaly a number with base = 256.
 *
 * @param list A pointer to charlist_t.
 * @return void
 */
void next(charlist_t* list)
{
    list->character++;
    if (list->character == 0)
    {
        if (list->next == NULL)
        {
            list->next = new_charlist_element();
        }
        else
        {
            next(list->next);
        }
    }
}

int main()
{
    charlist_t* sequence;
    sequence = new_charlist_element();

    while (1)
    {
        next(sequence);
        print_charlist(sequence);
    }

    free_charlist(sequence);
}

Download the fully working code from github.