Tagged: grammars

October 21, 2011

Language in Computer Science

The last post was about strings, so now something about languages. From the theoretical point of view a language is a set of words or more precisely sentences (strings). Possibly and usually an infinite set. Only sentences that are present in the set are part of the language, nothing else. Neat feature of languages is, that they interconnect two separate formal theories/models into one thing. Let me explain.

Language is a set. As in mathematical set. So anything that applies to sets applies to languages as well (like union, intersection, complement, Cartesian product). Members of any language are only strings. And there are some operations defined over strings (that is concatenation, reversal, sub-strings etc.). We can take advantage of this fact and abstract those, so they work with language as well. There is generally a lot you can do with a formal language either from the set or string point of view.

Definition 1. A formal language L over an alphabet Σ is a subset of Σ^*. Formally, L ⊆ Σ^*. The magical Σ^* denotes a set of all existing words and sentences over Σ. Sometimes it’s called an alphabet iteration.

Like I said earlier, all operations defined for sets work with languages as well. For example language union L₁ ∪ L₂ = {s; s ∈ L₁ ∨ s ∈ L₂} etc. More interesting are the string operations that can be abstracted so they work on languages. There is language concatenation:

Definition 2. Let L₁ be a language over Σ₁ and L₂be a language over Σ₂. Concatenation of these two L = L₁⋅L2 = {xy; x ∈ L₁, y ∈ L₂} is a language over alphabet Σ = Σ₁ ∪ Σ₂.

Language concatenation as well as the string equivalent is not commutative operation. There is a couple of special cases:

L ⋅ {ε} = L
L ⋅ ∅ = ∅

The first, {ε} is a language with a single word — an empty string, in the second case we’re talking about an empty set. Concatenation can be looked at a sort of multiplication between the two languages (remember Cartesian product?). As a result of this thought, we define language power.

Definition 3. Let L be a language over Σ, then Lⁿ denotes n-power of language and

L⁰= {ε}
Lⁿ⁺¹=Lⁿ⋅L

With language power defined, we can advance to the last operation called language iteration. It’s basically an abstraction of language power.

Definition 4. L⁺ denotes positive iteration of language L defined accordingly

L⁺ = L ∪ L²∪ … ∪ L^∞

Definition 5. L^* denotes iteration of language L defined accordingly

L^* = {ε} ∪ L ∪ L²∪ … ∪ L^∞

Summary

In this post were introduced some of the most basic and most common operations that are defined on languages. The most important thing to remember is, that language is a set of sentences (or words) and the operations that we define originate from these two areas.

We have some set operations like union, intersection, complement as for generic sets. And we have abstractions for string operations like concatenation (notice, that it’s somewhat similar to cartesian product), language power and iteration.

Anyway, these couple of operations is nowhere near to the complete list of all existing functions for languages! After all, if you dive into some algebraic structures a little you can easily define your own :-).

August 31, 2011

Basic Computer Science

There is a couple of very basic definitions and axioms in computer science. I consider them to be very important, because everything that comes later is based on them. And if you don’t fully comprehend the basic stuff, it will be very hard to understand anything further. That’s why I decided to write a whole post on these trivial definitions.

Alphabet

Yeah, I’m not kidding. There’s a formal definition for alphabet. And what’s worse: it actually makes sense. I just wonder what would a kinder-gardener say on this :-D. Anyway, here it goes:

Definition 1.1. An alphabet is a finite non-empty set of symbols.

Alphabets are usually denoted by Greek big sigma Σ. A set Σ can be referred to as alphabet when it’s not empty, but also not infinitely big.

Σ = {a, b, c} is an alphabet
Σ = {0, 1} is an alphabet
Σ = {} is NOT an alphabet
Σ = all integers; is NOT an alphabet

Content of an alphabet are symbols. A symbol is some indivisible or atomic unit that can have some meaning (not necessarily). For example, if Σ = {righteous, dude} were alphabet, righteous and dude would be considered to be atomic indivisible elements (even though they contain multiple characters). Some folks also choose to omit the finite word from the definition. But that’s for some very advanced stuff and I’m guessing that finite alphabets will be more than sufficient for us.

String

Another fundamental thing is a string. String or a word is a sequence of some symbols (the order is important). Strings usually contain symbols from only one alphabet. In that case we say string $x$ over $\Sigma$ is a finite sequence of symbols from $\Sigma$ . The formal definition is this:

Definition 1.2. Let $\Sigma$ be an alphabet.

empty string $\epsilon$ is string over $\Sigma$
if $x$ is a string over $\Sigma$ and $a \in \Sigma$ is a symbol from $\Sigma$ , then $xa$ is a string over $\Sigma$
y is string over $\Sigma$ only when we can derive it by applying rules 1 and 2.

There are some more basic definitions like string length, concatenation, reversation that are useful and important, but they might be a little intimidating at first, so let’s skip ahead to languages.

Language

What is a formal language? We need to define $\Sigma^*$ first. $\Sigma^*$ is a set of all existing strings over alphabet $\Sigma$ . For example, let $\Sigma = \{0, 1\}$ , then $\Sigma^* = \{\epsilon\} \cup \{$ all positive binary numbers $\}$ .

Then a formal language L over alphabet $\Sigma$ is a subset of $\Sigma^*$ . Formally said, $L \subseteq \Sigma^*$ . Basically, language is a set of strings. In context of programming languages, a language would be a set of all possible programs in that programming language. You see, that in most practical cases the set will be infinite (as it is in the programming language case). So, it’s not very practical to describe languages by enumerating all possible sentences of the language. One way of describing a language is with things called formal grammars. But we’ll get on to that later.

Sources

http://www.cs.umbc.edu/portal/help/theory/lang_def.shtml