**Introduction**

Last class we discussed normal forms, which was wonderfully recapped in this Wednesday’s blog post by Akash. Today we will pretend that we are computer scientists and focus on languages and automata. Although it may seem a bit random at first glance, the motivations we have tie back into group theory and give us many useful new tools.

**Motivations and Definitions**

Several key terms in relation to today’s topic were introduced las class such as the *alphabet* : which is a set of words acted on by a monoid. A monoid is a group without any inverses.

We will focus on free monoid which implies that the elements within the monoid follow the structure of a free group. We can go further and motivate the need for *languages*.

Consider the monoid and let be the free monoid on which contains all of the words coming from .

**Definition** **1** A language on is a subset of .

From here we can go on to introduce today’s key topic: automata.

**Definition** **2**

An *automata* on is a directed graph with decorations.

1. Some vertices are labelled with a **start** state property.

2. Some vertices are labelled with a **accept** state property.

3.Some edges are labelled by elements of .

4. Some vertices and edges are not labelled at all.

One may ask, well… how do we label these vertices? It’s easy! We must simply draw an on each start vertex, a circle on each accept vertex, the element name on each labelled edge, and nothing at all in the case of the vertex or edge being unlabeled. We can also relate automata to languages since a possible way to create a language is to use an automaton. We can create a language from an automaton by observing all paths from a start state to an accept state, and reading off the edge labels.

Below is a simple example from class of an automata on . Notice, here the start state is also an accept state, which is fine since nowhere did we define them to be mutually exclusive.

As we can see this looks pretty much like a directed graph with a few extra quirks… which we will get to in a second. We first connect it to languages. Given an automata there exists a language which is a graph of all paths from the start state to an accept state. For instance, in the example above we can find a finite set of elements which describe all possible paths from the start state to accept states of the automata. In the example above the language can be written as the set , where and is due to the loop edge connected to the top accept vertex. One of the most important things to notice from this example is the finiteness of . This will not always happen, but when it does it will be of utmost use to us. In fact, if is finite then we call it a finite state automata (FSA). The rest of our study for the course will be focused on finite state automata.

Before we go on further into more connected definitions, we pose a challenge question for the motivated reader.**Puzzle:** Find an FSA, , such that on the alphabet . How many can you find? Are there infinitely many possible options for alphabet to produce this automaton? How does this relate to Cayley graphs of some groups which we have seen in weeks prior?

We now move on to yet another addition to our vocabulary (no pun intended…).

**Definition** **3**: An FSA is called *deterministic* if there exists a unique start state and no two edges leaving a vertex share the same label. Moreover, every edge must have a label in the DFA. A deterministic FSA is called DFA, for short.

We call a DFA *complete* if for every vertex in the automaton and each letter in the alphabet there are edges labelled by all letters leaving the vertex.

**Claim:** If is a DFA, then there exists which is a completed DFA such that . We will not prove this for time’s sake but this could also be an additional puzzle that you can examine when your next class gets boring.

We have now covered the two main definitions of this blog post’s content. We now may ask the natural question … so what? We have FSAs and DFAs and we know how they are connected to monoids and alphabets and thus letters and words which we used prior. And yes, there seems to be a vague familiarity of the structure of automata and Cayley graphs of groups, however, we want a connection between all of it! Unfortunately we may not get to all of that today, yet, we will take the first step in connecting DFAs and FSAs more robustly in order to determine what languages can actually be generated or accepted by these new machines. The motivation is, roughly, the same as it has been for the entirety of the class. How can we reproduce something as a finite set of vertices and connecting edges where the path is structured by the specific decorations which we have described. Will the additional structure of automata help our quest of understanding infinite groups?

The key theorem, tied to this, that we will prove is as follows.**Theorem 1: **The set of languages accepted by FSA are equal to the set of languages accepted by a complete DFA.

Wow! That’s really cool. I see this as a very useful path since an FSA seems like a very simple thing to construct, in general; and if we could then show that this same language can be accepted by a complete DFA, which has a more similar structure to some of the complicated groups we have seen, that would be really useful.

To motivate further we provide an example of a complete FSA on from which we will begin the proof. Notice we have also labelled each vertex with a number which we will put to good use soon.

**Proof of Theorem 1 — sketch (motivated by example):**

Let’s call the FSA above . We see that this is a rather ‘flawed’ example since there are both unlabeled edges and the directions of the edges do not satisfy the conditions for a DFA. We will transform to which will eliminate unlabeled edges and subsequently transform to which will be our DFA. Our first task is to transform to have no unlabeled edges.

**Step 1: Eliminating unlabeled edges**

We first will want to examine the vertex set inside of the FSA which we define by and consider the subset . Also define as the set of vertices in you can reach from a certain vertex. For instance, in our example we would have . Similarly, , where we now examine the vertices we can get to from both vertex 1 and vertex 2. We can continue further and the most notable thing that will happen is we will find three vertex sets which can map to the exact set of themselves and nothing else. Namely, , , and .

Once we have these three special vertices we can define along with the set . In essence this gives us that the states (or vertices) of will be precisely these three special vertices. Instead of the term special we will use ‘accept’ vertices and formally define as accept if it contains an accept vertex of . Well we have the vertices… what about the edges?

We define another set with the condition that you can reach in a single -arrow from a vertex in . It follows that our edges will be defined by where and . This generates the following picture.

**Step 2: Obtaining the DFA**

At this point, we are tempted to be finished, however, we do not yet have our complete DFA. This picture shows an automaton with the start state having an edge of each element in (the entire alphabet) and each vertex is connected by a labelled edge and is an accept vertex. Moreover, we define the state connected to edges of all elements as the unique start state. However, looking back on the definition of a complete DFA we are only part of the way there. We must have every vertex in our graph to have an edge leaving it, labelled by each element of .

We thus must extend our process by taking as an FSA, which it certainly is, and will map it to which will be an automaton adhering to all necessities of a DFA. We take . We will draw an edge with labels from of the form for and under the condition that for all vertices there exists such that is an edge labelled by and for all there exits a where is also labelled. This creates the surjective mapping and completes our claim that a language on FSA is accepted by a DFA as well. This is illustrated below by the two vertex sets mapping to each other surjectively which is what we manifest in step 2.

Our theorem, however, is to show that the DFA must be complete. This can be simply demonstrated by finding that any FSA can be transformed into a complete FSA by adding a dead vertex to the automaton. This process is demonstrated below by an automaton on where the dead vertex is labelled on the right of the figure. We notice that in order to keep the process finite and avoid adding dead states forever, we draw edges that loop back to the dead vertex and thus conclude the process of transitioning the FSA to a complete FSA.

We concluded class with a definition about *regular* languages.**Definition 4:**

The language is a *regular* language if there exists an FSA whose corresponding language is .

This will lead to a theorem about languages which are accepted by *non-deterministic* finite state automata corresponding to the languages accepted by the set of regular languages. Furthermore, this begs the question of whether all languages are regular. We can also ponder how this will all connect to normal forms and last class’ content.

I think it was cool to learn about these machines as it provided me with some more language to better explain the way I think about Cayley graphs! To me, they are super similar and as we saw in class, there are very real links between these two ideas.

Great post Osip! I like your motivating example for the proof that any regular language is accepted by a DFA.

Another potentially interesting thing – regular languages can equivalently be defined by these things called _regular expressions_. A regular expression is a string of characters from an alphabet, where you’re also allowed to add the characters (, ), *, and |. The * is the Kleene star, which indicates you can repeat something arbitrarily many times. | is used to indicate “or.” You can use regular expressions to define languages. For instance, b(a|b)* specifies the language {b, ba, bb, baa, bab, bba, bbb, baaa, …}, or all strings over a and b that start with a b. A language is regular iff it is recognized by a regular expression.

Great post, Osip! Just a quick clarification — does the “finite” in finite state automata refer to the finiteness of the language of the finiteness of the graph? My understanding was that it applied to the graph, but in your definition, it seemed to apply to the language. However, the language that you use as an example is not in fact finite — we can represent it in a finite list, but since accepts any in the natural numbers, there are an infinite number of elements in the language. Any clarification would be much appreciated!

Very nice illustration on step two of the proof! I love your step-by-step explanation of the algorithm. It makes sense how we start with an FSA. Then, first, eliminate unlabeled edges by using the “closure” of vertices to boil down the words the given FSA can produce. Then, to complete the resulting FSA to a DFA.