Hai's home

Interactive Notes on Finite-State Machine

· 3 min read


A word on language

Mirroring the daily meaning of the word, in formal language theory, an alphabet Σ\Sigma is a finite1 set of characters. A string over the alphabet Σ\Sigma is a tuple x=(a1,a2,,ak)x=(a_1,a_2,\ldots,a_k) in Σk\Sigma^k in which we write simply as x=a1a2akx=a_1a_2\cdots a_k. A string xx in Σk\Sigma^k has length kk, denoted x=k|x|=k. The set of all strings of all lengths over the alphabet Σ\Sigma is the Kleene closure of Σ\Sigma and denoted Σ\Sigma^*; symbolically,

Σ=k=0Σk.\Sigma^* = \bigcup_{k=0}^\infty \Sigma^k.

We define Σ0={ε}\Sigma^0=\{\varepsilon\}, where ε\varepsilon is the empty string, the only string having zero length. A language over Σ\Sigma is a subset LL of Σ\Sigma^*. As an example of a frequently-seen language, the set of all binary strings is {0,1}={ε,0,1,00,01,}\{0,1\}^* = \{\varepsilon, 0, 1, 00, 01, \ldots\}.

Given strings x=a1a2akx=a_1a_2\cdots a_k and y=b1b2bly=b_1b_2\cdots b_l over Σ\Sigma, the concatenation of xx and yy, denoted xyx\cdot y or just xyxy, is the string produced by appending yy to xx:

xy=xy=a1a2akb1b2bl.xy = x\cdot y = a_1a_2\cdots a_kb_1b_2\cdots b_l.

Note that εx=xε=x\varepsilon x = x\varepsilon = x for any string xx.

type Letter = string | symbol;
type Str<L extends Letter> = Iterable<L>;

Finite-state machine

A finite-state machine (FSM)2 MM is specified through five elements, as follows:

M=(Q,Σ,δ,q0,F)M = (Q, \Sigma, \delta, q_0, F)

where

Mechanistically, an FSM MM can be thought of as a machine that repeatedly receives a character from an alphabet Σ\Sigma and changes its internal status accordingly. Though the above specification does not mention output, we can regard MM as being able to output YES or NO, depending on whether its state is one of the final states.

Naturally, we can extend the transition function to accept an arbitrary string over Σ\Sigma in an inductive manner: for xΣx\in\Sigma^* and aΣa\in\Sigma, we define

δ(q,xa)=δ(δ(q,x),a).\delta^*(q, xa) = \delta(\delta(q, x), a).

The function δ\delta^* can receive as input any (nonempty) string over Σ\Sigma. We also let δ(q,ε)=q\delta^*(q, \varepsilon) = q—by reading no input, the machine does not change its state. We overload δ\delta to mean δ\delta^* and assume the transition function δ\delta can take arbitrary strings as input. Note that δ ⁣:Q×ΣQ\delta\colon Q\times\Sigma^*\to Q.

The FSM MM is said to accept a string xx if δ(q0,x)F\delta(q_0, x) \in F. In other words, the machine “output” YES when it accepts the input string and NO otherwise. The set of strings that MM accepts is the language accepted or recognized by MM.

type State = string | symbol;
class FSM<S extends State, L extends Letter> {
  state: S;
  initialState: S;
  transition: (input: Str<L>) => void;
  finalStates: Set<S>;

  constructor(
    initialState: S,
    finalStates: Set<S>,
    t: (s: S, l: L) => S
  ) {
    this.state = initialState;
    this.initialState = initialState;
    this.finalStates = finalStates;
    this.transition = (input: Str<L>) => {
      for (const l of input) {
        this.state = t(this.state, l);
      }
    }
  }
}

Footnotes

  1. Technically, a language can be defined on an arbitrary alphabet, but for most practical purposes, finite alphabet works without problems.

  2. There are many names, such as finite automaton, for the concept with various meanings. Here, “finite-state machine” follows exactly the given definition.