How it works

Overview

kemlang-py is a tree-walking interpreter. This section explains exactly how it turns a .jsk source file into running output - from character scanning all the way to executing statements.

What is a programming language, really?

A programming language is a convention. The source file you write is just text - a sequence of Unicode characters sitting on disk. Nothing in the hardware understands bhai bol. The interpreter is the program that reads that text and figures out what to do with it.

Every interpreter or compiler does the same fundamental job: transform source text into behavior. The strategies differ enormously in complexity and performance, but the goal is always the same.

The spectrum of language implementations

Different languages take different approaches to turning source into execution.

language implementation spectrum

  Source text
      │
      ▼
  ┌───────────────────────────────────────────────────────────────────┐
  │  COMPILED  (C, Rust, Go)                                          │
  │                                                                   │
  │  Source ──▶ Compiler ──▶ Machine code (.exe) ──▶ CPU runs        │
  │                                                                   │
  │  + Fastest possible execution (direct CPU instructions)           │
  │  - Compilation is a separate step before running                  │
  └───────────────────────────────────────────────────────────────────┘
      │
      ▼
  ┌───────────────────────────────────────────────────────────────────┐
  │  BYTECODE VM  (Python, Java, Lua)                                 │
  │                                                                   │
  │  Source ──▶ Compiler ──▶ Bytecode ──▶ VM interprets              │
  │                                                                   │
  │  + Faster than tree-walking; portable across platforms            │
  │  - VM adds complexity; bytecode is an intermediate layer          │
  └───────────────────────────────────────────────────────────────────┘
      │
      ▼
  ┌───────────────────────────────────────────────────────────────────┐
  │  TREE-WALKING  (kemlang-py, early Ruby, many scripting languages) │
  │                                                                   │
  │  Source ──▶ Lexer ──▶ Parser ──▶ AST ──▶ walk & execute          │
  │                                                                   │
  │  + Simplest implementation; easy to debug and extend              │
  │  - Slowest; each node is re-evaluated on every visit              │
  └───────────────────────────────────────────────────────────────────┘
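
To make the difference between the last two boxes concrete, here is a minimal sketch that evaluates the same expression both ways. It is not kemlang-py code: the node classes (Num, Add, Mul), the instruction names (PUSH, ADD, MUL) and the functions walk, compile_expr and run_vm are all hypothetical and exist only for this illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple, Union


# Hypothetical AST nodes for a tiny arithmetic language.
@dataclass
class Num:
    value: int


@dataclass
class Add:
    left: "Expr"
    right: "Expr"


@dataclass
class Mul:
    left: "Expr"
    right: "Expr"


Expr = Union[Num, Add, Mul]
Instr = Tuple[str, int]  # (opcode, operand); the operand is unused for ADD/MUL


def walk(node: Expr) -> int:
    """Tree-walking: recurse through the AST every time the expression runs."""
    if isinstance(node, Num):
        return node.value
    if isinstance(node, Add):
        return walk(node.left) + walk(node.right)
    if isinstance(node, Mul):
        return walk(node.left) * walk(node.right)
    raise TypeError(f"unknown node: {node!r}")


def compile_expr(node: Expr, out: List[Instr]) -> List[Instr]:
    """Bytecode: flatten the tree once, in post-order, into a linear list."""
    if isinstance(node, Num):
        out.append(("PUSH", node.value))
    else:
        compile_expr(node.left, out)
        compile_expr(node.right, out)
        out.append(("ADD" if isinstance(node, Add) else "MUL", 0))
    return out


def run_vm(code: List[Instr]) -> int:
    """A stack machine: a flat loop over instructions, no recursion."""
    stack: List[int] = []
    for op, arg in code:
        if op == "PUSH":
            stack.append(arg)
        else:
            right, left = stack.pop(), stack.pop()
            stack.append(left + right if op == "ADD" else left * right)
    return stack.pop()


tree = Add(Num(1), Mul(Num(2), Num(3)))  # 1 + 2 * 3
code = compile_expr(tree, [])
print(walk(tree))    # 7
print(run_vm(code))  # 7
```

Both paths produce the same value. The tree walker repeats the type dispatch and recursion on every evaluation, while the bytecode path pays for the traversal once and then runs a simple loop, at the cost of an extra compile step and a VM to maintain.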

The pipeline

Every time you invoke kem run hello.jsk, the source file passes through three sequential stages. Each stage consumes the output of the previous one.

the full pipeline

  ┌──────────────────────────────────────────────────────────────────┐
  │  Source file  (hello.jsk)                                        │
  │                                                                  │
  │  kem bhai                                                        │
  │    bhai bol "kem cho, duniya!"                                   │
  │  aavjo bhai                                                      │
  └────────────────────────────┬─────────────────────────────────────┘
                               │  raw UTF-8 text
                               ▼
  ┌──────────────────────────────────────────────────────────────────┐
  │  Stage 1: Lexer  (kemlang/lexer.py)                              │
  │                                                                  │
  │  Scans characters left-to-right. Groups them into tokens.        │
  │  Handles multi-word Gujarati keywords. Skips whitespace.         │
  └────────────────────────────┬─────────────────────────────────────┘
                               │  List[Token]
                               │
                               │  KEM_BHAI    'kem bhai'            1:0
                               │  BHAI_BOL    'bhai bol'            2:2
                               │  STRING      '"kem cho, duniya!"'  2:11
                               │  AAVJO_BHAI  'aavjo bhai'          3:0
                               │  EOF         ''                    4:0
                               ▼
  ┌──────────────────────────────────────────────────────────────────┐
  │  Stage 2: Parser  (kemlang/parser.py)                            │
  │                                                                  │
  │  Consumes tokens one at a time. Checks grammar rules.            │
  │  Builds a tree of dataclass nodes (the AST).                     │
  └────────────────────────────┬─────────────────────────────────────┘
                               │  Program (AST)
                               │
                               │  Program
                               │  └── Print
                               │      └── Literal("kem cho, duniya!")
                               ▼
  ┌──────────────────────────────────────────────────────────────────┐
  │  Stage 3: Interpreter  (kemlang/interpreter.py)                  │
  │                                                                  │
  │  Walks the AST recursively. Executes each node. Manages          │
  │  variable scope via Environment. Handles I/O and errors.         │
  └────────────────────────────┬─────────────────────────────────────┘
                               │
                               ▼
                       stdout: kem cho, duniya!
                       exit code: 0

What the CLI actually does

kemlang/cli.py - kem run (simplified)
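from pathlib import Path
import typer

# `file` is the path argument passed to `kem run`
source    = Path(file).read_text(encoding="utf-8")
tokens    = Lexer(source).tokenize()          # str  -> List[Token]
ast       = Parser(tokens).parse()            # tokens -> Program
exit_code = Interpreter().interpret(ast)      # AST -> stdout + int
raise typer.Exit(exit_code)                   # exit with the interpreter's status code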

Stage 1: Lexer

The lexer reads source text one character at a time and groups characters into tokens - the smallest meaningful units of the language. kemlang-py's lexer handles multi-word Gujarati keywords like bhai bol by checking multi-word sequences before single-word keywords.
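
The "multi-word before single-word" rule can be sketched as a longest-match keyword scan. This is an illustration under stated assumptions, not the contents of kemlang/lexer.py: the Token fields, the match_keyword helper, the IDENT fallback and the single-line-string restriction are all hypothetical, and only the three keywords shown on this page are included.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Token:
    type: str
    lexeme: str
    line: int  # 1-based, as in the pipeline diagram
    col: int   # 0-based


# Only the keywords that appear in this document.
KEYWORDS = {
    "kem bhai": "KEM_BHAI",
    "bhai bol": "BHAI_BOL",
    "aavjo bhai": "AAVJO_BHAI",
}
# Try longer keywords first so a multi-word keyword is never split.
BY_LENGTH = sorted(KEYWORDS, key=len, reverse=True)


def match_keyword(source: str, i: int) -> str:
    """Return the keyword starting at index i, or "" if there is none."""
    for kw in BY_LENGTH:
        end = i + len(kw)
        # Require a word boundary: "bhai bolo" must not match "bhai bol".
        if source.startswith(kw, i) and not source[end:end + 1].isalnum():
            return kw
    return ""


def tokenize(source: str) -> List[Token]:
    tokens: List[Token] = []
    i, line, col = 0, 1, 0
    while i < len(source):
        ch = source[i]
        if ch == "\n":
            i, line, col = i + 1, line + 1, 0
            continue
        if ch.isspace():
            i, col = i + 1, col + 1
            continue
        kw = match_keyword(source, i)
        if ch == '"':
            # Assumption: strings are single-line and have no escapes.
            end = source.find('"', i + 1)
            if end == -1:
                raise SyntaxError(f"unterminated string at {line}:{col}")
            kind, lexeme = "STRING", source[i:end + 1]
        elif kw:
            kind, lexeme = KEYWORDS[kw], kw
        else:
            # Fallback: a run of non-space characters becomes one IDENT.
            end = i
            while end < len(source) and not source[end].isspace():
                end += 1
            kind, lexeme = "IDENT", source[i:end]
        tokens.append(Token(kind, lexeme, line, col))
        i, col = i + len(lexeme), col + len(lexeme)
    tokens.append(Token("EOF", "", line, col))
    return tokens


SOURCE = 'kem bhai\n  bhai bol "kem cho, duniya!"\naavjo bhai\n'
for tok in tokenize(SOURCE):
    print(f"{tok.type:<12}{tok.lexeme!r:<22}{tok.line}:{tok.col}")
```

Matching against the raw text, rather than splitting on whitespace first, is what lets a keyword contain a space. The word-boundary check keeps a longer identifier that merely starts with a keyword from being mis-tokenized.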

Deep dive: The Lexer

Stage 2: Parser

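The parser takes the flat token stream and builds an Abstract Syntax Tree (AST) using recursive descent. Each grammar rule maps to a method, and operator precedence is encoded by stratifying the grammar: each precedence level gets its own rule, and lower-precedence rules call higher-precedence ones.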
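
As a rough illustration of how precedence falls out of the rule structure, here is a minimal recursive-descent sketch for arithmetic. It is not kemlang/parser.py: the MiniParser class, the Binary node, the rule names (expression, term, factor, primary) and the use of plain strings as tokens are assumptions made for this example. Only the Literal node name comes from the AST shown earlier.

```python
from dataclasses import dataclass
from typing import List, Optional, Union


@dataclass
class Literal:
    value: int


@dataclass
class Binary:  # hypothetical node for binary operators
    left: "Expr"
    op: str
    right: "Expr"


Expr = Union[Literal, Binary]


class MiniParser:
    """One method per grammar rule; lower precedence calls higher precedence."""

    def __init__(self, tokens: List[str]) -> None:
        self.tokens = tokens
        self.pos = 0

    def peek(self) -> Optional[str]:
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def advance(self) -> str:
        tok = self.tokens[self.pos]
        self.pos += 1
        return tok

    def expression(self) -> Expr:
        # expression := term
        return self.term()

    def term(self) -> Expr:
        # term := factor (("+" | "-") factor)*
        left = self.factor()
        while self.peek() in ("+", "-"):
            op = self.advance()
            left = Binary(left, op, self.factor())  # loop => left-associative
        return left

    def factor(self) -> Expr:
        # factor := primary (("*" | "/") primary)*
        left = self.primary()
        while self.peek() in ("*", "/"):
            op = self.advance()
            left = Binary(left, op, self.primary())
        return left

    def primary(self) -> Expr:
        # primary := NUMBER | "(" expression ")"
        tok = self.peek()
        if tok is None:
            raise SyntaxError("unexpected end of input")
        self.advance()
        if tok == "(":
            inner = self.expression()
            if self.peek() != ")":
                raise SyntaxError("expected ')'")
            self.advance()
            return inner
        if tok.isdigit():
            return Literal(int(tok))
        raise SyntaxError(f"unexpected token: {tok!r}")


print(MiniParser(["1", "+", "2", "*", "3"]).expression())
```

Because term only ever sees complete factor results, the multiplication is grouped before the addition without any precedence table. Adding a new precedence level means adding one more method to the chain.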

Deep dive: The Parser

Stage 3: Interpreter

The interpreter walks the AST recursively. Statement nodes produce side effects; expression nodes return a KemValue. Variable scope is managed through a chain of Environment objects.
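
The split between statements, expressions and the Environment chain might look roughly like the sketch below. It is a simplified illustration, not kemlang/interpreter.py: the Variable, Declare and Block nodes, the evaluate and execute functions, the write parameter and the definition of KemValue as a plain union are all assumptions. Only the names Environment, KemValue, Print and Literal come from this page.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Union

KemValue = Union[int, float, str, bool]  # assumption: the real type may differ


class Environment:
    """One scope; lookups fall back to the enclosing scope via `parent`."""

    def __init__(self, parent: Optional["Environment"] = None) -> None:
        self.values: Dict[str, KemValue] = {}
        self.parent = parent

    def define(self, name: str, value: KemValue) -> None:
        self.values[name] = value

    def lookup(self, name: str) -> KemValue:
        env: Optional[Environment] = self
        while env is not None:  # walk outward through the chain
            if name in env.values:
                return env.values[name]
            env = env.parent
        raise NameError(f"undefined variable: {name}")


@dataclass
class Literal:
    value: KemValue


@dataclass
class Variable:  # hypothetical expression node
    name: str


@dataclass
class Declare:  # hypothetical statement node
    name: str
    value: "Expr"


@dataclass
class Print:
    expr: "Expr"


@dataclass
class Block:  # hypothetical statement node; opens a new scope
    statements: List["Stmt"]


Expr = Union[Literal, Variable]
Stmt = Union[Declare, Print, Block]


def evaluate(node: Expr, env: Environment) -> KemValue:
    """Expression nodes return a value."""
    if isinstance(node, Literal):
        return node.value
    if isinstance(node, Variable):
        return env.lookup(node.name)
    raise TypeError(f"unknown expression: {node!r}")


def execute(node: Stmt, env: Environment,
            write: Callable[[str], None] = print) -> None:
    """Statement nodes produce side effects and return nothing."""
    if isinstance(node, Print):
        write(str(evaluate(node.expr, env)))
    elif isinstance(node, Declare):
        env.define(node.name, evaluate(node.value, env))
    elif isinstance(node, Block):
        inner = Environment(parent=env)  # child scope chained to the outer one
        for stmt in node.statements:
            execute(stmt, inner, write)
    else:
        raise TypeError(f"unknown statement: {node!r}")


program: List[Stmt] = [
    Declare("x", Literal(1)),
    Block([Declare("x", Literal(2)), Print(Variable("x"))]),
    Print(Variable("x")),
]
global_env = Environment()
for stmt in program:
    execute(stmt, global_env)  # prints 2, then 1
```

The inner declaration shadows x only inside the block. Once the block finishes, its Environment is discarded and the outer lookup finds the original value again.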

Deep dive: The Interpreter
