Programming Wilderness

Wednesday, January 25, 2012

Pattern Matching and Guards are Fundamentally Different

Pattern Matching and Guards are fundamentally quite different! At least in Haskell, at any rate.

Function using pattern matching

check :: [a] -> String
check [] = "Empty"
check (x:xs) = "Contains Elements"
Function using guards

check_ :: [a] -> String
check_ lst
| length lst < 1 = "Empty"
| otherwise = "Contains elements"

Guards are both simpler and more flexible: They're essentially just special syntax that translates to a series of if/then expressions. You can put arbitrary boolean expressions in the guards, but they don't do anything you couldn't do with a regular if.

Pattern matches do several additional things: They're the only way to deconstruct data, and they bind identifiers within their scope. In the same sense that guards are equivalent to if expressions, pattern matching is equivalent to case expressions. Declarations (either at the top level, or in something like a let expression) are also a form of pattern match, with "normal" definitions being matches with the trivial pattern, a single identifier.

Pattern matches also tend to be the main way stuff actually happens in Haskell--attempting to deconstruct data in a pattern is one of the few things that forces evaluation.

By the way, you can actually do pattern matching in top-level declarations:

square = (^2)

(one:four:nine:_) = map square [1..]

This is occasionally useful for a group of related definitions.

GHC also provides the ViewPatterns extension which sort of combines both; you can use arbitrary functions in a binding context and then pattern match on the result. This is still just syntactic sugar for the usual stuff, of course.

As for the day-to-day issue of which to use where, here's some rough guides:

Definitely use pattern matching for anything that can be matched directly one or two constructors deep, where you don't really care about the compound data as a whole, but do care about most of the structure. The @ syntax lets you bind the overall structure to a variable while also pattern matching on it, but doing too much of that in one pattern can get ugly and unreadable quickly.

Definitely use guards when you need to make a choice based on some property that doesn't correspond neatly to a pattern, e.g. comparing two Int values to see which is larger.

If you need only a couple pieces of data from deep inside a large structure, particularly if you also need to use the structure as a whole, guards and accessor functions are usually more readable than some monstrous pattern full of @ and _.

If you need to do the same thing for values represented by different patterns, but with a convenient predicate to classify them, using a single generic pattern with a guard is usually more readable. Note that if a set of guards is non-exhaustive, anything that fails all the guards will drop down to the next pattern (if any). So you can combine a general pattern with some filter to catch exceptional cases, then do pattern matching on everything else to get details you care about.

Definitely don't use guards for things that could be trivially checked with a pattern. Checking for empty lists is the classic example, use a pattern match for that.

In general, when in doubt, just stick with pattern matching by default, it's usually nicer. If a pattern starts getting really ugly or convoluted, then stop to consider how else you could write it. Besides using guards, other options include extracting subexpressions as separate functions or putting case expressions inside the function body in order to push some of the pattern matching down onto them and out of the main definition.

Wednesday, December 28, 2011

The mysterious case of $ and $! in Haskell

Has anybody noticed the difference in Haskell between the operators ($) and ($!)?

($!) is strict function application. That is, it evaluates the argument before evaluating the function.

This is contrary to normal lazy function application in Haskell, e.g. f x or f $ x, which first start to evaluate the function f, and only compute the argument x if it is needed.

For example succ (1 + 2) will delay the addition 1 + 2 by creating a thunk, and start to evaluate succ first. Only if the argument to succ is needed, will 1 + 2 be evaluated.

However, if you know for sure that the argument to a function will always be needed, you can use ($!), which will first evaluate the argument to weak head normal form, and then enter the function. This way, you don't create a whole big pile of thunks and this can be more efficient. In this example, succ $! 1 + 2 would first compute 3 and then enter the function succ.

Note that it is not always safe to just replace normal function application with strict function application. For example:

ghci> const 1 (error "noo!")
1
ghci> const 1 $! (error "noo!")
*** Exception: noo!

Wednesday, November 16, 2011

Designing a Programming Language Can Be Fun

I recently stumbled upon a nice discussion started by Kanchi in the stacexchange forums. The discussion was if one were to design a programming language, how would one do it?

Such questions are too vague. Language features can't really be discussed until the purpose of the language is determined. Language design is a huge topic. If you're interested in designing a language, a good place to start is by thinking about what the deficiencies are in a language that you already know. Design decisions often arise from considering a design defect in another product.

Alternatively, consider a domain that you are interested in, and then design a domain-specific language (DSL) that specifies solutions to problems in that domain.

Once you have sketched out what you want your language to look like, try to write down precisely what the rules are for determining what is a legal and illegal program. Typically you'll want to do this at multiple levels:

A reason for creating a new language
A Philosophy
A Semantic Definition
A lexical description of your tokens
A Syntax Analysis definition

How will your language be different? What is its mission? Is it functional? Is it object orientated? Is it a meta-language? What are its unique features? What will it give the world that doesn't exist (or exists in an ugly way)? How do you want to change things? Is it compiled or interpreted? A DSL or general purpose language? This is your philosophy and dictates alot about your language's design.

Next, work on scratching out rough syntax and semantics on paper. This will be your semantic definition ... writing fake code is a great way to develop your thoughts.

You will then need to define your tokens and syntax in some way. Programs then process these into automata capable of reading in strings and processing the syntax. Yacc and Bison use Regular Expressions and a BNF style syntax for lexical and syntax analysis respectively. There are also Yacc and Bison like tools in for other languages.

You will also need a grounding in language theory/compilers to know what NOT to do. Examples include ambiguous grammars, AST generation and manipulation problems and generally how to make life simple for yourself. Knowing the theory is very important.

This is how my dream programming language would look like:

A powerful static type system with some support for dependent typing.
Optional dynamic typing.
Numeric Tower a la Lisp but statically typed.
Macros a la Lisp.
Primarily a Functional Programming language with basic support for imperative programming (like ML family).
Garbage collection.
Type inference.
Continuations.
Optional lazy semantics.
All the control constructs would be provided in the form of library functions. (This can be made possible using last two features.)
Minimal syntax (not as little as Lisps, but something of the sort of Ioke/Seph.)

Here are some good books that will help you out

Compilers: Principles, Techniques, and Tools
Modern Compiler Implementation in C

Pages