Prolog Learnings: Recursion, Unification and Backtracking
Unification makes Prolog the Tenet of programming languages.
list_length/2
counts the number of elements in a list, where the first
argument is a list, and the second argument is the number of elements in the
list.
It is good Prolog practice to try and encode the intent, order and type of arguments in the functor name. So "list_length" defines what the predicate does and that the first argument should be a list and the second the length of the list. 
Using a single code path, it should be possible to;

Return the number of elements in a list, using the magic of recursion.

Validate that the number of elements in the list equals a specified
C
, using the magic of Unification. 
Return a new list containing the number of elements specified in
C
, using the magic of backtracking. 
If a list and count are unspecified, then return values for list and count where
list_length
evaluates totrue
, using all of the magic fairy dust.
This is called multiway reasoning^{[1]} and allows the same predicate to be used in many ways.
1
2
3
4
5
6
7
8
9
%% list_length/2 counts the elements in a list.
%% The base case terminating recursion.
list_length([], 0).
%% Recurse through the list, counting each element.
list_length([_Rest], C) :
list_length(Rest, Count),
C is Count + 1.
Some explanation of the code above is probably needed.

The base case must always come before the recursive call because Prolog (and Elixir) evaluate procedures in order of appearance in the source file. So on each call to
list_length/2
Prolog will first attempt to pattern match to the base case, and if this fails, call the recusive case. 
The notation
[FirstRest]
pattern matches on a list, assigning the first element in a list toFirst
and the remaining list elements toRest
. Here, "_" is effectively used to throw away the first item of the list. If you are used to Elixir then the Prolog list notation and the use of "\_" to ignore a parameter will be very familiar. 
There is a difference between
=
andis
in Prolog.
is
assigns the result of a calculation, 
=
is used for Unification. As we will see later on in length/2 implemented using "=",list_length/2
can be implemented using=
instead ofis
.


Due to the immutability of variables, Prolog does not support a common notation such as
C is C + 1
, orcount++
, which is why the intermediateCount
variable is needed. Because once a variable is assigned a value, that value must remain unchanged throughout the variable scope. So in the case ofC is C + 1
, Prolog cannot assign theC
on the left of theis
a different value to whateverC
is already assigned on the right side of theis
. This does take some getting used to.
list_length/2 Use Cases
We can see all our defined use cases in just the base case.
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
%% The base case terminating recursion.
list_length([], 0).
? list_length([], C). % (1)
C = 0
? list_length([], 0). % (2)
true.
? list_length(L, 0). % (3)
L = [].
? list_length(L, C). % (4)
L = [],
C = 0.
1  An empty list [] matches the first argument of the functor definition, so the second argument C is returned with a value of 0 . 
2  An empty list [] and a list count 0 matches the first and second arguments, so the predicate evaulates to true . Any other values and the predicate will evaluate to false . 
3  A list count 0 matches the second argument, so an empty list [] is returned. 
4  Prolog can only evaluate the base case to true with an empty list [] and a list count of 0 . Anything else will evaluate to false . 
Now that the base case is defined to handle all possbile use cases, we can define the main recursive procedure.
1
2
3
4
%% Recurse through the list, counting each element.
list_length([_Rest], C) :
list_length(Rest, Count),
C is Count + 1.
Use Case 1: Return the number of elements in a list
For the first use case of returning the number of elements in a list,
list_length
works as follows;
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
? list_length([1,2,3], C).  list_length([1,2,3], C). <> C = 3.

% Stack frame #1 
list_length([_Rest], C) :  list_length(_[2,3], _4012) <> list_length(_[2,3], 3)
list_length(Rest, Count),  list_length([2,3], _4486), list_length([2,3], 2),
C is Count + 1.  _4012 is _4486 + 1. 3 is 2 + 1.

% Stack frame #2 
list_length([_Rest], C) :  list_length(_[3], _4804) <> list_length(_[3], 2)
list_length(Rest, Count),  list_length([3], _4530), list_length([3], 1),
C is Count + 1.  _4804 is _4530 + 1. 2 is 1 + 1.

% Stack frame #3 
list_length([_Rest], C) :  list_length(_[], _4666) <> list_length(_[], 1)
list_length(Rest, Count),  list_length([], _4574), list_length([], 0),
C is Count + 1.  _4666 is _4574 + 1. 1 is 0 + 1.

% Stack frame #4 
list_length([], 0).  list_length([], 0). <> list_length([], 0).

Throw away the first element of the list, then

Call
list_length
with the remaining list, passing in a new variableCount
. 
Return to (1) until the list is empty, matching
[]
to the base case. 
Return
0
from the base case, because this is the number of elements in[]
, the empty list. 
Returning from
list_count
,Count
will have a value0
, 
Add
1
toCount
, and assign this toC
. 
Return the value of
C
, 
Returning from
list_count
,Count
now has the value ofC
, 
Add
1
toCount
, and assign this toC
. 
Return to (7), and repeat until we have climbed back up the stack.

C
will now have the total number of elements of the list.
Use Case 2: Validate that elements in a list equal specified count
The second use case validates that the number of elements in the list matches the specified count. The important takeaway here is that Unification allows the same code and code path to handle both use cases 1 and 2.
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
? list_length([1,2,3], 3).  list_length([1,2,3], 3). <> true.

% Stack frame #1 
list_length([_Rest], C) :  list_length(_[2,3], 3) <> list_length(_[2,3], 3) % (1)
list_length(Rest, Count),  list_length([2,3], _4486), list_length([2,3], 2),
C is Count + 1.  3 is _4486 + 1. 3 is 2 + 1. % (2)

% Stack frame #2 
list_length([_Rest], C) :  list_length(_[3], _4804) <> list_length(_[3], 2)
list_length(Rest, Count),  list_length([3], _4530), list_length([3], 1),
C is Count + 1.  _4804 is _4530 + 1. 2 is 1 + 1.

% Stack frame #3 
list_length([_Rest], C) :  list_length(_[], _4666) <> list_length(_[], 1)
list_length(Rest, Count),  list_length([], _4574), list_length([], 0),
C is Count + 1.  _4666 is _4574 + 1. 1 is 0 + 1.

% Stack frame #4 
list_length([], 0).  list_length([], 0). <> list_length([], 0).
1  3 is passed as the second argument so the anonymous variable _4012 is not created. 
2  Prolog unifies the clause 3 is 2 + 1 to true . 
Prolog uses unification in <2> above (instead of simple assignment) to attempt
to make the left side of the is
the same as the right side. In use case 1,
prolog unifies _4012 is 2 + 1
by assigning 3
to _4012
. In use case 2,
Prolog attempts to unify 3 is 2 + 1
, which is valid Prolog syntax. In this
case 3
is the same as 2 + 1
and the clause evaluates to true
. If 5
was
passed in as the second argument, then Prolog would attempt to unify 5 is 2+1
,
which it cannot and the clause would evaluate to false
.
Use Case 3: Return a new list with the number of elements in count  TODO
The third use case returns a new list wth the number of elements specified in count. The important takeaway here is that Backtracking allows the same code and code path can handle use cases 1, 2 and 3.
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
? list_length(L, 3).  list_length(L, 3). <> L = [_2352, _3425, _4523].

% Stack frame #1 
list_length([_Rest], C) :  list_length(_[2,3], _4012) <> list_length(_[2,3], 3)
list_length(Rest, Count),  list_length([2,3], _4486), list_length([2,3], 2),
C is Count + 1.  _4012 is _4486 + 1. 3 is 2 + 1.

% Stack frame #2 
list_length([_Rest], C) :  list_length(_[3], _4804) <> list_length(_[3], 2)
list_length(Rest, Count),  list_length([3], _4530), list_length([3], 1),
C is Count + 1.  _4804 is _4530 + 1. 2 is 1 + 1.

% Stack frame #3 
list_length([_Rest], C) :  list_length(_[], _4666) <> list_length(_[], 1)
list_length(Rest, Count),  list_length([], _4574), list_length([], 0),
C is Count + 1.  _4666 is _4574 + 1. 1 is 0 + 1.

% Stack frame #4 
list_length([], 0).  list_length([], 0). <> list_length([], 0).
length/2 implemented using "="
Ivan Bratco in his
book^{[2]}
presents an alternate solution to length/2
using =
instead of is
.
Lets start by only replacing is
with =
.
1
2
3
4
list_length_1([], 0).
list_length_1([_Rest], C) :
list_length_1(Rest, Count)
C = Count + 1.
Which when evaluated returns…
1
2
? list_length_1([1,2,3], C).
C = 0+1+1+1,
OK, what’s going on here? Remember that =
does not evaluate any of its terms,
so each recursive call ads another +1
to C. So 0
, then 0+1
, then 0+1+1
,
and finally 0+1+1+1
. To return the length as an integer, we need to evaluate
the result of C
using is
, like so….
1
2
? list_length_1([1,2,3], C), Length is C.
Length = 3.
Remember that unlike is
where all arguments must already have been
instantiated to numbers, =
does not evaluate its terms. So list_length_1/2
can be rewritten with C = Count + 1,
appearing before
list_length_1(Rest,Count).
.
1
2
3
list_length_1([_Rest], C) :
C = Count + 1,
list_length_1(Rest, Count).
Yet another Prolog idiosyncrasy — postponed variable bindings; when a
variable is assigned a value all instances of that variable, within scope,
assume that value, even those instances that occur in the code prior to the
assignment. So Count
is unassigned in the clause C = Count + 1
. But because
=
does not evaluate its arguments, Count
acts as a placeholder and binding
is postponed until such time as a value is assigned. Prolog is like the
Tenet^{[3]} of
programming languages.
As =
is implicit in predicate arguments (remember that Prolog does not
evaluate arguments), list_length_1/2
can be refactored to remove the explicit
=
as follows.
1
2
list_length_1([_Rest], 1 + L) :
list_length_1(Rest, L).
Which results in the correct answer.
1
2
? list_length_1([1,2,3], C), Length is C.
Length = 3.
Using is
or =
in list_length_1/2
is, I suppose, a matter of taste.
Although I personally think using is
makes the intent explicit in the code.
From a "Big O" point of view, using =
results in the creation of a new list
which equates to O(n)
in additional memory needed.