forked from BartoszMilewski/Publications
-
Notifications
You must be signed in to change notification settings - Fork 0
/
MonoCata.tex
248 lines (221 loc) · 11.1 KB
/
MonoCata.tex
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
\documentclass[11pt]{amsart}
\usepackage[a4paper,verbose]{geometry}
\geometry{top=3cm,bottom=3cm,left=3cm,right=3cm,textheight=595pt}
\setlength{\parskip}{0.3em}
% ==============================
% PACKAGES
% ==============================
\usepackage{amsfonts}
\usepackage{amssymb}
\usepackage{amsthm}
\usepackage{amsmath}
\usepackage{caption}
\usepackage[inline]{enumitem}
\setlist{itemsep=0em, topsep=0em, parsep=0em}
\setlist[enumerate]{label=(\alph*)}
\usepackage{etoolbox}
\usepackage{stmaryrd}
\usepackage[dvipsnames]{xcolor}
\usepackage[]{hyperref}
\hypersetup{
colorlinks,
linkcolor=blue,
citecolor=blue,
urlcolor=blue}
\usepackage{graphicx}
\graphicspath{{assets/}}
\usepackage{mathtools}
\usepackage{tikz-cd}
\usepackage{minted}
\usepackage{float}
\usetikzlibrary{
matrix,
arrows,
shapes
}
\setcounter{tocdepth}{1} % Sets depth for table of contents.
% ======================================
% MACROS
%
% Add your own macros below here
% ======================================
\newcommand{\rr}{{\mathbb{R}}}
\newcommand{\nn}{{\mathbb{N}}}
\newcommand{\iso}{\cong}
\newcommand{\too}{\longrightarrow}
\newcommand{\tto}{\rightrightarrows}
\newcommand{\To}[1]{\xrightarrow{#1}}
\newcommand{\Too}[1]{\To{\;\;#1\;\;}}
\newcommand{\from}{\leftarrow}
\newcommand{\From}[1]{\xleftarrow{#1}}
\newcommand{\Cat}[1]{\mathbf{#1}}
\newcommand{\cat}[1]{\mathcal{#1}}
\newtheorem*{remark}{Remark}
\renewcommand{\ss}{\subseteq}
\newcommand{\hask}[1]{\mintinline{Haskell}{#1}}
\newenvironment{haskell}
{\VerbatimEnvironment
\begin{minted}[escapeinside=??, mathescape=true,frame=single, framesep=5pt, tabsize=1]{Haskell}}
{\end{minted}}
% ======================================
% FRONT MATTER
% ======================================
\author{Bartosz Milewski}
\title{Monoidal Catamorphisms}
\begin{document}
\maketitle{}
I have recently watched a talk by \href{https://github.com/Gabriel439/slides/blob/master/munihac/foldmap.md}{Gabriel Gonzalez} about folds, which caught my attention because of my interest in both recursion schemes and optics. A \hask{Fold} is an interesting abstraction. It encapsulates the idea of focusing on a \emph{monoidal contents} of some data structure. Let me explain.
Suppose you have a data structure that contains, among other things, a bunch of values from some monoid. You might want to summarize the data by traversing the structure and accumulating the monoidal values in an accumulator. You may, for instance, concatenate strings, or add integers. Because we are dealing with a monoid, which is associative, we could even parallelize the accumulation.
In practice, however, data structures are rarely filled with monoidal values or, if they are, it's not clear which monoid to use (e.g., in case of numbers, additive or multiplicative?). Usually monoidal values have to be extracted from the container. We need a way to convert the contents of the container to monoidal values, perform the accumulation, and then convert the result to some output type. This could be done, for instance by fist applying \hask{fmap}, and then traversing the result to accumulate monoidal values. For performance reasons, we might prefer the two actions to be done in a single pass.
Here's a data structure that combines two functions, one converting \hask{a} to some monoidal value \hask{m} and the other converting the final result to \hask{b}. The traversal itself should not depend on what monoid is being used so, in Haskell, we use an existential type.
\begin{haskell}
data Fold a b = forall m. Monoid m => Fold (a -> m) (m -> b)
\end{haskell}
The constructor of \hask{Fold} is polymorphic in \hask{m}, so it can be instantiated for any monoid, but the client of \hask{Fold} will have no idea what the monoid was.
The simplest container to traverse is a list and, indeed, we can use a \hask{Fold} to fold a list. Here's the inefficient, but easy to understand implementation
\begin{haskell}
fold :: Fold a b -> [a] -> b
fold (Fold s g) = g . mconcat . fmap s
\end{haskell}
See Gabriel's blog post for a more efficient implementation.
A \hask{Fold} is a functor
\begin{haskell}
instance Functor (Fold a) where
fmap f (Fold scatter gather) = Fold scatter (f . gather)
\end{haskell}
In fact it's a \hask{Monoidal} functor (in category theory, it's called a lax monoidal functor)
\begin{haskell}
class Monoidal f where
init :: f ()
combine :: f a -> f b -> f (a, b)
\end{haskell}
\begin{haskell}
instance Monoidal (Fold a) where
-- Fold a ()
init = Fold bang id
-- Fold a b -> Fold a c -> Fold a (b, c)
combine (Fold s g) (Fold s' g') = Fold (tuple s s') (bimap g g')
\end{haskell}
where we used the following helper functions
\begin{haskell}
bang :: a -> ()
bang _ = ()
tuple :: (c -> a) -> (c -> b) -> (c -> (a, b))
tuple f g = \c -> (f c, g c)
\end{haskell}
A monoidal functor is automatically applicative---a property that can be used to aggregate \hask{Fold}s.
A list is the simplest example of a recursive data structure. The immediate question is, can we use \hask{Fold} with other recursive data structures? The generalization of folding for recursively-defined data structures is called a catamorphism. What we need is a \emph{monoidal} catamorphisms.
\section{Algebras and catamorphisms}
Here's a very short recap of simple recursion schemes (for more, see my \href{https://bartoszmilewski.com/2013/06/10/understanding-f-algebras/}{blog}). An algebra for a functor \hask{f} with the carrier \hask{a} is defined as
\begin{haskell}
type Algebra f a = f a -> a
\end{haskell}
A fixed point of a functor is the carrier of its initial algebra
\begin{haskell}
newtype Fix f = Fix { unFix :: f (Fix f) }
\end{haskell}
A catamorphism generalizes a fold
\begin{haskell}
cata :: Functor f => Algebra f a -> Fix f -> a
cata alg = alg . fmap (cata alg) . unFix
\end{haskell}
\section{Monoidal algebras}
We would like to use a \hask{Fold} to fold an arbitrary recursive data structure. We are interested in data structures that store values of type \hask{a} which can be converted to monoidal values. Such structures are generated by functors of two arguments (bifunctors).
\begin{haskell}
class Bifunctor f where
bimap :: (a -> a') -> (b -> b') -> f a b -> f a' b'
\end{haskell}
In our case, the first argument will be the payload and the second, the placeholder for recursion.
We start by defining a monoidal algebra for such a functor by assuming that it has a monoidal payload, and that the child nodes have already been evaluated to a monoidal value
\begin{haskell}
type MAlgebra f = forall m. Monoid m => f m m -> m
\end{haskell}
A monoidal algebra is polymorphic in the monoid \hask{m} reflecting the requirement that the evaluation should only be allowed to use monoidal unit and multiplication.
A bifunctor is automatically a functor in its second argument
\begin{haskell}
instance Bifunctor f => Functor (f a) where
fmap g = bimap id g
\end{haskell}
We can apply the fixed point to this functor to define a recursive data structure \hask{Fix (f a)}.
We can then use \hask{Fold} to convert the payload of this data structure to monoidal values, and then apply a catamorphism to fold it
\begin{haskell}
cat :: Bifunctor f => MAlgebra f -> Fold a b -> Fix (f a) -> b
cat malg (Fold s g) = g . cata alg
where
alg = malg . bimap s id
\end{haskell}
We have factorized the original problem in three orthogonal directions: the monoidal algebra, the \hask{Fold}, and the traversal of the particular recursive data structure.
\section{Example}
Here's a simple example. We define a bifunctor that generates a binary tree with arbitrary payload \hask{a} stored at the leaves
\begin{haskell}
data TreeF a r = Leaf a | Node r r
\end{haskell}
It is indeed a bifunctor
\begin{haskell}
instance Bifunctor TreeF where
bimap f g (Leaf a) = Leaf (f a)
bimap f g (Node r r') = Node (g r) (g r')
\end{haskell}
The recursive tree is generated as its fixed point
\begin{haskell}
type Tree a = Fix (TreeF a)
\end{haskell}
We define two smart constructors to simplify the construction of trees
\begin{haskell}
leaf :: a -> Tree a
leaf a = Fix (Leaf a)
node :: Tree a -> Tree a -> Tree a
node t t' = Fix (Node t t')
\end{haskell}
We can define a monoidal algebra for this functor. Notice that it only uses monoidal operations (we don't even need the monoidal unit here, since values are stored in the leaves). It will therefore work for any monoid
\begin{haskell}
myAlg :: MAlgebra TreeF
myAlg (Leaf m) = m
myAlg (Node m m') = m <> m'
\end{haskell}
Separately, we define a \hask{Fold} whose internal monoid is \hask{Sum Int}. It converts \hask{Double} values to this monoid using \hask{floor}, and converts the result to a \hask{String} using \hask{show}
\begin{haskell}
myFold :: Fold Double String
myFold = Fold floor' show'
where
floor' :: Double -> Sum Int
floor' = Sum . floor
show' :: Sum Int -> String
show' = show . getSum
\end{haskell}
This \hask{Fold} has no knowledge of the data structure we'll be traversing.
Here's a small tree containing three \hask{Double}s
\begin{haskell}
myTree :: Tree Double
myTree = node (node (leaf 2.3) (leaf 10.3)) (leaf 1.1)
\end{haskell}
We can monoidally fold this tree and display the resulting \hask{String}
\begin{haskell}
> cat myAlg myFold myTree
> "13"
\end{haskell}
Notice that we can use the same monoidal catamorphism with any monoidal algebra and any \hask{Fold}.
The following pragmas were used in this program
\begin{haskell}
{-# language ExistentialQuantification #-}
{-# language RankNTypes #-}
{-# language FlexibleInstances #-}
{-# language IncoherentInstances #-}
\end{haskell}
\section{Relation to Optics}
A \hask{Fold} can be seen as a form of optic. It takes a source type, extracts a monoidal value from it, and maps a monoidal value to the target type; all the while keeping the monoid existential. Existential types are represented in category theory as coends---here we are dealing with a coend over the category of monoids $\Cat{Mon}(\Cat{C})$ in some monoidal category $\Cat C$. There is an obvious forgetful functor $U$ that forgets the monoidal structure and produces an object of $\Cat C$. Here's the categorical formula that corresponds to \hask{Fold}
\[\int^{m \in Mon(C)} C(s, U m)\times C(U m, t)\]
This coend is taken over a profunctor in the category of monoids
\[P n m = C(s, U m) \times C(U n, t)\]
The coend is defined as a disjoint union of sets $P m m$ in which we identify some of the elements. Given a monoid homomorphism $f \colon m \to n$, and a pair of morphisms
\[u \colon s \to U m\]
\[v \colon U n \to t\]
we identify the pairs
\[((U f) \circ u, v) \sim (u, v \circ (U f))\]
This is exactly what we need to make our monoidal catamorphism work. This condition ensures that the following two scenarios are equivalent:
\begin{itemize}
\item Use the function $u$ to extract monoidal values, transform these values to another monoid using $f$, do the folding in the second monoid, and translate the result using $v$
\item Use the function $u$ to extract monoidal values, do the folding in the first monoid, use $f$ to transform the result to the second monoid, and translate the result using $v$
\end{itemize}
Since the monoidal catamorphism only uses monoidal operations and $f$ is a monoid homomorphism, this condition is automatically satisfied.
\end{document}