2014-06-29

Implicit mapping

The last post was about what happens when one tries to apply a function to a value that is too small (not wrapped in an applicative functor). This post is about what happens when the value is too big (wrapped in too many functors). Jakub Marian defines (when translated into my notation):
\[\begin{aligned}
(f\ g)\ x &{}= f\ (g\ x) \\
f\ A &{}= \{f\ x : x\in A\}
\end{aligned}\]I posted the applicative functor definition in the last post, but here we only need a regular functor. As far as I understand it, a functor is the notion that comes directly from category theory, and an applicative functor just adds some functions that functors nearly always support anyway. There are very few functors that cannot be made into applicative functors.

\(
\newcommand{\fmap}{\text{fmap}}
\)A functor \(f\) provides us with just:
\[
\fmap : (a\rightarrow b)\rightarrow f\ a\rightarrow f\ b
\]For sets, \(\fmap\) applies a function to every element of the set (eliminating duplicates afterwards), and for functions, it composes the given function (type \((a\rightarrow b)\)) to the left of the functor's function (type \(f\ a\), which is \(t\rightarrow a\)). I encourage you to see why the types work for the latter example.
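
To make those two instances concrete, here is a minimal Haskell sketch (the names are mine, and Set.map stands in for \(\fmap\), since Haskell's Set can't be a Functor instance because of its Ord constraint):

    import qualified Data.Set as Set

    -- fmap for functions is composition: the given function is
    -- applied to the result of the wrapped one
    halveThenNegate :: Double -> Double
    halveThenNegate = fmap negate (/ 2)   -- i.e. negate . (/ 2)
    -- halveThenNegate 4 == -2.0

    -- fmap for sets applies the function to every element,
    -- eliminating duplicates afterwards
    squares :: Set.Set Int
    squares = Set.map (^ 2) (Set.fromList [-2, -1, 0, 1, 2])
    -- squares == Set.fromList [0, 1, 4]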

\(
\renewcommand{\neg}{\text{neg}}
\newcommand{\R}{\mathbb R}
\)So, post done? Pretty much. It's maybe worth going through the example of \(\neg\ {}_\cdot^{\neg\ 1}D\), where \(\neg : \R\rightarrow\R\) is the negation function and \(D : (\R\rightarrow\R)\rightarrow \R\rightarrow\R\) is the differentiation function (and the superscript-subscript notation is described in the initial post). Of course, limiting to real numbers is arbitrary; I just needed a number type.

Firstly, we try to apply \(\neg : \R\rightarrow\R\) to \({}_\cdot^{\neg\ 1} : \R\rightarrow\R\). \(\R\rightarrow\R\) doesn't match \(\R\), but it does match \(f\ \R\), where \(f = (\R\rightarrow)\). So, we transform \(\neg\ {}_\cdot^{\neg\ 1}\) into \(\fmap\ \neg\ {}_\cdot^{\neg\ 1}\), which is \(\neg\circ{}_\cdot^{\neg\ 1}\), the negative reciprocal function. Let's call it \(\text{negrec} : \R\rightarrow\R\).

Next, we try to apply that to \(D : (\R\rightarrow\R)\rightarrow \R\rightarrow\R\). \(D\)'s type certainly doesn't match the expected \(\R\), but it does match \(h\ (g\ \R)\), where \(g = (\R\rightarrow)\) and \(h = ((\R\rightarrow\R)\rightarrow)\). Note that it doesn't match \(i\ \R\) for a single functor \(i = ((\R\rightarrow\R)\rightarrow\R\rightarrow)\), because \(\rightarrow\) is not associative (it binds to the right), so the two \((t\rightarrow)\) layers have to be peeled off separately. So, \(\text{negrec}\ D\) becomes \(\fmap\ (\fmap\ \text{negrec})\ D\), or \((\text{negrec}\circ{})\circ D\).

That's as far as we can go, so let's apply this function to, say, \(\exp\) and \(0\). \[\begin{aligned}
((\text{negrec}\circ{})\circ D)\ \exp\ 0
&{}= ((\text{negrec}\circ{})\ (D\ \exp))\ 0 \\
&{}= (\text{negrec}\circ\exp)\ 0 \qquad\text{(since }D\ \exp = \exp\text{)} \\
&{}= \text{negrec}\ (\exp\ 0) \\
&{}= \text{negrec}\ 1 \\
&{}= \neg\ 1
\end{aligned}\]And that tells us that the gradient of the normal to the exponential curve at 0 is negative 1.
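
For a numeric sanity check, here is the whole computation in Haskell with the implicit \(\fmap\)s written out; the finite-difference d is only a stand-in for \(D\) that I've made up for the occasion, so the result is approximate:

    -- negrec x = neg (1/x)
    negrec :: Double -> Double
    negrec = negate . recip

    -- a crude numerical differentiation operator standing in for D
    d :: (Double -> Double) -> (Double -> Double)
    d f x = (f (x + h) - f x) / h where h = 1e-8

    -- two functor layers, so two fmaps: (negrec .) . d
    normalGradient :: (Double -> Double) -> Double -> Double
    normalGradient = fmap (fmap negrec) d

    main :: IO ()
    main = print (normalGradient exp 0)   -- prints roughly -1.0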

Implicit value lifting

Jakub Marian's notation without variables has a definition I call “the fork rule”:
\[
(f\star g)\ x = f\ x\star g\ x
\]It is also made clear that if either \(f\) or \(g\) is a constant, the constant should be preserved. For instance, the following has to hold:
\[
({\large\iota}+1)\ 2 = {\large\iota}\ 2+1\ 2 = 2 + 1 = 3
\]where \(\large\iota\) is the identity function. Obviously, the thing we need to sort out is \(1\ 2=1\).
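
For the functionally inclined, the fork rule is exactly liftA2 for the function applicative (\(\text{fmap}\) plus \({<}{\ast}{>}\)). A sketch, with the constant written as an explicit constant function by hand, which is precisely the chore the rest of this post gets rid of:

    -- the fork rule as <*> on functions:
    -- (star <$> f <*> g) x  =  f x `star` g x
    fork :: (b -> c -> d) -> (a -> b) -> (a -> c) -> (a -> d)
    fork star f g = star <$> f <*> g

    -- (ι + 1) 2, writing the constant as const 1 by hand
    three :: Integer
    three = fork (+) id (const 1) 2   -- id 2 + 1 = 3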

My first idea was that all constants should be turned into constant functions. Essentially, “1” becomes a function that always yields the value 1 (but not the function 1). However, I realised that this was unsatisfactory. Of minor importance was the fact that it felt weird to never have access to the underlying object. But the main point is that I want to generalise the ideas of constants and constant functions to elements and single-element sets. If every constant were a function, it would also need to be a set, and a zip list, and every other similar structure we want to use. There would have to be some order to the wrapping of all of these structures, but there is no natural order.

So instead I say this: values can be implicitly lifted when passed to a function that expects an applicative functor. As an introduction, an applicative functor \(f : \text{Type} \rightarrow \text{Type}\) has definitions for these functions:
\[\begin{aligned}
\text{fmap}\ &: (a\rightarrow b)\rightarrow f\ a\rightarrow f\ b \\
\text{pure}\ &: a\rightarrow f\ a \\
{<}{\ast}{>}\ &: f\ (a\rightarrow b)\rightarrow f\ a\rightarrow f\ b
\end{aligned}\]So \(f\) acts like a wrapper on other types, an idea played on by this functor/applicative/monad introduction. We won't need monads here (yet), but they are related.

For this post, we're only interested in \(\text{pure}\). For \(\text{Set}\), \(\text{pure}\ a = \{a\}\), and for types \((t\rightarrow)\) (functions from arbitrary type \(t\)), \(\text{pure}\ a = \lambda x.a\). So there we have our singleton set and constant function connected. (Note for functional programmers: sets are indeed functors in maths, since any things of the same type in maths can be equated. Mathematicians don't care for such worthless notions as computability.)

To address \(1\ 2 = 1\), we'll write it out with an explicit application operator, \(\$\). So, we consider \(1\mathbin\$2\). \(\$\) has type \((a\rightarrow b)\rightarrow a\rightarrow b\), so we're hoping that \(1\) fits \(a\rightarrow b\) and \(2\) fits \(a\). \(2\) does indeed fit \(a\) (anything does), so we need \(1\) to fit \(\mathbb{R}\rightarrow b\). It doesn't. But it does fit \(b\), and \((\mathbb{R}\rightarrow)\) is an applicative functor. So we transform \(1\mathbin\$2\) into \(\text{pure}\ 1\mathbin\$2\), which is \((\lambda x.1)\ 2\), which is \(1\). Yay!
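
Haskell makes you write the lifting yourself, but the mechanics can be checked directly. A sketch, with type annotations pinning down which \(\text{pure}\) is meant (and Set.singleton standing in for \(\text{pure}\) on sets):

    import qualified Data.Set as Set

    -- pure for functions is the constant function: 1 becomes \x -> 1
    one :: Double
    one = (pure 1 :: Double -> Double) $ 2   -- (\x -> 1) 2 = 1

    -- pure for Maybe: a value where a maybe was wanted
    maybeOne :: Maybe Integer
    maybeOne = pure 1   -- Just 1

    -- Set.singleton: an element where a set was wanted
    setOne :: Set.Set Integer
    setOne = Set.singleton 1   -- fromList [1]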

So we can now use constants where functions were wanted, elements where sets were wanted and values where maybes were wanted. That seems enough for this post.

2014-06-28

J meets Haskell, for maths

This is the introductory post on a few of my ideas for mathematical notation. Some are more useful or realistic than others (with no particular correlation between the two). It's based on Jakub Marian's notation without variables, so big thanks to Jakub.

Here are some ways in which my notation differs from Jakub's:
  • Juxtaposition means function application. So it's \(f\,x\) as much as it's \(\sin x\). This is the only feasible notation for application of curried functions, other than having some explicit left-associative infix operator. And because I plan on using functions a lot (as JM does), brevity of application is important. Function application is the closest-binding operation.
  • Juxtaposition doesn't mean multiplication. So it's \(2\cdot x\) as much as it's \(2\cdot 3\). There's nothing special and fundamental about multiplication, so it doesn't have particularly special notation.
  • Implicit mapping is better defined. When a function is applied to a value wrapped in a functor it doesn't handle, the function is \(\text{fmap}\)ped until it can match its argument. Otherwise, it's not \(\text{fmap}\)ped at all. There's a bit more to it, and it probably deserves its own post.
  • Implicit mapping is controlled via subscripted numbers before the function. The numbers work in the same way as array ranks in J (the " conjunction), but also work for mapping over other functors, like functions. It's rare to actually use these, but they're there for anyone who needs them.
  • Exponentiation is generalised, and the exponent is written before the base. \(_\cdot^2x\) (notice the dot) denotes the square of \(x\), and \(_\circ^\infty\cos 0\) denotes the fixpoint of \(\cos\), starting from \(0\) (\(\circ\) is the explicit function composition operator). The subscript-superscript pair is considered a function, so the usual precedence and mapping rules for function application apply. I haven't decided how this handles non-associative operators. (There's a code sketch of this fixpoint, and of a couple of the items below, after this list.)
  • Subtraction and division have their operands switched. I wrote a Quora answer about this, so I won't explain it here, yet. This also requires that both are right-associative (though fixity of division is not used much).
    • This also applies to the \(\bmod\) function, now also written in prefix (like any other function with a name made of letters). \(\bmod 2\ 5=1\). An alternative is to use \(\mathbin\%\) (ugly) or repurpose \(\mathbin|\) (like J). Both of those are written with \mathbin in \(\LaTeX\), which adds spacing.
    • Limits of integration are also swapped.
  • Elision of arguments implies partial application, not use of a default argument. So it's \(2-0\), like it's \(\frac{2}{1}\) (remember, they both got flipped, so \(2-0\) is negative 2). This may seem annoying, but being able to use partially-applied infix operators has its own rewards.
  • Oversetting and undersetting can be used for composition around infix operators. \(x\mathbin{\underset\log+}y=\log x+\log y=\log(x\cdot y)=x\mathbin{\overset\log\cdot}y\). Ironically, this nifty non-linear notation comes from two J conjunctions. This is possibly most useful for defining weaker equality operators, as seen in modular arithmetic (\(5\mathbin{\underset{\bmod 2}=}3\)).
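
Several of these translate almost directly into standard Haskell functions. Here's the promised sketch (all the names are mine, and the tolerance-based loop only approximates \(_\circ^\infty\)):

    import Data.Function (on)

    -- flipped subtraction is Haskell's subtract:
    -- flippedMinus 2 0 == 0 - 2 == -2
    flippedMinus :: Double -> Double -> Double
    flippedMinus = subtract

    -- underset composition: x +_log y = log x + log y
    plusUnderLog :: Double -> Double -> Double
    plusUnderLog = (+) `on` log

    -- equality weakened by (mod 2): 5 equals 3 under it
    congruentMod2 :: Integer -> Integer -> Bool
    congruentMod2 = (==) `on` (`mod` 2)   -- congruentMod2 5 3 == True

    -- the fixpoint of cos starting from 0, iterated to a tolerance
    dottie :: Double
    dottie = go 0
      where go x | abs (cos x - x) < 1e-12 = cos x
                 | otherwise               = go (cos x)
    -- dottie == 0.7390851332151607 (roughly)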
Numerals are separate from everything I've listed here, and are not really relevant. For the record, I prefer balanced base twelve at the moment, but I'm going to try balanced base twenty-four soon. Balanced bases have the advantage, amongst others, that small negative numbers can be written with a single character, rather than three (as I propose).

Of all these, the switching of operands is probably the most jarring and incompatible with standard maths notation. It takes a while to get used to, so I'll try to spare you from it as much as possible on this blog. The other large differences mostly don't clash with existing notation, and the rest have no counterparts in the current system to clash with.