Functional Programming For The Rest of Us

Author: Vyacheslav Akhmechet
Original: http://www.defmacro.org/ramblings/fp.html
Translated by: lihaitao, kaiqingliu
Programmers are procrastinators. Get in, get some coffee, check the mailbox, read the RSS feeds, read the news, check out the latest articles on techie websites, browse through political discussions on the designated sections of the programming forums. Rinse and repeat to make sure nothing is missed. Go to lunch. Come back, stare at the IDE for a few minutes. Check the mailbox. Get some coffee. Before you know it, the day is over.
The only thing is, every once in a while challenging articles actually do pop up. If you're looking in the right places you'll find at least one of these every couple of days. These articles are hard to get through and take some time, so they start piling up. Before you know it, you have a long list of links and a folder full of PDF files, and you wish you had a year in a small hut in the middle of the forest with nobody around for miles so you could catch up. It would be nice if someone came in every morning while you're taking a walk down by the river to bring some food and take out the garbage.
I don't know about your list, but a large chunk of the articles in mine are about functional programming. These are generally the hardest to get through. Written in a dry academic language, even "ten year Wall Street industry veterans" don't understand what functional programming (also referred to as FP) articles are all about. If you ask a project manager at Citi Group or Deutsche Bank[1] why they chose to use JMS instead of Erlang, they'll say they can't use academic languages for industrial strength applications. The problem is, some of the most complex systems with the most rigid requirements are written using functional programming elements. Something doesn't add up.
It's true that FP articles and papers are hard to understand, but they don't have to be. The reasons for the knowledge gap are purely historical. There is nothing inherently hard about FP concepts. Consider this article "an accessible guide to FP", a bridge from our imperative minds into the world of functional programming. Grab a coffee and keep on reading. With any luck your coworkers will start making fun of you for your FP comments in no time.
So what is FP? How did it come about? Is it edible? If it's as useful as its advocates claim, why isn't it used more often in industry? Why is it that only people with PhDs tend to use it? Most importantly, why is it so damn hard to learn? What is all this closure, continuation, currying, lazy evaluation, and no side effects business? How can it be used in projects that don't involve a university? Why does it seem so different from everything good, and holy, and dear to our imperative hearts? We'll clear this up very soon. Let's start by explaining the reasons for the huge gap between the real world and academic articles. The answer is as easy as taking a walk in the park.
A Walk in the Park
Fire up the time machine. Our walk in the park took place more than two thousand years ago, on a beautiful sunny day of a long forgotten spring in 380 B.C. Outside the city walls of Athens, under the pleasant shade of olive trees, Plato was walking towards the Academy with a beautiful slave boy. The weather was lovely, the dinner was filling, and the conversation turned to philosophy.
"Look at these two students," said Plato, carefully picking his words to make the question educational. "Who do you think is taller?" The slave boy looked towards the basin of water where two men were standing. "They're about the same height," he said. "What do you mean 'about the same'?" asked Plato. "Well, they look the same from here, but I'm sure if I were to get closer I'd see that there is some difference."
Plato smiled. He was leading the boy in the right direction. "So you would say that there is nothing perfectly equal in our world?" After some thinking the boy replied: "I don't think so. Everything is at least a little different, even if we can't see it." The point hit home! "Then if nothing is perfectly equal in this world, how do you think you understand the concept of 'perfect' equality?" The slave boy looked puzzled. "I don't know," he replied.
So was born the first attempt to understand the nature of mathematics. Plato suggested that everything in our world is just an approximation of perfection. He also realized that we understand the concept of perfection even though we have never encountered it. He came to the conclusion that perfect mathematical forms must live in another world, and that we somehow know about them by having a connection to that "alternative" universe. It's fairly clear that we can never observe a perfect circle. But we also understand what a perfect circle is and can describe it via equations. What, then, is mathematics? Why is the universe described by mathematical laws? Can all of the phenomena of our universe be described by mathematics?[2]
Philosophy of mathematics is a very complex subject. Like most philosophical disciplines, it is far more adept at posing questions than providing answers. Much of the consensus revolves around the fact that mathematics is really a puzzle: we set up a set of basic non-conflicting principles and a set of rules on how to operate with them. We can then stack these rules together to come up with more complex rules. Mathematicians call this method a "formal system" or a "calculus". We could effectively write a formal system for Tetris if we wanted to. In fact, a working implementation of Tetris is a formal system, just specified using an unusual representation.
A civilization of furry creatures on Alpha Centauri would not be able to read our formalisms of Tetris and circles, because their only sensory input might be an organ that senses smells. They will likely never find out about the Tetris formalism, but they very well might have a formalism for circles. We probably wouldn't be able to read it because our sense of smell isn't that sophisticated, but once you get past the representation of the formalism (via various sensory instruments and standard code breaking techniques to understand the language), the concepts underneath are understandable to any intelligent civilization.
Interestingly, even if no intelligent civilization ever existed, the formalisms for Tetris and circles would still hold water; it's just that nobody would be around to find out about them. If an intelligent civilization popped up, it would likely discover some formalisms that help describe the laws of our universe. It would also be very unlikely to ever find out about Tetris, because there is nothing in the universe that resembles it. Tetris is one of countless examples of a formal system, a puzzle, that has nothing to do with the real world. We can't even be sure that the natural numbers fully resemble the real world; after all, one can easily think of a number so big that it cannot describe anything in our universe, since the universe itself might turn out to be finite.
A Bit of History[3]
Let's shift gears in our time machine. This time we'll travel a lot closer, to the 1930s. The Great Depression was ravaging the New and the Old worlds. Almost every family from every social class was affected by the tremendous economic downturn. Very few sanctuaries remained where people were safe from the perils of poverty. Few people were fortunate enough to be in these sanctuaries, but they did exist. Our interest lies with the mathematicians of Princeton University.
The new offices constructed in Gothic style gave Princeton the aura of a safe haven. Logicians from all over the world were invited to Princeton to build out a new department. While most of America couldn't find a piece of bread for dinner, high ceilings, walls covered with elaborately carved wood, daily discussions over a cup of tea, and walks in the forest were some of the conditions at Princeton.
One mathematician living this lavish lifestyle was a young man named Alonzo Church. Alonzo received a B.S. degree from Princeton and was persuaded to stay for graduate school. Alonzo felt the architecture was fancier than necessary. He rarely showed up to discuss mathematics over a cup of tea, and he didn't enjoy the walks in the woods. Alonzo was a loner: he was most productive when working on his own. Nevertheless he had regular contact with other Princeton inhabitants, among them Alan Turing, John von Neumann, and Kurt Gödel.
The four men were interested in formal systems. They didn't pay much heed to the physical world; they were interested in dealing with abstract mathematical puzzles instead. Their puzzles had something in common: the men were working on answering questions about computation. If we had machines with infinite computational power, what problems could we solve? Could we solve them automatically? Could some problems remain unsolved, and why? Would various machines with different designs be equal in power?
In cooperation with others, Alonzo Church developed a formal system called lambda calculus. The system was essentially a programming language for one of these imaginary machines. It was based on functions that took other functions as parameters and returned functions as results. The function was identified by the Greek letter lambda, hence the system's name[4]. Using this formalism, Alonzo was able to reason about many of the above questions and provide conclusive answers.
Independently of Alonzo Church, Alan Turing was performing similar work. He developed a different formalism (now referred to as the Turing machine) and used it to independently arrive at conclusions similar to Alonzo's. Later it was shown that Turing machines and lambda calculus are equivalent in power.
This is where the story would stop, I'd wrap up the article, and you'd navigate to another page, if not for the beginning of World War II. The world was in flames. The U.S. Army and Navy used artillery more often than ever. In attempts to improve accuracy, the Army employed a large group of mathematicians to continuously calculate the differential equations required for solving ballistic firing tables. It became obvious that the task was too great to be solved manually, and various equipment was developed to overcome this problem. The first machine to solve ballistic tables was the Mark I built by IBM: it weighed five tons, had 750,000 parts, and could do three operations per second.
The race, of course, wasn't over. In 1949 the Electronic Discrete Variable Automatic Computer (EDVAC) was unveiled and had tremendous success. It was the first example of von Neumann's architecture and was effectively a real world implementation of a Turing machine. For the time being, Alonzo Church was out of luck.
In the late 1950s an MIT professor, John McCarthy (also a Princeton graduate), developed an interest in Alonzo Church's work. In 1958 he unveiled a List Processing language (Lisp). Lisp was an implementation of Alonzo's lambda calculus that worked on von Neumann computers! Many computer scientists recognized the expressive power of Lisp. In 1973 a group of programmers at MIT's Artificial Intelligence Lab developed hardware they called a Lisp machine: effectively a native hardware implementation of Alonzo's lambda calculus!
Functional Programming
Functional programming is a practical implementation of Alonzo Church's ideas. Not all lambda calculus ideas transform to practice, because lambda calculus was not designed to work under physical limitations. Therefore, like object oriented programming, functional programming is a set of ideas, not a set of strict guidelines. There are many functional programming languages, and most of them do many things very differently. In this article I will explain the most widely used ideas from functional languages using examples written in Java (yes, you could write functional programs in Java if you felt particularly masochistic). In the next couple of sections we'll take Java as is and make modifications to transform it into a usable functional language. Let's begin.
Lambda calculus was designed to investigate problems related to calculation, so functional programming primarily deals with calculation and, surprisingly, uses functions to do so. A function is the basic unit of functional programming. Functions are used for almost everything, even the simplest of calculations. Even variables are replaced with functions. In functional programming, variables are simply aliases for expressions (so we don't have to type everything on one line). They cannot be modified: every variable can only be assigned to once. In Java terms this means that every single variable is declared final (or const if we're dealing with C++). There are no non-final variables in FP.
final int i = 5;
final int j = i + 3;
Since every variable in FP is final, two fairly interesting statements can be made: it doesn't make sense to always write the keyword final, and it doesn't make sense to call variables, well, variables. We will now make two modifications to Java: every variable declared in our functional Java will be final by default, and we will refer to variables as symbols.
By now you are probably wondering how you could possibly write anything reasonably complicated in our newly created language. If every symbol is non-mutable, we can't change the state of anything! This isn't strictly true. When Alonzo was working on lambda calculus, he wasn't interested in maintaining state over periods of time in order to modify it later. He was interested in performing operations on data (also commonly referred to as "calculating stuff"). However, since lambda calculus was proved equivalent to a Turing machine, it can do everything an imperative programming language can. How, then, can we achieve the same results?
It turns out that functional programs can keep state, except they don't use variables to do it. They use functions instead. State is kept in function parameters, on the stack. If you want to keep state for a while and modify it every now and then, you write a recursive function. As an example, let's write a function that reverses a Java string. Remember, every variable we declare is final by default.[5]
String reverse(String arg) {
    if(arg.length() == 0) {
        return arg;
    }
    else {
        return reverse(arg.substring(1, arg.length())) + arg.substring(0, 1);
    }
}
This function is slow because it repeatedly calls itself[6]. It's a memory hog because it repeatedly allocates objects. But it's functional in style. You may wonder why anyone would want to program this way. Well, I was just about to tell you.
Benefits of FP
You're probably thinking that there's no way I can rationalize the monstrosity of a function above. When I was learning functional programming I was thinking that too. I was wrong. There are very good arguments for using this style. Some of them are subjective; for example, people claim that functional programs are easier to understand. I will leave out these arguments, because every kid on the block knows that ease of understanding is in the eye of the beholder. Fortunately for me, there are plenty of objective arguments.
Unit Testing
Since every symbol in FP is final, no function can ever cause side effects. You can never modify things in place, nor can one function modify a value outside its scope for another function to use (like a class member or a global variable). That means the only effect of evaluating a function is its return value, and the only thing that affects the return value of a function is its arguments.
This is a unit tester's wet dream. You can test every function in your program worrying only about its arguments. You don't have to worry about calling functions in the right order or setting up external state properly. All you need to do is pass arguments that represent the edge cases. If every function in your program passes its unit tests, you can be far more confident about the quality of your software than you could be with an imperative language. In Java or C++, checking a return value isn't enough: we'd also have to verify any external state the function may have modified.
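To make this concrete, here is a minimal sketch of what testing the reverse function from above could look like, assuming a JUnit-style framework and that reverse is in scope (the test class and names are ours, purely for illustration):

import static org.junit.Assert.assertEquals;
import org.junit.Test;

public class ReverseTest {
    // No mocks, no setup, no external state to arrange:
    // a pure function is fully specified by argument/return pairs.
    @Test
    public void reversesEdgeCases() {
        assertEquals("", reverse(""));        // empty string
        assertEquals("a", reverse("a"));      // single character
        assertEquals("cba", reverse("abc"));  // general case
    }
}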
Debugging
If a functional program doesn't behave the way you expect it to, debugging it is a breeze. You will always be able to reproduce the problem, because a bug in a functional program doesn't depend on unrelated code paths executed before it. In an imperative program a bug resurfaces only some of the time: because functions depend on side effects produced by other functions, you may have to search for a long time in directions entirely unrelated to the bug. Not so in a functional program: if the result of a function is wrong, it is always wrong, regardless of what code you executed before.
Once you reproduce the problem, getting to the bottom of it is trivial, almost pleasant. You break the execution of your program and examine the stack. Just as in an imperative program, every argument of every function call on the stack is available for inspection. In an imperative program that's not enough, because functions also depend on member variables, global variables, and the state of other classes (which in turn depend on these very same things). A function in a functional program depends only on its arguments, and that information is right before your eyes! Furthermore, in an imperative program, examining a return value won't tell you whether the function behaved properly; you have to hunt down dozens of objects outside its scope to verify it performed the correct actions. In a functional program, all you have to do is look at the return value!
Walking through the stack, you look at the arguments passed to functions and their return values. The minute a return value doesn't make sense, you step into the offending function and walk through it. You repeat this recursively until the process leads you to the source of the bug.
Concurrency
A functional program is ready for concurrency without any further modifications. You never have to worry about deadlocks and race conditions, because you never need locks! No piece of data in a functional program is modified twice by the same thread, let alone by two different threads. That means you can easily add threads without ever giving the conventional problems that plague concurrent applications a second thought.
If this is the case, why doesn't everybody use functional programs for highly concurrent applications? Well, it turns out they do. Ericsson designed a functional language called Erlang for use in its highly fault-tolerant and scalable telecommunication switches. Many others have recognized Erlang's benefits and started using it. We're talking about telecommunication and traffic control systems that are far more scalable and reliable than the typical systems designed on Wall Street. Actually, Erlang systems are not scalable and reliable. Java systems are. Erlang systems are simply rock solid.
The concurrency story doesn't stop here. Even if your application is inherently single threaded, the compiler of a functional language can still optimize it to run on multiple CPUs. Take a look at the following code fragment:
String s1 = somewhatLongOperation1();
String s2 = somewhatLongOperation2();
String s3 = concatenate(s1, s2);
In a functional language the compiler could analyze the code, classify the functions that create strings s1 and s2 as potentially time consuming operations, and run them concurrently. This is impossible in an imperative language, because each function may modify state outside its scope and the function following it may depend on that. In functional languages, automatically analyzing functions and finding good candidates for concurrent execution is as trivial as automatic inlining! In this sense functional style programs are "future proof" (as much as I hate buzzwords, I'll indulge this time). Hardware manufacturers can no longer make CPUs run faster. Instead they increase the number of cores and attribute quadruple speed increases to concurrency. Of course they conveniently forget to mention that we get our money's worth only on software that deals with parallelizable problems. This is a very small fraction of imperative software, but 100% of functional software, because functional programs are all parallelizable out of the box.
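Java's compiler won't do this for us, but we can sketch by hand what such a transformation might produce, using java.util.concurrent (the somewhatLongOperation names are the placeholders from the fragment above):

import java.util.concurrent.CompletableFuture;

// Safe to parallelize only because both operations are pure:
// no shared state, hence no locks and no race conditions.
CompletableFuture<String> f1 = CompletableFuture.supplyAsync(() -> somewhatLongOperation1());
CompletableFuture<String> f2 = CompletableFuture.supplyAsync(() -> somewhatLongOperation2());
String s3 = concatenate(f1.join(), f2.join()); // wait for both, then combine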
Hot Code Deployment
In the old days of Windows, installing updates meant restarting the computer. Many times. Even after installing a newer version of a media player. Windows XP improved the situation significantly, yet it still isn't ideal (I ran Windows Update at work today, and now an annoying tray icon won't go away until I restart). Unix systems have had a better model for a while: to install an update you only need to stop the relevant components, not the whole OS. Even so, for a large class of server applications this still isn't acceptable. Telecommunication systems need to be up 100% of the time, because if dialing emergency services is unavailable during an upgrade, lives may be lost. Nor is there any reason Wall Street firms should have to bring down their systems over the weekend to install updates.
The ideal situation is updating the relevant parts of the code without stopping any part of the system at all. In an imperative world this isn't possible. Consider unloading a Java class at runtime and reloading a new definition: every instance of that class would become unusable, because the state it held would be lost. We could resort to writing tricky version control code: serialize all running instances of the class, destroy them, create instances of the new class, load the serialized data into them, and hope the loading code properly migrates the data to the new instances. On top of that, we'd have to write the migration code by hand every time we change something, taking special care not to break the relationships between objects. Nice in theory, but it would never work well in practice.
In a functional program all state is stored on the stack, in the arguments passed to functions. This makes hot deployment significantly easier! In fact, all we'd really have to do is run a diff between the code in production and the new version, and deploy the new code. The rest could be done by language tools automatically! If you think this is science fiction, think again: Erlang engineers have been upgrading live systems without stopping them for years.
Machine Assisted Proofs and Optimizations
An interesting property of functional languages is that they can be reasoned about mathematically. Since a functional language is simply an implementation of a formal system, all mathematical operations that can be done on paper still apply to programs written in that language. The compiler could, for example, convert pieces of code into equivalent but more efficient pieces, with a mathematical proof that the two are equivalent[7]. Relational databases have been performing these optimizations for years. There is no reason the same techniques can't be applied to regular software.
Additionally, you can use these techniques to prove that parts of your program are correct. It is even possible to create tools that analyze code and generate edge cases for unit tests automatically! This functionality is invaluable for rock solid systems. If you are designing pacemakers or air traffic control systems, such tools are almost always a requirement. If you are writing applications outside of truly mission critical industries, these tools can still give you a tremendous edge over your competitors.
Higher Order Functions
I remember learning about the benefits outlined above and thinking: "that's all very nice, but it's useless if I have to program in a crippled language where everything is final." This was a misconception. Making all variables final is crippling in the context of an imperative language like Java, but it isn't in the context of functional languages. Functional languages offer a different kind of abstraction tools, tools that make you forget you ever liked modifying variables. One such tool is the higher order function.
A function in such languages is different from a function in Java or C. It is a superset: it can do all the things a Java function can do, and more. We create a function in the same manner we do in C:
int add(int i, int j) {
    return i + j;
}
This means something different from the equivalent C code, though. Let's extend our Java compiler to support this notation. When we type the code above, the compiler will convert it to the following Java (don't forget, everything is final):
class add_function_t {
    int add(int i, int j) {
        return i + j;
    }
}

add_function_t add = new add_function_t();
The symbol add isn't really a function. It is a small class with one function as its member. We can now pass add around in our code as an argument to other functions. We can assign it to another symbol. Instances of add_function_t created at runtime will be garbage collected when no longer needed. This makes functions first class objects, no different from integers or strings. Functions that operate on other functions (accept them as arguments) are called higher order functions. Don't let the term intimidate you: it's no different from Java classes that operate on other classes (accepting their instances as parameters). We could call them "higher order classes", but nobody cares to, because there is no strong academic community behind Java.
How, and when, do you use higher order functions? I'm glad you asked. You write your program as one big monolithic blob of code without worrying about class hierarchies. When you see that particular lines of code are repeated, you break them out into a function (fortunately they still teach this in schools). If you see that some piece of logic within your function needs to behave differently in different situations, you break it out into a higher order function. Confused? Here's a real life example from my work. Suppose we have some Java code that receives a message, transforms it in various ways, and forwards it to another server:
class MessageHandler {
    void handleMessage(Message msg) {
        // ...
        msg.setClientCode("ABCD_123");
        // ...
        sendMessage(msg);
    }
    // ...
}
Now imagine that our system has changed and we route messages to two servers instead of one. Everything is handled exactly the same way except the client code: the second server wants it in a different format. How do we handle this situation? We could check where the message is headed and format the client code accordingly, like this:
class MessageHandler {
    void handleMessage(Message msg) {
        // ...
        if(msg.getDestination().equals("server1")) {
            msg.setClientCode("ABCD_123");
        } else {
            msg.setClientCode("123_ABC");
        }
        // ...
        sendMessage(msg);
    }
    // ...
}
This approach, however, isn't scalable. If more servers are added, the function will grow linearly and updating it will become a nightmare. The object oriented approach is to make MessageHandler a base class and specialize the client code operation in derived classes:
abstract class MessageHandler {
    void handleMessage(Message msg) {
        // ...
        msg.setClientCode(getClientCode());
        // ...
        sendMessage(msg);
    }

    abstract String getClientCode();
    // ...
}

class MessageHandlerOne extends MessageHandler {
    String getClientCode() {
        return "ABCD_123";
    }
}

class MessageHandlerTwo extends MessageHandler {
    String getClientCode() {
        return "123_ABCD";
    }
}
We can now instantiate an appropriate class for each server. Adding servers becomes far more maintainable. But that's a lot of code for such a simple modification: we had to create two new types just to support different client codes! Now let's do the same thing in our language with higher order functions:
class MessageHandler {
    void handleMessage(Message msg, Function getClientCode) {
        // ...
        Message msg1 = msg.setClientCode(getClientCode());
        // ...
        sendMessage(msg1);
    }
    // ...
}

String getClientCodeOne() {
    return "ABCD_123";
}

String getClientCodeTwo() {
    return "123_ABCD";
}

MessageHandler handler = new MessageHandler();
handler.handleMessage(someMsg, getClientCodeOne);
We've created no new types and no class hierarchy: we simply pass an appropriate function as a parameter. We've achieved the same thing as the object oriented counterpart, with a number of advantages. We aren't restricted to class hierarchies: we can pass in new functions at runtime and change them at any time, with a much higher degree of granularity and less code. Effectively the compiler has written the object oriented "glue" code for us! On top of that, we get all the other benefits of FP. Of course the abstractions provided by functional languages don't stop here. Higher order functions are just the beginning.
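As an aside, modern Java (8 and later) can express this example almost directly, with java.util.function.Supplier standing in for our invented Function type. A rough sketch, not the article's extended language (Message, sendMessage, and someMsg are the placeholders from above):

import java.util.function.Supplier;

class MessageHandler {
    void handleMessage(Message msg, Supplier<String> getClientCode) {
        // ...
        msg.setClientCode(getClientCode.get());
        // ...
        sendMessage(msg);
    }
}

// usage: lambdas play the role of first class functions
handler.handleMessage(someMsg, () -> "ABCD_123");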
Currying
Most people I've met have read the Design Patterns book by the Gang of Four. Any self respecting programmer will tell you that the book is language agnostic, and that the patterns apply to software engineering in general, regardless of which language you use. This is a noble claim. Unfortunately it is far removed from the truth.
Functional languages are extremely expressive. In a functional language one doesn't need design patterns, because the language is so high level that you end up programming in concepts that eliminate design patterns altogether. One such pattern is the Adapter pattern (how is it different from Facade again? Sounds like somebody needed to fill a few more pages). It is eliminated once a language supports a technique called currying.
The Adapter pattern is best known as applied to the "default" abstraction unit in Java: the class. In functional languages the pattern is applied to functions. The pattern takes an interface and transforms it into another interface that someone else expects. Here's an example of an adapter pattern:
int pow(int i, int j);

int square(int i) {
    return pow(i, 2);
}
The code above adapts the interface of a function that raises an integer to an arbitrary integer power to the interface of a function that squares an integer. In academic circles this trivial technique is called currying (after the logician Haskell Curry, who performed the mathematical acrobatics necessary to formalize it). Because in FP functions (as opposed to classes) are passed around as arguments, currying is used very often to adapt functions to an interface that someone else expects. Since the interface of a function is its arguments, currying is used to reduce their number (as in the example above).
Functional languages come with this technique built in. You don't need to manually create a function that wraps the original; a functional language will do that for you. As usual, let's extend our language to support it:
square = int pow(int i, 2);
This automatically creates a function square with one argument. It calls the pow function with its second argument set to 2. The line compiles to the following Java code:
class square_function_t {
    int square(int i) {
        return pow(i, 2);
    }
}

square_function_t square = new square_function_t();
As you can see, we've simply created a wrapper for the original function. In FP, currying is just that: a shortcut to quickly and easily create wrappers. You concentrate on your task, and the compiler writes the appropriate code for you! When do you use currying? Easy: any time you want to use an adapter pattern (a wrapper).
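Real Java has no built-in currying shortcut, but the idea can be sketched with java.util.function: a curried pow takes its arguments one at a time, so supplying the exponent yields the wrapper for free (Math.pow stands in for the pow declared above):

import java.util.function.Function;
import java.util.function.IntUnaryOperator;

// pow in curried form: given the exponent j, it returns a function of i.
Function<Integer, IntUnaryOperator> powCurried =
        j -> (i -> (int) Math.pow(i, j));

IntUnaryOperator square = powCurried.apply(2); // the wrapper/adapter
int nine = square.applyAsInt(3);               // returns 9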
Lazy Evaluation
Lazy (or delayed) evaluation is an interesting technique that becomes possible once we adopt the functional philosophy. We've already seen the following piece of code when talking about concurrency:
String s1 = somewhatLongOperation1();
String s2 = somewhatLongOperation2();
String s3 = concatenate(s1, s2);
In an imperative language the order of evaluation would be clear. Because each function may affect or depend on external state, the functions must be executed in order: first somewhatLongOperation1, then somewhatLongOperation2, followed by concatenate. Not so in functional languages.
As we saw earlier, somewhatLongOperation1 and somewhatLongOperation2 can be executed concurrently, because we're guaranteed that no function affects or depends on global state. But what if we don't want to run them concurrently: do we still need to run them in order? The answer is no. We only need to run these operations when another function depends on s1 and s2. We don't even have to run them before concatenate is called; we can delay their evaluation until the point within concatenate where they're actually required. If we replace concatenate with a function that has a conditional and uses only one of its two parameters, the other parameter may never be evaluated at all! Haskell is an example of such a language: in Haskell you are not guaranteed that anything will be executed in order (or at all), because Haskell only evaluates code when it's required.
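Strict languages can imitate this by hand with "thunks": functions that stand in for values not yet computed. A sketch in Java, using java.util.function.Supplier (again with the placeholder operation names from above):

import java.util.function.Supplier;

// Wrapping the calls in Suppliers delays them: nothing has run yet.
Supplier<String> s1 = () -> somewhatLongOperation1();
Supplier<String> s2 = () -> somewhatLongOperation2();

// A lazy-friendly concatenate takes the thunks and forces them on demand;
// if it used only one of its parameters, the other would never execute.
String concatenate(Supplier<String> a, Supplier<String> b) {
    return a.get() + b.get();
}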
Lazy evaluation has numerous advantages as well as disadvantages. We'll discuss the advantages here and explain how to counter the disadvantages in the next section.
Optimization
Lazy evaluation provides tremendous potential for optimizations. A lazy compiler looks at functional code exactly as a mathematician looks at an algebra expression: it can cancel things out and avoid execution entirely, rearrange pieces of code for higher efficiency, even arrange code in ways that reduce errors, all while guaranteeing the optimizations won't break the code. This is the biggest benefit of describing programs strictly with formal primitives: code adheres to mathematical laws and can be reasoned about mathematically.
Abstracting Control Structures
Lazy evaluation provides a higher order of abstraction that makes otherwise impossible things possible. For example, consider implementing the following control structure:
unless(stock.isEuropean()) {
    sendToSEC(stock);
}
We want sendToSEC executed unless the stock is European. How can we implement unless? Without lazy evaluation we'd need some form of macro system, but in a language like Haskell that's unnecessary: we can implement unless as a function!
void unless(boolean condition, List code) {
    if(!condition)
        code;
}
Note that code is never evaluated if the condition is true. We can't reproduce this behavior in a strict language, because the arguments would be evaluated before unless is entered.
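Strict Java can approximate unless by passing the body as a function value, which is once again a hand-made thunk; a sketch:

// Passing the body as a Runnable delays it: it runs only if invoked.
static void unless(boolean condition, Runnable code) {
    if(!condition) {
        code.run();
    }
}

// usage:
unless(stock.isEuropean(), () -> sendToSEC(stock));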
Infinite Data Structures
Lazy evaluation allows the definition of infinite data structures, something far more complicated in a strict language. Consider the Fibonacci numbers: clearly we can't compute an infinite list in finite time or store it in finite memory. In a strict language like Java, we simply define a Fibonacci function that returns a particular member of the sequence. In Haskell, we can abstract further and define an infinite list of Fibonacci numbers; because the language is lazy, only the parts of the list actually used by the program are ever evaluated. This lets us abstract away a lot of problems and look at them from a higher level (for example, we can use list processing functions on an infinite list).
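Java's Stream API offers a small taste of this: Stream.iterate describes a conceptually infinite sequence, and only the elements a terminal operation actually demands are ever computed. A sketch of an "infinite" Fibonacci list:

import java.util.stream.Stream;

// A conceptually infinite stream of Fibonacci pairs (current, next);
// nothing is computed until elements are demanded.
Stream<long[]> fibs = Stream.iterate(new long[]{1, 1},
        p -> new long[]{p[1], p[0] + p[1]});

// Only the first ten members are ever evaluated.
fibs.limit(10).forEach(p -> System.out.println(p[0]));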
Disadvantages
Of course, there's no such thing as a free lunch. Lazy evaluation comes with a number of disadvantages, mainly that it is, well, lazy. Many real world problems require strict evaluation. For example, consider the following:
System.out.println("Please enter your name: ");
System.in.readLine();
In a lazy language you have no guarantee that the first line executes before the second! That means we can't do IO, can't use native functions in any meaningful way (since they depend on side effects and must be called in order), and can't interact with the outside world. If we introduced primitives that allow ordered execution, we'd lose the benefits of reasoning about our code mathematically (which would take all the benefits of functional programming with it). Fortunately, not all is lost. Mathematicians got to work and developed a number of tricks to ensure code executes in a particular order in a functional setting, giving us the best of both worlds. These techniques include continuations, monads, and uniqueness typing. In this article I'll only explain continuations, leaving monads and uniqueness typing for a future article. Interestingly, continuations are useful for many things besides enforcing a particular order of evaluation. We'll get to that shortly.
Continuations
Continuations are to programming what the Da Vinci Code is to human history: an amazing revelation of the greatest cover-up known to man. Well, maybe not, but they're certainly a conceptual eye-opener on par with square roots of negative numbers.
When we learned about functions, we only learned half the truth, because it rests on a faulty assumption: that a function must return its value to its caller. In this sense, continuations are a generalization of functions. A function doesn't have to return to its caller; it may return to any part of the program. A "continuation" is a parameter we can pass to a function that specifies where the function should return. The description may sound more complicated than it is. Take a look at the following code:
int i = add(5, 10);
int j = square(i);
The function add returns 15 at the place it was called, to be assigned to i. After that, the value of i is used to call square. Note that even a lazy compiler can't rearrange these lines, because the second depends on successful evaluation of the first. Let's rewrite this block in Continuation Passing Style (CPS), where add doesn't return to its original caller but returns its result to square instead:
int j = add(5, 10, square);
In this case add gets another parameter: a function that add must call with its result upon completion. Here square is a continuation of add. In both cases j equals 225.
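Plain Java can imitate CPS by making the continuation an explicit parameter; a sketch using java.util.function.IntConsumer (the helper names are ours):

import java.util.function.IntConsumer;

// CPS style: instead of returning, each function hands its result
// to the continuation it was given.
static void add(int i, int j, IntConsumer k) {
    k.accept(i + j);
}

static void square(int n, IntConsumer k) {
    k.accept(n * n);
}

// usage: add "returns" into square, which "returns" into println; prints 225
add(5, 10, r -> square(r, result -> System.out.println(result)));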
Here lies the first trick for forcing a lazy language to evaluate two expressions in order. Consider the following (familiar) IO code:
System.out.println("Please enter your name: ");
System.in.readLine();
The two lines don't depend on each other, so the compiler is free to rearrange them as it wishes. However, if we rewrite this code in CPS, there will be a dependency, and the compiler will be forced to evaluate the two lines in order!
System.out.println("Please enter your name: ", System.in.readLine);
Here println needs to call readLine with its own result and return readLine's result as its own. This lets us ensure that the two lines execute in order, and that readLine is evaluated at all (because the whole computation expects the last value as its result). Java's println returns void, but if it returned an abstract value (one that readLine would accept), our problem would be solved! Of course, chaining function calls like this quickly becomes unreadable, but it doesn't have to be. We could add syntactic sugar to the language so that we simply type expressions in order, and the compiler chains the calls for us automatically. We can then evaluate expressions in whatever order we wish, while keeping all the benefits of FP (including the ability to reason about our programs mathematically)! If this is still confusing, remember that a function is just an instance of a class with one member. Rewrite the two lines above so that println and readLine are instances of such classes, and everything becomes clear.
I would wrap up this section here, except that we've only scratched the surface of what continuations can do. We can write entire programs in CPS, where every function takes an extra continuation argument and passes its result to it. We can also convert any program to CPS simply by treating functions as special cases of continuations (functions that always return to their caller). This conversion is trivial to automate (in fact, many compilers do just that).
Once we convert a program to CPS, it becomes clear that every instruction has some continuation: a function it calls with its result when done, which in a regular program would be the place it returns to. Let's pick any instruction from the code above, say add(5, 10). In a program written in CPS style, add's continuation is obvious: it's the function add calls once it's done. But what is it in a non-CPS program? We could, of course, convert the program to CPS, but do we have to?
It turns out we don't. Look carefully at our CPS conversion. If you try to write a compiler for it, and think about it long enough, you'll realize that the CPS version needs no stack! No function ever "returns" in the traditional sense; it simply calls another function with its result. We don't need to push arguments onto the stack with every call and pop them off when the call ends; we can simply store them in some block of memory and use a jump instruction. We'll never need the original arguments again, because no function ever returns!
So programs written in CPS style have no stack, but every function carries an extra argument: a function to call. Programs not written in CPS style have no such argument, but they do have a stack. What's on the stack? Just the arguments and a pointer to the address the function should return to. Do you see the light yet? The stack holds nothing but continuation information! The pointer to the return instruction on the stack is essentially the same thing as the function to be called in a CPS program. If you want to find out the continuation of add(5, 10), simply examine the stack at its point of execution!
So this really is simple. A continuation and a pointer to a return address on the stack are equivalent; the continuation is just passed explicitly, so it needn't be the place the function was called from. If you remember that a continuation is a function, and that in our language functions compile to instances of classes, you'll see that a pointer to a return instruction on the stack is really the same as a continuation argument, since our function (like any class instance) is just a pointer. This means that at any point in time, at any place in your program, you can ask for the current continuation (which is simply the current state of the stack).
OK, so we know what the current continuation is. What does it mean? When we get the current continuation and store it somewhere, we end up saving the current state of our program: we freeze it in time. This is like an OS putting itself into hibernation. A continuation object holds the information necessary to restart the program from the point where the object was acquired. An operating system does something similar on every context switch between threads. The only difference is that the OS keeps full control. Request a continuation object (in Scheme this is done by calling the call-with-current-continuation function) and you get an object that includes the current continuation: the stack (or, in the CPS case, the next function to call). You can store this object in a variable (or even on disk). When you "restart" the program with this continuation, you transform back to the state where you acquired the object. It's like switching back to a suspended thread or waking up a hibernated OS, except with a continuation you can do it over and over again. When an OS wakes up, the hibernation information is destroyed; if it weren't, you could wake up from the same point again and again, as if traveling back in time. With continuations you have that control!
In what situations are continuations useful? Usually when you're trying to simulate state in an application that is inherently stateless; having continuations around simplifies your task. Continuations are a great fit for web applications. Microsoft's ASP.NET goes to tremendous lengths to simulate state so that you can write web applications with less hassle. If C# supported continuations, the complexity of ASP.NET could be halved: you'd simply save a continuation and restart it when the user makes the next web request. To the programmer there would be no interruption in a web application: the program would simply restart from the next line! Continuations are an incredibly useful abstraction for some problems. Considering that many traditional fat clients are now moving to the server, continuations will only become more important in the future.
Pattern Matching
Pattern matching isn't a new or innovative feature. In fact, it has little to do with functional programming. The only reason it's usually attributed to FP is that functional languages have had pattern matching for some time, while modern imperative languages still don't.
Let's dive into pattern matching with an example. Here's a Fibonacci function in Java:
int fib(int n) {
    if(n == 0) return 1;
    if(n == 1) return 1;
    return fib(n - 2) + fib(n - 1);
}
And here's the same function in our Java-derived language with support for pattern matching:
int fib(0) {
    return 1;
}
int fib(1) {
    return 1;
}
int fib(int n) {
    return fib(n - 2) + fib(n - 1);
}
What's the difference? The compiler implements the branching for us.
What's the big deal? There isn't one. Someone noticed that a large number of functions contain complicated switch statements (this is particularly true of functional programs) and decided it would be a good idea to abstract that away. We split the function definition into multiple ones and put patterns in place of some arguments (a bit like overloading). When the function is called, the compiler compares the arguments with the definitions at runtime and picks the correct one. This is usually done by picking the most specific definition available. For example, int fib(int n) could be called with n equal to 1, but it isn't, because fib(1) is more specific.
Pattern matching is usually more complex than our example reveals. For instance, an advanced pattern matching system lets us do the following:
int f(int n < 10) { ... }
int f(int n) { ... }
When is pattern matching useful? In a surprisingly large number of cases! Every time you have a complex data structure with nested ifs, pattern matching can do a better job with less code. A good example that jumps to mind is the standard WinProc function that every Win32 platform provides (even though it's usually abstracted away). Usually a pattern matching system can examine collections as well as simple values. For example, if you pass an array to the function, you could pick out all arrays whose first element is 1 and whose third element is greater than 3.
Another benefit of pattern matching is that if you need to add or modify conditions, you don't have to deal with one huge function. You simply add (or modify) the appropriate definitions. This eliminates a whole range of design patterns from the GoF book. The more complex your conditions are, the more useful pattern matching becomes. Once you're used to it, you start wondering how you ever got through your day without it.
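Amusingly, Java itself has since grown a limited form of this; a sketch using switch patterns over a sealed hierarchy (requires Java 21 or later; the types are ours, purely for illustration):

// Conditions on structure move out of if-chains and into patterns;
// adding a case means adding a definition, not editing a giant function.
sealed interface Shape permits Circle, Rect {}
record Circle(double radius) implements Shape {}
record Rect(double w, double h) implements Shape {}

static double area(Shape s) {
    return switch (s) {
        case Circle c -> Math.PI * c.radius() * c.radius();
        case Rect r   -> r.w() * r.h();
    };
}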
Closures
So far we've discussed the features of functional languages in the context of "pure" functional languages: languages that implement lambda calculus and don't include features that conflict with Church's formal system. However, many useful features of functional languages lie outside the lambda calculus framework. While an implementation of an axiomatic system lets us think of programs as mathematical expressions, it may not always be practical. Many languages choose to incorporate functional elements without strictly adhering to functional doctrine. Many such languages (like Common Lisp) don't require variables to be final: you can modify them in place. They also don't require functions to depend only on their arguments: functions are allowed to access state outside their scope. But they do include functional features, such as higher order functions. Passing functions around as arguments in an impure language is a bit different from doing so within the confines of lambda calculus: it requires an interesting feature usually called a lexical closure. Here are a couple of examples. Remember, in these examples variables aren't final, and functions can refer to variables outside their scope:
Function makePowerFn(int power) {
    int powerFn(int base) {
        return pow(base, power);
    }
    return powerFn;
}

Function square = makePowerFn(2);
square(3); // returns 9
The function makePowerFn returns a function that takes a single argument and raises it to a given power. What happens when we evaluate square(3)? The variable power isn't anywhere in the scope of powerFn, because makePowerFn has returned and its stack frame is gone. How can square work, then? The language must, somehow, have stored the value of power for square to use. What if we make another function, cube, that raises its argument to the third power? The runtime must now store two copies of power, one for each function generated by makePowerFn. The phenomenon of storing these values is called a closure. Closures don't only capture the arguments of their host function. For example, a closure can look like this:
Function makeIncrementer() {
    int n = 0;
    int increment() {
        return ++n;
    }
    return increment;
}

Function inc1 = makeIncrementer();
Function inc2 = makeIncrementer();

inc1(); // returns 1
inc1(); // returns 2
inc1(); // returns 3
inc2(); // returns 1
inc2(); // returns 2
inc2(); // returns 3
The runtime has saved n, so the incrementers can access it. Moreover, it keeps a separate copy of n for each incrementer, even though those copies should have disappeared when makeIncrementer returned. How does this code compile? How do closures work under the hood? Fortunately, we can take a peek behind the scenes.
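Java eventually acquired closures in the form of lambdas, with one restriction: captured locals must be effectively final, so mutable state has to be boxed. A sketch of both examples in real Java (using java.util.function and AtomicInteger; Math.pow stands in for pow):

import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.IntSupplier;
import java.util.function.IntUnaryOperator;

static IntUnaryOperator makePowerFn(int power) {
    // The returned lambda captures 'power': a lexical closure.
    return base -> (int) Math.pow(base, power);
}

static IntSupplier makeIncrementer() {
    AtomicInteger n = new AtomicInteger(0); // boxed so the closure can mutate it
    return n::incrementAndGet;              // n survives on the heap
}

IntUnaryOperator square = makePowerFn(2);
square.applyAsInt(3); // returns 9

IntSupplier inc1 = makeIncrementer();
IntSupplier inc2 = makeIncrementer();
inc1.getAsInt();      // returns 1
inc1.getAsInt();      // returns 2
inc2.getAsInt();      // returns 1 (each closure holds its own n)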
A little common sense goes a long way. The first observation is that the lifetime of local variables is no longer determined by a simple scope: it is now indeterminate. The obvious conclusion is that they can no longer be stored on the stack; they must be stored on the heap instead[8]. A closure, then, is implemented just like the functions we discussed earlier, except that it carries an additional reference to its surrounding variables:
class some_function_t {
    SymbolTable parentScope;
    // ...
}
When a closure references a variable that isn't in its immediate scope, it looks it up in its parent scope. That's all there is to it! Closures bring the functional and object oriented worlds closer together. The next time you create a class that holds some state and pass it somewhere else, think of closures. A closure is just an object that creates its "member variables" on the fly, by grabbing them from scope, so you don't have to!
What's next?
This article only scratches the surface of functional programming, but sometimes a shallow first strike can grow into something bigger, and it has certainly benefited me. In the future I'm planning to write about category theory, monads, functional data structures, type systems in functional languages, functional concurrency, functional databases, and much more. If I manage to write (and, in the process, learn) about even half of these topics, my life will be complete. In the meantime, Google is our friend.
Comments?
If you have any questions, comments, or suggestions, please drop a note at coffeemug@gmail.com. I'll be glad to hear your feedback.
===========================
[1] I asked this question a lot while job hunting in 2005, and what I got back was a respectable number of blank stares. Imagine: each of these people stands to make at least $300,000, if only they understood most of the tools available to them.
[2] This sounds like a paradox: physicists and mathematicians are forced to acknowledge that they don't yet fully know whether everything in the universe obeys laws that can be described by mathematics.
[3] I've always hated history lessons that offer a dry collection of dates, names, and places. To me, history is about the lives of the people who changed the world, the personal motivations behind their actions, and the circumstances that let them affect millions of lives. This short section is by no means complete; it only covers the people and events most closely related to our subject.
[4] When I was learning functional programming I disliked the term lambda, because I never really understood what it meant. In this context, a lambda is a function; the Greek letter is just convenient mathematical notation. Whenever you hear "lambda", simply translate it to "function" in your head.
[5] Interestingly, Java strings are immutable anyway. Exploring the reasons for this heresy in an imperative language is rather interesting, but it would distract us from our goal here.
[6] Most functional language compilers optimize recursive functions by converting them to their iterative equivalents whenever possible. This is known as tail call optimization.
[7] The converse isn't necessarily true: while it's sometimes possible to prove that two pieces of code are equivalent, this isn't possible in all cases.
[8] This is actually no slower than stack allocation, because once a garbage collector is introduced, memory allocation becomes an O(1) operation.
PS: Functional programming languages are the future…
附带原文
Functional Programming For The Rest of Us
Monday, June 19, 2006
Programmers are procrastinators. Get in, get some coffee, check the mailbox, read the RSS feeds, read the news, check out latest articles on techie websites, browse through political discussions on the designated sections of the programming forums. Rinse and repeat to make sure nothing is missed. Go to lunch. Come back, stare at the IDE for a few minutes. Check the mailbox. Get some coffee. Before you know it, the day is over.
The only thing, every once in a while challenging articles actually do pop up. If you're looking at the right places you'll find at least one of these every couple of days. These articles are hard to get through and take some time, so they start piling up. Before you know it, you have a list of links and a folder full of PDF files and you wish you had a year in a small hut in the middle of the forest with nobody around for miles so you could catch up. Would be nice if someone came in every morning while you're taking a walk down the river to bring some food and take out the garbage.
I don't know about your list, but a large chunk of the articles in mine are about functional programming. These generally are the hardest to get through. Written in a dry academic language, even the "ten year Wall Street industry veterans" don't understand what functional programming (also referred to as FP) articles are all about. If you ask a project manager in Citi Group or in Deutsche Bank1 why they chose to use JMS instead of Erlang they'll say they can't use academic languages for industrial strength applications. The problem is, some of the most complex systems with the most rigid requirements are written using functional programming elements. Something doesn't add up.
It's true that FP articles and papers are hard to understand, but they don't have to be. The reasons for the knowledge gap are purely historical. There is nothing inherently hard about FP concepts. Consider this article "an accessible guide to FP", a bridge from our imperative minds into the world of FP. Grab a coffee and keep on reading. With any luck your coworkers will start making fun of you for your FP comments in no time.
So what is FP? How did it come about? Is it edible? If it's as useful as its advocates claim, why isn't it being used more often in the industry? Why is it that only people with PhDs tend to use it? Most importantly, why is it so damn hard to learn? What is all this closure, continuation, currying, lazy evaluation and no side effects business? How can it be used in projects that don't involve a university? Why does it seem to be so different from everything good, and holy, and dear to our imperative hearts? We'll clear this up very soon. Let's start with explaining the reasons for the huge gap between the real world and academic articles. The answer is as easy as taking a walk in the park.
Fire up the time machine. Our walk in the park took place more than two thousand years ago, on a beautiful sunny day of a long forgotten spring in 380 B.C. Outside the city walls of Athens, under the pleasant shade of olive trees Plato was walking towards the Academy with a beautiful slave boy. The weather was lovely, the dinner was filling, and the conversation turned to philosophy.
"Look at these two students", said Plato carefully picking words to make the question educational. "Who do you think is taller?" The slave boy looked towards the basin of water where two men were standing. "They're about the same height", he said. "What do you mean 'about the same'?", asked Plato. "Well, they look the same from here but I'm sure if I were to get closer I'd see that there is some difference."
Plato smiled. He was leading the boy in the right direction. "So you would say that there is nothing perfectly equal in our world?" After some thinking the boy replied: "I don't think so. Everything is at least a little different, even if we can't see it." The point hit home! "Then if nothing is perfectly equal in this world, how do you think you understand the concept of 'perfect' equality?" The slave boy looked puzzled. "I don't know", he replied.
So was born the first attempt to understand the nature of mathematics. Plato suggested that everything in our world is just an approximation of perfection. He also realized that we understand the concept of perfection even though we never encountered it. He came to conclusion that perfect mathematical forms must live in another world and that we somehow know about them by having a connection to that "alternative" universe. It's fairly clear that there is no perfect circle that we can observe. But we also understand what a perfect circle is and can describe it via equations. What is mathematics, then? Why is the universe described with mathematical laws? Can all of the phenomena of our universe be described by mathematics?2
Philosophy of mathematics is a very complex subject. Like most philosophical disciplines it is far more adept at posing questions rather than providing answers. Much of the consensus revolves around the fact that mathematics is really a puzzle: we set up a set of basic non-conflicting principles and a set of rules on how to operate with these principles. We can then stack these rules together to come up with more complex rules. Mathematicians call this method a "formal system" or a "calculus". We can effectively write a formal system for Tetris if we wanted to. In fact, a working implementation of Tetris is a formal system, just specified using an unusual representation.
A civilization of furry creatures on Alpha Centauri would not be able to read our formalisms of Tetris and circles because their only sensory input might be an organ that senses smells. They likely will never find out about the Tetris formalism, but they very well might have a formalism for circles. We probably wouldn't be able to read it because our sense of smell isn't that sophisticated, but once you get past the representation of the formalism (via various sensory instruments and standard code breaking techniques to understand the language), the concepts underneath are understandable to any intelligent civilization.
Interestingly if no intelligent civilization ever existed in the universe the formalisms for Tetris and circles would still hold water, it's just that nobody would be around to find out about them. If an intelligent civilization popped up, it would likely discover some formalisms that help describe the laws of our universe. They also would be very unlikely to ever find out about Tetris because there is nothing in the universe that resembles it. Tetris is one of countless examples of a formal system, a puzzle, that has nothing to do with the real world. We can't even be sure that natural numbers have full resemblance to the real world, after all one can easily think of a number so big that it cannot describe anything in our universe since it might actually turn out to be finite.
3
Let's shift gears in our time machine. This time we'll travel a lot closer, to the 1930s. The Great Depression was ravaging the New and the Old worlds. Almost every family from every social class was affected by the tremendous economic downturn. Very few sanctuaries remained where people were safe from the perils of poverty. Few people were fortunate enough to be in these sanctuaries, but they did exist. Our interest lies in mathematicians in Princeton University.
The new offices constructed in gothic style gave Princeton an aura of a safe haven. Logicians from all over the world were invited to Princeton to build out a new department. While most of America couldn't find a piece of bread for dinner, high ceilings, walls covered with elaborately carved wood, daily discussions by a cup of tea, and walks in the forest were some of the conditions in Princeton.
One mathematician living in such lavish lifestyle was a young man named Alonzo Church. Alonzo received a B.S. degree from Princeton and was persuaded to stay for graduate school. Alonzo felt the architecture was fancier than necessary. He rarely showed up to discuss mathematics with a cup of tea and he didn't enjoy the walks in the woods. Alonzo was a loner: he was most productive when working on his own. Nevertheless Alonzo had regular contacts with other Princeton inhabitants. Among them were Alan Turing, John von Neumann, and Kurt G枚del.
The four men were interested in formal systems. They didn't pay much heed to the physical world, they were interested in dealing with abstract mathematical puzzles instead. Their puzzles had something in common: the men were working on answering questions about computation. If we had machines that had infinite computational power, what problems would we be able to solve? Could we solve them automatically? Could some problems remain unsolved and why? Would various machines with different designs be equal in power?
In cooperation with other men Alonzo Church developed a formal system called lambda calculus. The system was essentially a programming language for one of these imaginary machines. It was based on functions that took other functions as parameters and returned functions as results. The function was identified by a Greek letter lambda, hence the system's name4. Using this formalism Alonzo was able to reason about many of the above questions and provide conclusive answers.
Independently of Alonzo Church, Alan Turing was performing similar work. He developed a different formalism (now referred to as the Turing machine), and used it to independently come to similar conclusions as Alonzo. Later it was shown that Turing machines and lambda calculus were equivalent in power.
This is where the story would stop, I'd wrap up the article, and you'd navigate to another page, if not for the beginning of World War II. The world was in flames. The U.S. Army and Navy used artillery more often than ever. In attempts to improve accuracy the Army employed a large group of mathematicians to continuously calculate differential equations required for solving ballistic firing tables. It was becoming obvious that the task was too great for being solved manually and various equipment was developed in order to overcome this problem. The first machine to solve ballistic tables was a Mark I built by IBM - it weighed five tons, had 750,000 parts and could do three operations per second.
The race, of course, wasn't over. In 1949 an Electronic Discrete Variable Automatic Computer (EDVAC) was unveiled and had tremendous success. It was a first example of von Neumann's architecture and was effectively a real world implementation of a Turing machine. For the time being Alonzo Church was out of luck.
In late 1950s an MIT professor John McCarthy (also a Princeton graduate) developed interest in Alonzo Church's work. In 1958 he unveiled a List Processing language (Lisp). Lisp was an implementation of Alonzo's lambda calculus that worked on von Neumann computers! Many computer scientists recognized the expressive power of Lisp. In 1973 a group of programmers at MIT's Artificial Intelligence Lab developed hardware they called a Lisp machine - effectively a native hardware implementation of Alonzo's lambda calculus!
Functional programming is a practical implementation of Alonzo Church's ideas. Not all lambda calculus ideas transform to practice because lambda calculus was not designed to work under physical limitations. Therefore, like object oriented programming, functional programming is a set of ideas, not a set of strict guidelines. There are many functional programming languages, and most of them do many things very differently. In this article I will explain the most widely used ideas from functional languages using examples written in Java (yes, you could write functional programs in Java if you felt particularly masochistic). In the next couple of sections we'll take Java as is, and will make modifications to it to transform it into a useable functional language. Let's begin our quest.
Lambda calculus was designed to investigate problems related to calculation. Functional programming, therefore, primarily deals with calculation, and, surprisingly, uses functions to do so. A function is a very basic unit in functional programming. Functions are used for almost everything, even the simplest of calculations. Even variables are replaced with functions. In functional programming variables are simply aliases for expressions (so we don't have to type everything on one line). They cannot be modified. All variables can only be assigned to once. In Java terms this means that every single variable is declared as final (or const if we're dealing with C++). There are no non-final variables in FP.
final int i = 5;final int j = i + 3;
Since every variable in FP is final two fairly interesting statements can be made. It does not make sense to always write the keyword final and it does not make sense to call variables, well... variables. We will now make two modifications to Java: every variable declared in our functional Java will be final by default, and we will refer to variables as symbols.
By now you are probably wondering how you could possibly write anything reasonably complicated in our newly created language. If every symbol is non-mutable we cannot change the state of anything! This isn't strictly true. When Alonzo was working on lambda calculus he wasn't interested in maintaining state over periods of time in order to modify it later. He was interested in performing operations on data (also commonly referred to as "calculating stuff"). However, it was proved that lambda calculus is equivalent to a Turing machine. It can do all the same things an imperative programming language can. How, then, can we achieve the same results?
It turns out that functional programs can keep state, except they don't use variables to do it. They use functions instead. The state is kept in function parameters, on the stack. If you want to keep state for a while and every now and then modify it, you write a recursive function. As an example, let's write a function that reverses a Java string. Remember, every variable we declare is final by default5.
String reverse(String arg) {if(arg.length == 0) {return arg;}else {return reverse(arg.substring(1, arg.length)) + arg.substring(0, 1);}}
This function is slow because it repeatedly calls itself6. It's a memory hog because it repeatedly allocates objects. But it's functional in style. You may be interested why someone would want to program in this manner. Well, I was just about to tell you.
You're probably thinking that there's no way I can rationalize the monstrosity of a function above. When I was learning functional programming I was thinking that too. I was wrong. There are very good arguments for using this style. Some of them are subjective. For example, people claim that functional programs are easier to understand. I will leave out these arguments because every kid on the block knows that ease of understanding is in the eye of the beholder. Fortunately for me, there are plenty of objective arguments.
Unit Testing
Since every symbol in FP is final, no function can ever cause side effects. You can never modify things in place, nor can one function modify a value outside of its scope for another function to use (like a class member or a global variable). That means that the only effect of evaluating a function is its return value and the only thing that affects the return value of a function is its arguments.
This is a unit tester's wet dream. You can test every function in your program only worrying about its arguments. You don't have to worry about calling functions in the right order, or setting up external state properly. All you need to do is pass arguments that represent edge cases. If every function in your program passes unit tests you can be a lot more confident about quality of your software than if you were using an imperative language. In Java or C++ checking a return value of a function is not sufficient - it may modify external state that we would need to verify. Not so in a functional language.
Debugging
If a functional program doesn't behave the way you expect it to, debugging it is a breeze. You will always be able to reproduce your problem because a bug in a functional program doesn't depend on unrelated code paths that were executed before it. In an imperative program a bug resurfaces only some of the time. Because functions depend on external state produced by side effects from other functions you may have to go through a series of steps in no way related to the bug. In a functional program this isn't the case - if a return value of a function is wrong, it is always wrong, regardless of what code you execute before running the function.
Once you reproduce the problem, getting to the bottom of it is trivial. It is almost pleasant. You break the execution of your program and examine the stack. Every argument in every function call in the stack is available for your inspection, just like in an imperative program. Except in an imperative program that's not enough because functions depend on member variables, global variables, and the state of other classes (which in turn depend on these very same things). A function in a functional program depends only on its arguments, and that information is right before your eyes! Furthermore, in an imperative program examining a return value of a function will not give you a good idea of whether the function behaves properly. You need to hunt down dozens of objects outside its scope to verify that it performed correct actions. In a functional program all you have to do is look at the return value!
Walking through the stack you look at arguments passed to functions and their return values. The minute a return value doesn't make sense you step into the offending function and walk through it. You repeat this recursively until the process leads you to the source of the bug!
Concurrency
A functional program is ready for concurrency without any further modifications. You never have to worry about deadlocks and race conditions because you don't need to use locks! No piece of data in a functional program is modified twice by the same thread, let alone by two different threads. That means you can easily add threads without ever giving conventional problems that plague concurrency applications a second thought!
If this is the case, why doesn't anybody use functional programs for highly concurrent applications? Well, it turns out that they do. Ericsson designed a functional language calledErlang for use in its highly tolerant and scalable telecommunication switches. Many others recognized the benefits provided by Erlang and startedusing it. We're talking about telecommunication and traffic control systems that are far more scalable and reliable than typical systems designed on Wall Street. Actually, Erlang systems are not scalable and reliable. Java systems are. Erlang systems are simply rock solid.
The concurrency story doesn't stop here. If your application is inherently single threaded the compiler can still optimize functional programs to run on multiple CPUs. Take a look at the following code fragment:
String s1 = somewhatLongOperation1();String s2 = somewhatLongOperation2();String s3 = concatenate(s1, s2);
In a functional language the compiler could analyze the code, classify the functions that create strings s1 and s2 as potentially time consuming operations, and run them concurrently. This is impossible to do in an imperative language because each function may modify state outside of its scope and the function following it may depend on it. In functional languages automatic analysis of functions and finding good candidates for concurrent execution is as trivial as automatic inlining! In this sense functional style programs are "future proof" (as much as I hate buzzwords, I'll indulge this time). Hardware manufacturers can no longer make CPUs run any faster. Instead they increase the number of cores and attribute quadruple speed increases to concurrency. Of course they conveniently forget to mention that we get our money's worth only on software that deals with parallelizable problems. This is a very small fraction of imperative software but 100% of functional software because functional programs are all parallelizable out of the box.
Hot Code Deployment
In the old days of Windows in order to install updates it was necessary to restart the computer. Many times. After installing a newer version of a media players. With Windows XP the situation has improved significantly, yet it still isn't ideal (I ran Windows Update at work today and now an annoying system tray icon won't go away until I restart). Unix systems have had a better model for a while. In order to install an update you only need to stop relevant components, not the whole OS. While it is a better situation, for a large class of server applications it still isn't acceptable. Telecommunication systems need to be up 100% of the time because if dialing emergency is not available due to upgrades, lives may be lost. There is no reason Wall Street firms need to bring down their systems to install software updates over the weekend.
An ideal situation is updating relevant parts of the code without stopping any part of the system at all. In an imperative world this isn't possible. Consider unloading a Java class at runtime and reloading a new definition. If we were to do that every instance of a class would become unusable because the state it holds would be lost. We would need to resort to writing tricky version control code. We'd need to serialize all running instances of the class, destroy them, create instances of the new class, try to load serialized data into them hoping the loading code properly migrates the data to work with the new instance. On top of that, every time we change something we'd have to write our migration code manually. And our migration code would have to take special care not to break relationships between objects. Nice in theory, but would never work well in practice.
In a functional program all state is stored on the stack in the arguments passed to functions. This makes hot deployment significantly easier! In fact, all we'd really have to do is run a diff between the code in production and the new version, and deploy the new code. The rest could be done by language tools automatically! If you think this is science fiction, think again. Erlang engineers have beenupgrading live systems without stopping them for years.
Machine Assisted Proofs and Optimizations
An interesting property of functional languages is that they can be reasoned about mathematically. Since a functional language is simply an implementation of a formal system, all mathematical operations that could be done on paper still apply to the programs written in that language. The compiler could, for example, convert pieces of code into equivalent but more efficient pieces with a mathematical proof that two pieces of code are equivalent7. Relational databases have been performing these optimizations for years. There is no reason the same techniques can't apply to regular software.
Additionally, you can use these techniques to prove that parts of your program are correct. It is even possible to create tools that analyze code and generate edge cases for unit tests automatically! This functionality is invaluable for rock solid systems. If you are designing pace makers and air traffic control systems such tools are almost always a requirement. If you are writing an application outside of truly mission critical industries, these tools can give you a tremendous edge over your competitors.
I remember learning about the benefits I outlined above and thinking "that's all very nice but it's useless if I have to program in a crippled language where everything is final." This was a misconception. Making all variables final is crippled in a context of an imperative language like Java but it isn't in a context of functional languages. Functional languages offer a different kind of abstraction tools that make you forget you've ever liked modifying variables. One such tool is capability to work with higher order functions.
A function in such languages is different from a function in Java or C. It is a superset - it can do all the things a Java function can do, and more. We create a function in the same manner we do in C:
int add(int i, int j) {return i + j;}
This means something different from equivalent C code. Let's extend our Java compiler to support this notation. When we type something like this our compiler will convert it to the following Java code (don't forget, everything is final):
class add_function_t {int add(int i, int j) {return i + j;}}add_function_t add = new add_function_t();
The symbol add isn't really a function. It is a small class with one function as its member. We can now pass add around in our code as an argument to other functions. We can assign it to another symbol. We can create instances of add_function_t at runtime and they will be garbage collected when we no longer need them. This makes functions first class objects no different from integers or strings. Functions that operate on other functions (accept them as arguments) are called higher order functions. Don't let this term intimidate you, it's no different from Java classes that operate on each other (we can pass class instances to other classes). We can call them "higher order classes" but nobody cares to because there is no strong academic community behind Java.
How, and when, do you use higher order functions? Well, I'm glad you asked. You write your program as a big monolithic blob of code without worrying about class hierarchies. When you see that a particular piece of code is repeated, you break it out into a function (fortunately they still teach this in schools). If you see that a piece of logic within your function needs to behave differently in different situations, you break it out into a higher order function. Confused? Here's a real life example from my work.
Suppose we have a piece of Java code that receives a message, transforms it in various ways, and forwards it to another server.
class MessageHandler {void handleMessage(Message msg) {// ...msg.setClientCode("ABCD_123");// ...sendMessage(msg);}// ...}
Now imagine that our system has changed and we now route messages to two servers instead of one. Everything is handled in exactly the same way except the client code - the second server wants it in a different format. How do we handle this situation? We could check where the message is headed and format the client code differently, like this:
class MessageHandler {void handleMessage(Message msg) {// ...if(msg.getDestination().equals("server1") {msg.setClientCode("ABCD_123");} else {msg.setClientCode("123_ABC");}// ...sendMessage(msg);}// ...}
This approach, however, isn't scalable. If more servers are added our function will grow linearly and we'll have a nightmare updating it. An object oriented approach is to make MessageHandler a base class and specialize the client code operation in derived classes:
abstract class MessageHandler {void handleMessage(Message msg) {// ...msg.setClientCode(getClientCode());// ...sendMessage(msg);}abstract String getClientCode();// ...}class MessageHandlerOne extends MessageHandler {String getClientCode() {return "ABCD_123";}}class MessageHandlerTwo extends MessageHandler {String getClientCode() {return "123_ABCD";}}
We can now instantiate an appropriate class for each server. Adding servers becomes much more maintainable. That's a lot of code for such a simple modification though. We have to create two new types just to support different client codes! Now let's do the same thing in our language that supports higher order functions:
class MessageHandler {void handleMessage(Message msg, Function getClientCode) {// ...Message msg1 = msg.setClientCode(getClientCode());// ...sendMessage(msg1);}// ...}String getClientCodeOne() {return "ABCD_123";}String getClientCodeTwo() {return "123_ABCD";}MessageHandler handler = new MessageHandler();handler.handleMessage(someMsg, getClientCodeOne);
We've created no new types and no class hierarchy. We simply pass appropriate functions as a parameter. We've achieved the same thing as the object oriented counterpart with a number of advantages. We don't restrict ourselves to class hierarchies: we can pass new functions at runtime and change them at any time with a much higher degree of granularity with less code. Effectively the compiler has written object oriented "glue" code for us! In addition we get all the other benefits of FP. Of course the abstractions provided by functional languages don't stop here. Higher order functions are just the beginning.
Most people I've met have read theDesign Patterns book by the Gang of Four. Any self respecting programmer will tell you that the book is language agnostic and the patterns apply to software engineering in general, regardless of which language you use. This is a noble claim. Unfortunately it is far removed from the truth.
Functional languages are extremely expressive. In a functional language one does not need design patterns because the language is likely so high level, you end up programming in concepts that eliminate design patterns all together. Once such pattern is an Adapter pattern (how is it different from Facade again? Sounds like somebody needed to fill more pages to satisfy their contract). It is eliminated once a language supports a technique called currying.
Adapter pattern is best known when applied to the "default" abstraction unit in Java - a class. In functional languages the pattern is applied to functions. The pattern takes an interface and transforms it to another interface someone else expects. Here's an example of an adapter pattern:
int pow(int i, int j);int square(int i){return pow(i, 2);}
The code above adapts an interface of a function that raises an integer to an integer power to an interface of a function that squares an integer. In academic circles this trivial technique is called currying (after a logician Haskell Curry who performed mathematical acrobatics necessary to formalize it). Because in FP functions (as opposed to classes) are passed around as arguments, currying is used very often to adapt functions to an interface that someone else expects. Since the interface to functions is its arguments, currying is used to reduce the number of arguments (like in the example above).
Functional languages come with this technique built in. You don't need to manually create a function that wraps the original, functional languages will do that for you. As usual, let's extend our language to support this technique.
square = int pow(int i, 2);
This will automatically create a function square for us with one argument. It will call pow function with the second argument set to 2. This will get compiled to the following Java code:
class square_function_t {int square(int i) {return pow(i, 2);}}square_function_t square = new square_function_t();
As you can see, we've simply created a wrapper for the original function. In FP currying is just that - a shortcut to quickly and easily create wrappers. You concentrate on your task, and the compiler writes the appropriate code for you! When do you use currying? This should be easy. Any time you'd like to use an adapter pattern (a wrapper).
Lazy (or delayed) evaluation is an interesting technique that becomes possible once we adopt a functional philosophy. We've already seen the following piece of code when we were talking about concurrency:
String s1 = somewhatLongOperation1();String s2 = somewhatLongOperation2();String s3 = concatenate(s1, s2);
In an imperative language the order of evaluation would be clear. Because each function may affect or depend on an external state it would be necessary to execute them in order: first somewhatLongOperation1, then somewhatLongOperation2, followed by concatenate. Not so in functional languages.
As we saw earlier somewhatLongOperation1 and somewhatLongOperation2 can be executed concurrently because we're guaranteed no function affects or depends on global state. But what if we don't want to run the two concurrently, do we need to run them in order? The answer is no. We only need to run these operations when another function depends on s1 and s2. We don't even have to run them before concatenate is called - we can delay their evaluation until they're required within concatenate. If we replace concatenate with a function that has a conditional and uses only one of its two parameters we may never evaluate one of the parameters at all!Haskell is an example of a delayed evaluation language. In Haskell you are not guaranteed that anything will be executed in order (or at all) because Haskell only executes code when it's required.
Lazy evaluation has numerous advantages as well as disadvantages. We will discuss the advantages here and will explain how to counter the disadvantages in the next section.
Optimization
Lazy evaluation provides a tremendous potential for optimizations. A lazy compiler thinks of functional code exactly as mathematicians think of an algebra expression - it can cancel things out and completely prevent execution, rearrange pieces of code for higher efficiency, even arrange code in a way that reduces errors, all guaranteeing optimizations won't break the code. This is the biggest benefit of representing programs strictly using formal primitives - code adheres to mathematical laws and can be reasoned about mathematically.
Abstracting Control Structures
Lazy evaluation provides a higher order of abstraction that allows implementing things in a way that would otherwise be impossible. For example consider implementing the following control structure:
unless(stock.isEuropean()) {sendToSEC(stock);}
We want sendToSEC executed unless the stock is European. How can we implement unless? Without lazy evaluation we'd need some form of a macro system, but in a language like Haskell that's unnecessary. We can implement unless as a function!
void unless(boolean condition, List code) {if(!condition)code;}Note that code is never evaluated if the condition is true. We cannot reproduce this behavior in a strict language because the arguments would be evaluated before unless is entered.Infinite Data Structures
Lazy languages allow for definition of infinite data structures, something that's much more complicated in a strict language. For example, consider a list with Fibonacci numbers. We clearly can't compute and infinite list in a reasonable amount of time or store it in memory. In strict languages like Java we simply define a Fibonacci function that returns a particular member from the sequence. In a language like Haskell we can abstract it further and simply define an infinite list of Fibonacci numbers. Because the language is lazy, only the necessary parts of the list that are actually used by the program are ever evaluated. This allows for abstracting a lot of problems and looking at them from a higher level (for example, we can use list processing functions on an infinite list).
Disadvantages
Of course there ain't no such thing as a free lunch(tm). Lazy evaluation comes with a number of disadvantages. Mainly that it is, well, lazy. Many real world problems require strict evaluation. For example consider the following:
System.out.println("Please enter your name: ");System.in.readLine();
In a lazy language you have no guarantee that the first line will be executed before the second! This means we can't do IO, can't use native functions in any meaningful way (because they need to be called in order since they depend on side effects), and can't interact with the outside world! If we were to introduce primitives that allow ordered code execution we'd lose the benefits of reasoning about our code mathematically (which would take all of the benefits of functional programming with it). Fortunately not all is lost. Mathematicians got to work and developed a number of tricks to ensure code gets executed in particular order in a functional setting. We get the best of both worlds! These techniques include continuations, monads, and uniqueness typing. In this article we'll only deal with continuations. We'll leave monads and uniqueness typing for another time. Interestingly, continuations are useful for many things other than enforcing a particular order of evaluation. We'll talk about that as well.
Continuations
Continuations are to programming what the Da Vinci Code is to human history: an amazing revelation of the greatest cover-up known to man. Well, maybe not, but they expose a deception of the same order as square roots of negative numbers.
When we learned about functions we only learned half-truths, based on the faulty assumption that a function must return its value to its original caller. In this sense continuations are a generalization of functions: a function need not return to its caller and may instead return to any part of the program. A "continuation" is a parameter we may choose to pass to our function that specifies where the function should return. The description sounds more complicated than it is. Take a look at the following code:
int i = add(5, 10);
int j = square(i);
The function add returns 15, which is assigned to i at the place where add was originally called. After that the value of i is used to call square. Note that a lazy compiler can't rearrange these lines of code because the second line depends on successful evaluation of the first. We can rewrite this code block using Continuation Passing Style or CPS, where the function add doesn't return to the original caller but instead returns its result to square.
int j = add(5, 10, square);
Here add gets another parameter - a function that add must call with its result upon completion. In this case square is a continuation of add. In both cases j will equal 225.
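A minimal sketch of the same thing in real Java, using the standard IntUnaryOperator interface for the continuation (the names below are mine, not the article's):

import java.util.function.IntUnaryOperator;

// add in continuation-passing style: instead of returning to its caller,
// it hands the sum to an explicit continuation k.
static int add(int a, int b, IntUnaryOperator k) {
    return k.applyAsInt(a + b);
}

static int square(int n) {
    return n * n;
}

// Usage: square receives 15 directly from add, so j == 225.
int j = add(5, 10, x -> square(x));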
Here lies the first trick for forcing a lazy language to evaluate two expressions in order. Consider the following (familiar) IO code:
System.out.println("Please enter your name: ");
System.in.readLine();
The two lines don't depend on each other and the compiler is free to rearrange them as it wishes. However, if we rewrite this code in CPS, there will be a dependency and the compiler will be forced to evaluate the two lines in order!
System.out.println("Please enter your name: ", System.in.readLine);
In this case println needs to call readLine with its result and return the result of readLine. This allows us to ensure that the two lines are executed in order and that readLine is evaluated at all (because the whole computation expects the last value as a result). In the case of Java println returns void, but if it were to return an abstract value (that readLine would accept), we'd solve our problem! Of course chaining function calls like that quickly becomes unreadable, but it doesn't have to be done by hand: we could add syntactic sugar to the language that lets us simply type expressions in order, while the compiler chains the calls for us automatically. We can now evaluate expressions in any order we wish without losing any of the benefits of FP (including the ability to reason about our programs mathematically)! If this is still confusing, remember that a function is just an instance of a class with one member. Rewrite the two lines above so that println and readLine are instances of classes and everything will become clear - a sketch follows.
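Here is that rewrite as a hedged sketch in plain Java (the class names are hypothetical; Scanner is just one way to read a line):

import java.util.Scanner;
import java.util.function.Function;

// println as an object: it performs its effect, then hands control to an
// explicit continuation instead of returning to its caller.
class Println {
    String call(String message, Function<Void, String> k) {
        System.out.println(message);
        return k.apply(null); // "return" by invoking the continuation
    }
}

// readLine as an object, usable as Println's continuation.
class ReadLine implements Function<Void, String> {
    public String apply(Void ignored) {
        return new Scanner(System.in).nextLine();
    }
}

// Usage: the data dependency forces println to run before readLine.
String name = new Println().call("Please enter your name: ", new ReadLine());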
I would now wrap up this section, except that we've only scratched the surface of continuations and their usefulness. We can write entire programs in CPS, where every function takes an extra continuation argument and passes the result to it. We can also convert any program to CPS simply by treating functions as special cases of continuations (functions that always return to their caller). This conversion is trivial to do automatically (in fact, many compilers do just that).
Once we convert a program to CPS it becomes clear that every instruction has some continuation, a function it will call with the result, which in a regular program would be the place it must return to. Let's pick any instruction from the code above, say add(5, 10). In a program written in CPS style it's clear what add's continuation is - it's the function that add calls once it's done. But what is it in a non-CPS program? We could, of course, convert the program to CPS, but do we have to?
It turns out that we don't. Look carefully at our CPS conversion. If you try to write a compiler for it and think about it long enough you'll realize that the CPS version needs no stack! No function ever "returns" in the traditional sense; it just calls another function with the result instead. We don't need to push function arguments on the stack with every call and then pop them back; we can simply store them in some block of memory and use a jump instruction instead. We'll never need the original arguments - they'll never be used again, since no function ever returns!
So, programs written in CPS style have no stack but have an extra argument with a function to call. Programs not written in CPS style have no argument with a function to call, but have the stack instead. What does the stack contain? Simply the arguments, and a pointer to memory where the function should return. Do you see a light bulb? The stack simply contains continuation information! The pointer to the return instruction in the stack is essentially the same thing as the function to call in CPS programs! If you wanted to find out what continuation for add(5, 10) is, you'd simply have to examine the stack at the point of its execution!
So that was easy. A continuation and a pointer to the return instruction in the stack are really the same thing, except that a continuation is passed explicitly, so it doesn't need to point to the place where the function was called from. If you remember that a continuation is a function, and that a function in our language is compiled to an instance of a class, you'll realize that the return pointer on the stack and the continuation argument carry the same information, since our function (just like an instance of a class) is simply a pointer. This means that at any given point in time in your program you can ask for the current continuation (which is simply the information on the stack).
Ok, so we know what a current continuation is. What does it mean? When we get the current continuation and store it somewhere, we end up storing the current state of our program - freezing it in time. This is similar to an OS putting itself into hibernation. A continuation object contains the information necessary to restart the program from the point where the continuation object was acquired. An operating system does this to your program all the time when it context switches between threads; the only difference is that the OS keeps all the control. If you ask for a continuation object (in Scheme this is done by calling the call-with-current-continuation function) you'll get an object that contains the current continuation - the stack (or, in the CPS case, the function to call next). You can store this object in a variable (or, alternatively, on disk). When you choose to "restart" your program with this continuation object, you return to the state of the program at the moment you grabbed it. It's the same thing as switching back to a suspended thread or waking up an OS from hibernation, except you can do it again and again. When an OS wakes up, the hibernation information is destroyed. If it weren't, you'd be able to wake up from the same point over and over again, almost like going back in time. You have that control with continuations!
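In a CPS-style program this is easy to demonstrate, because the current continuation is just an ordinary object we can stash and invoke again. A hedged Java sketch (all names below are hypothetical):

import java.util.function.IntUnaryOperator;

class ContinuationDemo {
    // The captured "current continuation" - just a stored function object.
    static IntUnaryOperator saved;

    // compute runs in CPS: k is its continuation, grabbed before it is used.
    static int compute(int x, IntUnaryOperator k) {
        saved = k;                   // freeze "the rest of the program"
        return k.applyAsInt(x + 1);  // then continue normally
    }

    public static void main(String[] args) {
        int first = compute(41, n -> n * 2);      // runs normally: (41 + 1) * 2 = 84
        int again = saved.applyAsInt(100);        // re-enter the same continuation: 200
        System.out.println(first + " " + again);  // prints "84 200"
    }
}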
In what situations are continuations useful? Usually when you're trying to simulate state in an application of inherently stateless nature to ease your life. A great application of continuations is web applications. Microsoft's ASP.NET goes to tremendous lengths to try and simulate state so that you can write your application with less hassle. If C# supported continuations half of ASP.NET's complexity would disappear - you'd simply store a continuation and restart it when a user makes the web request again. To the programmer of the web application there would be no interruption - the program would simply start from the next line! Continuations are an incredibly useful abstraction for some problems. Considering that many of the traditional fat clients are moving to the web, continuations will become more and more important in the future.
Pattern Matching
Pattern matching is not a new or innovative feature. In fact, it has little to do with functional programming. The only reason why it's usually attributed to FP is that functional languages have had pattern matching for some time, while modern imperative languages still don't.
Let's dive into pattern matching with an example. Here's a Fibonacci function in Java:
int fib(int n) {
    if(n == 0) return 1;
    if(n == 1) return 1;
    return fib(n - 2) + fib(n - 1);
}
And here's an example of a Fibonacci function in our Java-derived language that supports pattern matching:
int fib(0) {
    return 1;
}
int fib(1) {
    return 1;
}
int fib(int n) {
    return fib(n - 2) + fib(n - 1);
}
What's the difference? The compiler implements branching for us.
What's the big deal? There isn't any. Someone noticed that a large number of functions contain very complicated switch statements (this is particularly true of functional programs) and decided that it's a good idea to abstract that away. We split the function definition into multiple ones, and put patterns in place of some arguments (sort of like overloading). When the function is called, the arguments are compared with the definitions at runtime, and the correct definition is picked. This is usually done by picking the most specific definition available. For example, int fib(int n) can be called with n equal to 1, but it isn't, because int fib(1) is more specific.
Pattern matching is usually more complex than our example reveals. For example, an advanced pattern matching system will allow us to do the following:
int f(int n < 10) { ... }
int f(int n) { ... }
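Under semantics like these, the compiler's job is mechanical branching. A sketch of what the two definitions above might desugar to (handleSmall and handleGeneral are hypothetical placeholders for the two bodies):

// The most specific definition is tried first; the catch-all comes last.
int f(int n) {
    if (n < 10) {
        return handleSmall(n);   // body of f(int n < 10)
    }
    return handleGeneral(n);     // body of f(int n)
}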
When is pattern matching useful? In a surprisingly large number of cases! Every time you have a complex structure of nested ifs, pattern matching can do a better job with less code on your part. A good function that comes to mind is a standard WndProc function that all Win32 applications must provide (even though it's often abstracted away). Usually a pattern matching system can examine collections as well as simple values. For example, if you pass an array to your function you could pick out all arrays in which the first element is equal to 1 and the third element is greater than 3.
Another benefit of pattern matching is that if you need to add or modify conditions, you don't have to go into one huge function. You simply add (or modify) appropriate definitions. This eliminates the need for a whole range of design patterns from the GoF book. The more complex your conditions are, the more pattern matching will help you. Once you're used to it, you start wondering how you ever got through your day without it.
Closures
So far we've discussed features in the context of "pure" functional languages - languages that are implementations of lambda calculus and don't include features that conflict with Church's formalism. However, many of the features of functional languages are useful outside of the lambda calculus framework. While an implementation of an axiomatic system is useful because it allows thinking about programs in terms of mathematical expressions, it may or may not always be practical. Many languages choose to incorporate functional elements without strictly adhering to functional doctrine. Many such languages (like Common Lisp) don't require variables to be final - you can modify things in place. They also don't require functions to depend only on their arguments - functions are allowed to access state outside of their boundaries. But they do include functional features, like higher order functions. Passing functions around in impure languages is a little bit different than doing it within the confines of lambda calculus, and requires support for an interesting feature often referred to as lexical closure. Let's take a look at some sample code. Remember, in this case variables aren't final and functions can refer to variables outside of their scope:
Function makePowerFn(int power) {
    int powerFn(int base) {
        return pow(base, power);
    }
    return powerFn;
}

Function square = makePowerFn(2);
square(3); // returns 9
The function makePowerFn returns a function that takes a single argument and raises it to a certain power. What happens when we try to evaluate square(3)? The variable power isn't anywhere in scope of powerFn because makePowerFn has returned and its stack frame is long gone. How can square work, then? The language must, somehow, store the value of power somewhere for square to work. What if we create another function, cube, that raises its argument to the third power? The runtime must now store two copies of power, one for each function we generated using makePowerFn. The phenomenon of storing these values is called a closure. Closures don't only store arguments of a host function. For example, a closure can look like this:
Function makeIncrementer() {
    int n = 0;
    int increment() {
        return ++n;
    }
    return increment;
}

Function inc1 = makeIncrementer();
Function inc2 = makeIncrementer();
inc1(); // returns 1
inc1(); // returns 2
inc1(); // returns 3
inc2(); // returns 1
inc2(); // returns 2
inc2(); // returns 3
The runtime manages to store n so that the incrementers can access it. Furthermore, it stores a separate copy for each incrementer, even though n ought to disappear when makeIncrementer returns. What does this code compile to? How do closures work behind the scenes? Fortunately, we have a backstage pass.
A little common sense goes a long way. The first observation is that local variables are no longer limited to simple scope rules and can have an indefinite lifetime. The obvious conclusion is that they're no longer stored on the stack - they must be stored on the heap instead[8]. A closure, then, is implemented just like the functions we discussed earlier, except that it has an additional reference to the surrounding variables:
class some_function_t {
    SymbolTable parentScope;

    // ...
}
When a closure references a variable that's not in its local scope, it consults this reference for the parent scope. That's it! Closures bring the functional and OO worlds closer together. Every time you create a class that holds some state and pass it somewhere else, think of closures. A closure is just an object that creates "member variables" on the fly by grabbing them from the scope, so you don't have to!
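To make that concrete, here is a hedged sketch of what makePowerFn might compile to (the class and method names are mine): the function becomes an object whose captured variable lives on the heap as a field.

// The closure as a compiler-generated class: power is no longer a stack
// variable but a heap-allocated field, one copy per closure instance.
class PowerFn {
    private final int power;

    PowerFn(int power) {
        this.power = power;
    }

    int call(int base) {
        return (int) Math.pow(base, power);
    }
}

// makePowerFn(2) becomes "new PowerFn(2)"; square(3) becomes square.call(3).
PowerFn square = new PowerFn(2);
square.call(3); // returns 9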
What's next?
This article only scratches the surface of functional programming. Sometimes a small scratch progresses into something bigger, and in our case that's a good thing. In the future I plan to write about category theory, monads, functional data structures, type systems in functional languages, functional concurrency, functional databases and much more. If I get to write (and in the process learn) about half of these topics my life will be complete. In the meantime, Google is our friend.
If you have any questions, comments, or suggestions, please drop a note at coffeemug@gmail.com. I'll be glad to hear your feedback.
[1] When I was looking for a job in the fall of 2005 I often did ask this question. It's quite amusing how many blank stares I got. You would think that at about $300,000 a piece these people would at least have a good understanding of most tools available to them.
[2] This appears to be a controversial question. Physicists and mathematicians are forced to acknowledge that it isn't at all clear whether everything in the universe obeys the laws that can be described by mathematics.
[3] I've always hated history lessons that offer a dry chronology of dates, names, and events. To me history is about the lives of people who changed the world. It is about their private reasons behind their actions, and the mechanisms by which they affected millions of souls. For this reason this history section is hopelessly incomplete. Only very relevant people and events are discussed.
[4] When I was learning about functional programming I was very annoyed by the term "lambda" because I couldn't really understand what it means. In this context lambda is a function; the single Greek letter was just easier to write in mathematical notation. Every time you hear "lambda" when talking about functional programming, just translate it in your mind to "function".
[5] Interestingly, Java strings are immutable anyway. It's rather interesting to explore the reasons for this treachery, but that would distract us from our goal.
[6] Most functional language compilers optimize recursive functions by transforming them to their iterative alternatives whenever possible. This is called a tail call optimization.
[7] The opposite isn't always true. While it is sometimes possible to prove that two pieces of code are equivalent, it isn't possible in all situations.
[8] This is actually no slower than storing on the stack because once you introduce a garbage collector, memory allocation becomes an O(1) operation.