A library for evaluating Excel-style formulas (转与 Code Project)

来源:百度文库 编辑:神马文学网 时间:2024/06/13 06:53:03
A library for evaluating Excel-style formulas
ByEugene Ciloci.
Allows you to evaluate Excel-style formulas, track their dependencies, and recalculate in natural order
VB.NET (VB 8.0)
Windows (Win2K, WinXP), .NET (.NET 2.0)
Win32, VS (VS2005), WinForms
Dev
Posted: 4 Mar 2007
Views: 4,366
Note:This is an unedited reader contribution

Search   Articles Authors   Advanced Search
Sitemap |Add to IE Search
Download sample application - 228.2 KBDownload library and documentation - 376.8 KBDownload source code - 318.2 KBDownload latest version from SourceForge
Introduction
FormulaEngine is a .NET assembly that enables you to add formula support to yourapplication. It takes care of parsing and evaluating formulas, tracking their dependencies, and recalculating in natural order. The formula syntax and much of the engine‘s functionality are direct replicas of Excel ensuring a low learning curve for users. The library is licensed under the LGPL and the project is hostedhere on SourceForge. Features
Here‘s a list of the engine‘s major features: Parses and evaluates Excel-style formulas Comes with over a hundred common Excel functions already implemented and makes it easy to add your own Custom formula functions are automatically validated for argument type and count Supports custom functions with optional arguments and variable-length argument lists Named references, volatile functions, anddynamic references are supported Supports formulas on multiple sheets Interacts with sheets through an interface, allowing any class to be used as a sheet Tracks and adjusts references during row/column inserts/deletes and range moves Manages formula dependencies and recalculates in natural order Supports working without any sheets Culture-sensitive decimal point and argument separator
Motivation
I wrote this library for the following reasons: I needed a hobby project to work on at home and this seemed like an idea with the right balance of challenge and usefulness At work, I was working with a formula engine implemented by a third-party component vendor for their grid. I though that their implementation was absolutely horrible (all operands are strings) and that any programmer worth his salt should be able to do better; I decided to put my money where my mouth is.
Yet another expression evaluator?
Seeing that expression evaluators are very popular here on CodeProject, what makes this one different? The two main differences are that this library implements many features found in Excel and that it does more than just evaluate expressions. Overview
In this article, I will give a brief overview of the three major things that this library enables you to do: Formula parsing/evaluation Natural order recalculation Defining custom functions for use in formulas
Formula parsing and evaluation
The first thing this library allows you to do is evaluate formula expressions. The supported syntax is based on Excel and 95% of existing formulas should be able to be used without any modification. The engine provides the Formula class, which represents a compiled formula expression. You call the engine‘s CreateFormula method with an expression and it will return a Formula instance that you can evaluate: ‘ Create an instance of the engine Dim engine As New FormulaEngine ‘ Create a formula Dim f As Formula = engine.CreateFormula("=sqrt(3^2 + 4^2)") ‘ Evaluate the formula to get a result (5.0) Dim result As Double = f.Evaluate() The method will throw an InvalidFormulaException exception if it cannot create a formula from the expression. This is usually (but is not limited to) due to a syntax error in the expression. The inner exception of the thrown exception will have more details. The engine also has the ever-popular Evaluate method for when you quickly want to evaluate an expression. Let‘s try to evaluate the "mega" formula foundhere:
‘ Create an instance of the engine Dim engine As New FormulaEngine ‘ Assume cell A1 contains "http://j-walk.com/ss/books" ‘ Call Evaluate to get a result: "books" Dim result As String = engine.Evaluate("=RIGHT(A1,LEN(A1)-FIND(CHAR(1),SUBSTITUTE(A1,""/"",CHAR(1)" _ & ",LEN(A1)-LEN(SUBSTITUTE(A1,""/"","""")))))") Data types and calculation
The engine supports the followingdata types when evaluating expressions: Integer, Double, String, Boolean, DateTime, Null, Error values, and References. Just like Excel, operands are loosely typed meaning that any data type is valid as long as it can be converted to the desired data type. For example: the expression ="123" + 10 is valid since the string "123" can be converted to a number. One major difference from Excel is that DateTime values are not treated as numbers. If you want to add/subtract dates, you will have to use a function. When an error is encountered during formula evaluation, an ErrorValueWrapper instance will be returned. This class wraps one of the seven Excel error values and allows you to get the specific error as well as format it.
Formulas and the result type property
The formula class has a property on it called ResultType that allows you to specify the desired type of the formula‘s result. This is useful when you have an expression like =A1 which could validly evaluate to either the contents of cell A1 or a reference to it. By setting the result type you can control which of the two results you get. The formula will attempt to convert its result to the specified type. If the conversion is not possible, then the #VALUE! error will be returned. Dim f As Formula = engine.CreateFormula("=A1") ‘ Make the formula evaluate to any value except a reference f.ResultType = OperandType.Primitive ‘ result will be the contents of cell A1 Dim result As Object = f.Evaluate() ‘ Make the formula evaluate to a sheet reference f.ResultType = OperandType.SheetReference ‘ result will be a reference to cell A1 result = f.Evaluate() Natural order recalculation
The second thing the library allows you to do is natural order recalculation. For those unfamiliar with the term, recalculating in natural order means that a formula is recalculated after any formulas that it depends on. Consider the a worksheet with the following values and formulas: (A1): 15
(B2): =A1 + 10
(C1): =A1 + B2
(D2): =C1 * 2
When the contents of cell A1 change, the three formulas need to be recalculated. The formula at B2 must be recalculated first since it only depends on A1. The formula at C1 is recalculated second since it depends on the value of B2. Finally, the formula at D2 is recalculated last since it depends on C1. For the engine to be able to recalculate in natural order, it must keep track of the dependencies between formulas. It does this by acting as a container for formulas. As formulas are added to the engine, their dependencies are analyzed and a dependency graph is built. You then tell the engine to recalculate and it will use the graph to build up a calculation list, sort it in natural order, and recalculate each formula.
References
The basic unit that the engine uses to track dependencies is the reference. There are various types of references, they all implement the IReference interface, and the ReferenceFactory class creates them all. When you add a formula to the engine, you need to specify a reference that the formula will be bound to. The formula will then "live" at that reference. By changing the type of reference you bind the formula to, you can change how that formula is referenced from other formulas. For example: by binding a formula to a named reference, you allow other formulas to reference it by using a name. ‘ Associate the name Root2 with a formula engine.AddFormula("=sqrt(2)", engine.ReferenceFactory.Named("Root2")) ‘ Use the name in an expression (result is 2.0) Dim result As Double = engine.Evaluate("=root2 ^ 2") Now that we‘ve seen how the engine tracks dependencies, let‘s see how the initial example above would be set up using code: ‘ Assume we‘ve already added a worksheet to the engine ‘ Add a formula at B2 engine.AddFormula("=A1 + 10", engine.ReferenceFactory.Cell(2, 2)) ‘ Add a formula at C1 engine.AddFormula("=A1 + B2", engine.ReferenceFactory.Parse("C1")) ‘ Add a formula at D2 engine.AddFormula("=C1 * 2", engine.ReferenceFactory.Parse("D2")) Our engine now contains 3 formulas and a graph describing their dependencies. All we need to do is tell the engine that a reference has changed and that all its dependents need to be recalculated. We do this using the Recalculate method: ‘ Create a reference to cell A1 Dim a1Ref As ISheetReference = engine.ReferenceFactory.Parse("A1") ‘ Recalculate all dependents of A1 engine.Recalculate(a1Ref) Custom functions
The last thing the engine allows you to do is define your own functions for use in formulas. To do this we must use the FunctionLibrary class, which is accessible through a property on the engine. The extensibility mechanism I used is based on delegates. I felt that this makes it easier to add many functions because you don‘t have to define a new class for each function as with the alternative interface/subclass based mechanism. It also allows the engine to use reflection to add all the methods of a class in bulk. Defining a custom function requires three steps: Define a method with the same signature as the FormulaFunctionCall delegate Tag the method with either the FixedArgumentFormulaFunction or VariableArgumentFormulaFunction attribute Add it to the function library
Let‘s define a function that returns the length of the hypotenuse given the length of the other two sides: First we must define a method with the correct signature: Public Sub Hypotenuse(ByVal args() As Argument, ByVal result As FunctionResult, ByVal engine As FormulaEngine) End Sub Explanation of the three arguments: The arguments to our function as an array of Argument instances An instance of the FunctionResult class where we will store our function‘s return value An instance of the formula engine
Second, we have to adorn our method with the proper attribute so that the function library can recognize it: _ Public Sub Hypotenuse(ByVal args() As Argument, ByVal result As FunctionResult, ByVal engine As FormulaEngine) End Sub We have now declared our method as requiring 2 arguments, both of type Double. The engine will only call our method if exactly 2 arguments were specified in the formula and both arguments can be converted to Double. This frees us from having to write argument validation code for every function we want to implement. Finally, we must write the actual implementation of our function:
_ Public Sub Hypotenuse(ByVal args() As Argument, ByVal result As FunctionResult, ByVal engine As FormulaEngine) ‘ Get the value of the first argument as a double Dim a As Double = args(0).ValueAsDouble ‘ Get the value of the second argument as a double Dim b As Double = args(1).ValueAsDouble ‘ Compute the hypotenuse Dim hyp As Double = System.Math.Sqrt(a ^ 2 + b ^ 2) ‘ Set the function‘s result result.SetValue(hyp) End Sub We get the value of each of our arguments as a double, compute the hypotenuse, and set the value into the FunctionResult. Next we add our custom function to the library:
engine.FunctionLibrary.AddFunction(AddressOf Hypotenuse) And now we can use it in a function: dim result as Double = engine.Evaluate("=10 + Hypotenuse(3, 4)") Please note that all functions must return a value and you cannot define/undefine functions while formulas are defined. Demo application
The demo application is a poor man‘s version of Excel. It is meant to be a reference implementation showing how to use all of the engine‘s features. It shows off the following: Multiple sheets and cross-sheet references Named references Reference tracking through row/column insert/delete and range move Cut/Copy/Paste and fill right/down support Absolute/relative references Save and load of the formula engine and worksheets Formulas that aren‘t on any sheet and that "watch" sheet values (go to Insert -> Chart)
Implementation details
Formula Parsing
To implement parsing of formulas, I used the excellentGrammatica parser generator. I wrote a grammar describing the syntax of a formula and let Grammatica generate a parser. I then let the parser parse, listen to callbacks, and fill out the parse tree with my own objects. At the end, I have a root element representing the entire parse tree of the formula. I re-arrange the tree into postfix form and save that into a formula instance. Evaluating a formula then simply consists of iterating through each element and having it push/pop values off a stack. In the end, there should be one value left on the stack, which is the formula‘s result. I chose Grammatica because it has a clean separation between grammar and parser code, has easy to use grammar syntax, and it outputs VB .NET code. The project hasn‘t had any activity in a while but it is not dead and even though the version I‘m using is an alpha, I found it to be very stable: no crashes and no incorrect functionality. I highly recommend it if you, like me, are new to parsers and grammars.
Also, since the grammatica parser is created at runtime, it is very easy to dynamically change the decimal and argument separator patterns to use the values of the current culture. This means that instead of =1.2 + sum(1,2,3), a user in Estonia can enter =1,2 + sum(1;2;3) and have it be a valid formula.
Natural order recalculation
The engine maintains a dependency graph for all formulas. When a recalculate is required, a temporary graph is built starting at the root node of the recalculate. Once all circular references are removed, atopological sort is performed on the graph to get a list of formulas in natural order. With this calculation list, it is simply a matter of iterating through it and re-evaluating each formula. Not Implemented
The following things are not implemented because they are obscure or advanced features that most people won‘t know about or find useful: Array formulas Range, union and intersection operators 3D references
I will certainly look into implementing them if there is enough demand. Conclusion
I found that coding this project gave me lots of insight into how Excel works. Having to implement 100+ Excel functions makes you very familiar with all their little quirks. For example: the formula =Offset(A1,1,1) entered into cell A1 is not treated as a circular reference by Excel even though it depends on its own cell. Another example is that the concatenate formula does not work with non-cell ranges. Whereas you can say =Sum(A1:B2), you cannot say =Concatenate(A1:B2). This project is currently in the alpha phase because it hasn‘t had any real-world testing/usage. As mentioned in the introduction, the project is hosted on SourceForge. Any bugs or feature requests should be reported there using the provided tools. Any new releases will be also posted there.
Well, I hope you guys find this project useful!
History
Mar 4, 2007 Initial Release
Mar 6, 2007 Switched from Dundas chart to ZedGraph due to its smaller footprint Source release now includes chart assembly
Eugene Ciloci
Clickhere to view Eugene Ciloci‘s online profile.
Other popular .NET articles:
The 30 Minute Regex Tutorial Learn how to use regular expressions in 30 minutes with Expresso.
Strings UNDOCUMENTED Detailed looked at the implementation of strings in .NET
Understanding Classic COM Interoperability With .NET Applications Discusses how existing COM components can be used from managed code.
Expresso - A Tool for Building and Testing Regular Expressions For learning, building, and debugging .NET Framework regular expressions

[Top]Sign in to vote for this article:     PoorExcellent


Note: You mustSign in to post to this message board.
FAQ  Message score threshold 1.0 2.0 3.0 4.0 5.0    Search comments
View Normal (slow) Preview (slow) Message View Topic View Thread View Expanded (Supporters only)    Per page 10 25 50 (must logon)
  Msgs 1 to 10 of 10 (Total: 10) (Refresh) First Prev Next
Subject  Author  Date

 Localized Excel  gajatko  14hrs 32mins ago
  Good work, Eugene!
I need a parser which could parse Excel expressions from localized version of Office.
When localizing, Microsoft uses to translate names of functions (IF -> JE?ELI) and operators (OR -> LUB) to the destinate language.
How about it?
/gajatko/
[Sign in |View Thread |Get Link]

 Re: Localized Excel  Eugene Ciloci  8hrs 14mins ago
  Localizing function names should be doable. I just have to think about the following details:
-I‘ll need people to provide me with localized names for various languages
-Once I localize the function names, I‘ll have to localize all other messages in the library as well or things won‘t be consistent.
>I need a parser which could parse Excel expressions from localized version of Office.
If you are going to be parsing arbitrary Excel expressions there are some things you should be aware of:
-I haven‘t implemented all of Excel‘s functions, just the common ones. So if the expression uses a function that‘s not implemented, you will not be able to create a formula from it until the function is implemented.
-I haven‘t implemented advanced Excel constructs like range intersection, union and 3D references. If the formula uses those, then you‘ll get a syntax error on formula creation.
Perhaps if you could give more details about your requirements, I‘ll be able to get a better idea of what‘s involved.
Thanks.
ps: I wasn‘t aware that Excel formulas supported operators like "OR". There is a function called OR but formulas (at least in Office 2000) only support logical operators like ">" and "<>".
[Sign in |View Thread |Get Link]


 Time  navalArya  3:07 8 Mar ‘07
  Great work!
Just wanted to know how much time did it take for making this project.
[Sign in |View Thread |Get Link]

 Re: Time  Eugene Ciloci  19:03 8 Mar ‘07
  I wrote the first code in September and I wrapped up in January.....so about 4 months‘ worth of after-work and weekend effort....
[Sign in |View Thread |Get Link]

 Re: Time  navalArya  19:43 9 Mar ‘07
  woow hats of to u..!
[Sign in |View Thread |Get Link]


 Doesn‘t quite work !  victorbos  15:54 6 Mar ‘07
  A very interesting subject/project. The demo works. But the "Src" download does not. Get approximately 70 error and warning messages -- mostly related to Dundas Charting and NUnit -- when I try to build in VS2005. As I am familiar with neither and not a VB guy, I don‘t know where to begin/end. Any chance you can upload a version of the source project, but without Charting and NUnit, because I am mainly interested in learning what you‘ve done in the "Excel-style library".
[Sign in |View Thread |Get Link]

 Re: Doesn‘t quite work !  victorbos  18:59 6 Mar ‘07
  Figured out how to get around the error messages. (Added a reference to the Dundas Charting DLL and commented out all the NUnit "
[Sign in |View Thread |Get Link]

 Re: Doesn‘t quite work !  Eugene Ciloci  21:59 6 Mar ‘07
  I was starting to regret using the Dundas chart library because of its large size. I updated the demo release to use the much lighter ZedGraph chart component. I also updated the source release to include the ZedGraph assembly.
As for the NUnit dependency, you will have to download and install NUnit. Note that this is only required if you wish to run through my test suite. If you don‘t care about the tests and don‘t have NUnit installed, then just remove the Tests project from the solution and you‘ll be all set.
Source code hint: The DependencyManager class is responsible for sorting the calculation list in natural order. Its inner class, the DependencyMap has an overriden ToString() that you can call to get a string description of the graph.
Have fun,
Eugene.
[Sign in |View Thread |Get Link]

 Re: Doesn‘t quite work !  victorbos  8:30 7 Mar ‘07
  Thanks, Eugene. Although I have yet to plumb the depths of your library, I am very impressed by the enormity of the challenge you‘ve undertaken. For that alone, I have to rate this a 5!
[Sign in |View Thread |Get Link]


 Intersting  Mahmoudzizo  16:34 4 Mar ‘07
  Thank your for this library.
Mahmoud Abdelaziz
master_mahmoud@hotmal.com
http://zprogramming.blogspot.com/
[Sign in |View Thread |Get Link]

Last Visit: 8:08 Wednesday 14th March, 2007 First Prev Next
General comment   News / Info   Question   Answer   Joke / Game   Admin message
Updated: 4 Mar 2007 Article content copyright Eugene Ciloci, 2007