W.B Storytime

Windows symbol visibility

I've been trying to write a library in F# that consumes a native C library, and ran into a Windows-specific issue: symbol visibility. As it turns out, the default compiler, MSVC, exports nothing from a DLL unless explicitly told to. Making it more confusing is CMake's WINDOWS_EXPORT_ALL_SYMBOLS property, which, in my case, didn't do much.

Normally, one would expect to write something like:

    int sum(int a, int b){
        return a + b;
    }

After banging my head for a while, tweaking my simple build files and thinking there was something wrong with my F# code, I turned my attention to the Windows peculiarities. For this simple function to be visible to consumers, it has to be decorated like so:

    __declspec(dllexport) int sum(int a, int b){
        return a + b;
    }

This exports the symbol from the shared library, and it is ready for consumption. One caveat: __declspec is a Microsoft extension, so GCC and Clang on Linux/macOS won't accept it as written by default. That is the real reason to wrap it in a prettier macro that expands to __declspec(dllexport) only on Windows.
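
The "prettier macro" can be sketched as below. MYLIB_API and MYLIB_BUILDING are made-up names for illustration; the pattern itself (dllexport while building the DLL, dllimport while consuming it, and the GCC/Clang visibility attribute elsewhere) is the common convention. In a real project MYLIB_BUILDING would be defined only for the library target by the build system, e.g. with CMake's target_compile_definitions(mylib PRIVATE MYLIB_BUILDING), where mylib is a hypothetical target name; this single-file sketch defines it at the top so it compiles on its own.

```c
/* Sketch of a hypothetical mylib.h + mylib.c squeezed into one listing. */

/* Stand-in for the build system: only the library's own build defines this.
   Code that merely consumes the DLL must leave it undefined. */
#define MYLIB_BUILDING 1

#if defined(_WIN32)
#  if defined(MYLIB_BUILDING)
#    define MYLIB_API __declspec(dllexport) /* building the DLL: export */
#  else
#    define MYLIB_API __declspec(dllimport) /* using the DLL: import */
#  endif
#elif defined(__GNUC__)
   /* GCC/Clang: keep the symbol visible even under -fvisibility=hidden */
#  define MYLIB_API __attribute__((visibility("default")))
#else
#  define MYLIB_API /* unknown compiler: no decoration */
#endif

/* Public declaration, as it would appear in mylib.h. */
MYLIB_API int sum(int a, int b);

/* Definition, as it would appear in mylib.c. */
MYLIB_API int sum(int a, int b) {
    return a + b;
}
```

Consumers include the same header without MYLIB_BUILDING and get the dllimport variant on Windows, while Linux/macOS builds never see __declspec at all.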

As for CMake's WINDOWS_EXPORT_ALL_SYMBOLS, it didn't do much for me. What does matter for creating a consumable DLL on Windows is explicitly declaring the library as SHARED (the property has no effect on static libraries anyway).

Why do twice the work?

Let's pretend we have to build an application for a small store where we have to track items and then provide a receipt to the customer. In this case, items have a name and a price. A receipt consists, perhaps, of an item, the quantity of said item, and a total for the entire receipt.

In regular terms, this all seems simple and straightforward, so let's model it out:

Item:
  • name
  • price

Receipt:
  • item
  • item_quantity
  • total

Okay, everything looks sort of fine... right? Seems to me like we have an excess field we don't need; have you spotted it yet? If we were writing this in a framework like Django, it might look like this:


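from django.db import models

class Item(models.Model):
    # CharField requires max_length; 100 is an arbitrary choice
    name = models.CharField(max_length=100)
    price = models.PositiveIntegerField()

class Receipt(models.Model):
    transaction_date = models.DateField()
    total = models.PositiveIntegerField()

class ReceiptItem(models.Model):
    # on_delete is a required argument since Django 2.0
    item = models.ForeignKey(Item, on_delete=models.CASCADE)
    receipt = models.ForeignKey(Receipt, on_delete=models.CASCADE)
    quantity = models.PositiveIntegerField()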

So now we have a pretty model for our data, and yet we still have that pesky duplicated data: the total column in the Receipt table. In theory this is handled by normalizing the relational tables, BUT in practice most people don't normalize, and those who do only normalize slightly.

It is easy to gloss over this mistake, because don't all receipts need a total? Sure! Of course they do! But you can get the same value by querying which items, at what price and in what quantity, belong to that particular receipt, rather than storing a hard-coded total. This way you have one thing to keep up to date instead of two. If anything changes, I don't have to do:

...
receipt_item.quantity = 3
receipt_item.save()
# update the new total
receipt.total = 1245
receipt.save()

Because a receipt's total follows directly from its items and their quantities, I can just do:
receipt_item.quantity = 3
receipt_item.save()
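
Outside of any ORM, the same "derive it, don't store it" idea can be sketched in plain C: the receipt never holds a total, it is computed from the line items whenever it's needed. The structs and receipt_total below are a hypothetical mirror of the Item and ReceiptItem models (integer prices, like PositiveIntegerField), not code from an actual project. In Django itself this would more likely be a query aggregating over the receipt's ReceiptItem rows.

```c
#include <stddef.h>

/* Hypothetical mirror of the Item model: integer price, like PositiveIntegerField. */
typedef struct {
    const char *name;
    unsigned price;
} Item;

/* Hypothetical mirror of ReceiptItem: which item, and how many of it. */
typedef struct {
    const Item *item;
    unsigned quantity;
} ReceiptItem;

/* No stored total: it is recomputed from the line items on every call,
   so a changed quantity can never leave a stale total behind. */
unsigned long receipt_total(const ReceiptItem *lines, size_t count) {
    unsigned long total = 0;
    for (size_t i = 0; i < count; i++) {
        total += (unsigned long)lines[i].item->price * lines[i].quantity;
    }
    return total;
}
```

Updating a quantity is the only write needed; the next call to receipt_total picks it up, which is exactly the one-step update shown above.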

Save yourself the hassle and pain down the road by proper normalization.

Things I am tinkering on

Here are a couple of things I've been tinkering with, or ideas I've mulled over, in the last couple of months:

  • MicroVMs / unikernels: build a proper one and the majority of security issues immediately go away, and performance gets a significant boost.
  • Bitcoin: trying to improve the build system by adding CMake to the project. This would let newcomers get up and running quickly with VSCode / CLion and give them access to newer tooling thanks to CMake. Interestingly, many of the core devs don't appear to use an IDE for C++ work.
  • F#: been trying to see how to integrate some Rust or C libs into F#, as it's a fun language to play with. .NET does have some ahead-of-time compilation options, which give faster startup and better performance, but unfortunately many things break because they rely heavily on reflection at runtime. Also, finding .NET documentation on dynamic linking has been hard, so I've just been exploring how to communicate across the boundary with strings and whatnot.

A simple intro into tail call recursion

Some languages don't have the generic "for loop" construct, which raises the question: how does one iterate over a container or repeat an action? With recursion, of course! What's recursion but doing the same thing again and again until some base condition is met?

Having said that, each function call pushes a new frame holding its local state onto the call stack. So as the number of nested calls to a recursive function grows, so does the stack, which can eventually overflow. Below are two functions that return the length of a given list.

// tail call recursion example
let length lst =
    let rec tail_call lst counter =
        match lst with
        | _::xs -> tail_call xs (counter + 1)
        | [] -> counter
    tail_call lst 0

// non-tail call: will overflow the stack on large lists
let rec simple_length lst =
    match lst with
    | _::xs -> 1 + simple_length xs
    | [] -> 0

The second function, simple_length, will overflow the stack when called like this:

// declare a million element list
let lst = [1..1000000]
let size = simple_length lst  // <-- causes a stack overflow

whereas the tail-recursive version handles the same list:

// declare a million element list
let lst = [1..1000000]
let size = length lst  // <-- will run just fine!

By carrying an accumulator, the recursive call becomes the very last thing the function does (a tail call), so the program can get rid of the intermediate stack frames: everything we still need has already been accounted for in the accumulator. No need to keep a stack around and do all the extra work!
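
To make "getting rid of the intermediate stack frames" concrete, here is a sketch of the same two functions in C, plus the loop that tail-call elimination effectively turns the accumulator version into. The Node type and the function names are my own, chosen to mirror the F# code. One caveat: the C standard does not require tail-call elimination; GCC and Clang usually perform it at -O2, so in C the explicit loop is the portable way to get constant stack usage.

```c
#include <stddef.h>

/* Minimal singly linked list standing in for an F# list. */
typedef struct Node {
    int value;
    struct Node *next;
} Node;

/* Tail-recursive, like tail_call: the recursive call is the last thing done,
   and the running count travels in the accumulator `counter`. */
size_t length_acc(const Node *lst, size_t counter) {
    if (lst == NULL)
        return counter;
    return length_acc(lst->next, counter + 1);
}

/* Not tail-recursive, like simple_length: `1 + ...` still has to run after
   the call returns, so every element keeps a stack frame alive. */
size_t simple_length(const Node *lst) {
    if (lst == NULL)
        return 0;
    return 1 + simple_length(lst->next);
}

/* What eliminating the tail call in length_acc amounts to: the same
   accumulator updated in place, with no stack growth. */
size_t length_loop(const Node *lst) {
    size_t counter = 0;
    while (lst != NULL) {
        lst = lst->next;
        counter++;
    }
    return counter;
}
```

All three return the same answer; the difference is only in how much stack they need for long lists.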

The failures of C++/C build systems

A quick rant on these build systems. I've been working on restructuring a project to use CMake, to clean up some of the GNU automake + duct tape + weird workarounds currently in place. Along the way I've had the pleasure of dealing with errors that, in theory, should have been solved a couple of decades ago.

For instance, imagine a file like this:

// a.h
#include <boost/optional.hpp> /* <--- satisfied */
#include <doesnt_exist>

Now imagine its sibling:

// b.h
#include <cstdint>
#include <list>
#include <map>
#include <mutex>
#include <memory>
#include <boost/variant.hpp> /* <--- satisfied */

Even though the famous Boost library was found, and provably so, just adding the second include to our first file can result in this head-banging error:

Scanning dependencies of target MyLib
[  3%] Building CXX object lib/CMakeFiles/MyLib.dir/a.cpp.o
In file included from /tmp/proj/lib/a.cpp:5:
/tmp/proj/src/lib/b.h:14:10: fatal error: 'boost/variant.hpp' file not found
#include <boost/variant.hpp>
         ^~~~~~~~~~~~~~~~~~~
1 error generated.
make[2]: *** [support/CMakeFiles/MyLib.dir/a.cpp.o] Error 1
make[1]: *** [support/CMakeFiles/MyLib.dir/all] Error 2
make: *** [all] Error 2

So WHY WOULD THIS HAPPEN? Maybe a cache or list of already resolved dependencies would help, something to keep us from going down this rabbit hole and eating up thousands or millions of hours for no real reason. Kind of like fidgeting with a USB plug when you try to plug it in...