Thursday, December 20, 2012

Easy Binary Compatible Interfaces Across Compilers in C++ - Part 0 of n: Introduction and a Sneak Preview


Using a C++ library compiled with Compiler A from a program compiled with Compiler B has been a problem for a while. This is especially true on Windows, where Visual C++ generally breaks binary compatibility from release to release. Shipping a library for Windows involves shipping several versions for Visual C++ and, now often, one for MinGW gcc as well.
Some of the problems C++ has with regard to binary compatibility across different compilers are name mangling, object layout, and exception support.
There are several ways to get around this.
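As a minimal sketch of one such workaround, here is a flat extern "C" API with an opaque handle. This is a general technique, not part of the library discussed in this series, and every name in it (Greeter, greeter_create, greeter_greet, greeter_destroy) is hypothetical. It sidesteps name mangling and object layout by letting only C types cross the boundary, and it keeps exceptions from escaping.

```cpp
#include <cstring>
#include <string>

// Hypothetical library-side class; it never crosses the boundary directly.
class Greeter {
public:
    explicit Greeter(const char* name) : name_(name) {}
    std::string greet() const { return "Hello " + name_; }
private:
    std::string name_;
};

// extern "C" turns off name mangling for these functions. Only C types
// (an opaque pointer, char buffers, ints) appear in the signatures.
extern "C" {
    struct greeter_handle;  // opaque to callers, never defined

    greeter_handle* greeter_create(const char* name) {
        if (name == nullptr) return nullptr;
        try {
            return reinterpret_cast<greeter_handle*>(new Greeter(name));
        } catch (...) {
            return nullptr;  // exceptions must not escape the boundary
        }
    }

    // Copies the greeting into buf; returns 0 on success, -1 on failure.
    int greeter_greet(const greeter_handle* h, char* buf, int buf_size) {
        if (h == nullptr || buf == nullptr || buf_size <= 0) return -1;
        try {
            const std::string s = reinterpret_cast<const Greeter*>(h)->greet();
            if (s.size() >= static_cast<std::size_t>(buf_size)) return -1;
            std::memcpy(buf, s.c_str(), s.size() + 1);  // include the '\0'
            return 0;
        } catch (...) {
            return -1;
        }
    }

    void greeter_destroy(greeter_handle* h) {
        delete reinterpret_cast<Greeter*>(h);
    }
}
```

A caller would create a handle with greeter_create("John"), pass a char buffer to greeter_greet, and release the handle with greeter_destroy. Note how the error codes and the caller-supplied buffer replace exceptions and a std::string return value.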
One well-known approach is COM. There are whole books written on COM, so I won’t try to go into too many details. A brief overview with regard to the binary interface is here.
The basic idea is that you define an interface like this:
Interface Definition
    struct Interface;
    struct InterfaceVtable{
        int (*Function1)(struct Interface*);
        int (*Function2)(struct Interface*, int);
    };

    struct Interface{
        struct InterfaceVtable* pTable;
    };
It can be used like this:
Using an Interface
    struct Interface* pInterface = GetInterfaceSomehow();
    int a = pInterface->pTable->Function1(pInterface);
Implementing an interface like this is painful and will be left as an exercise for the reader :).
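As a hint for that exercise, here is a minimal sketch of one way to implement the interface by hand. The names Implementation, Impl_Function1, Impl_Function2, g_implementationVtable, and InitImplementation are hypothetical, and the sketch assumes the Interface is the first member of the implementing struct so that the pointer conversion is valid.

```cpp
// The interface definition from above.
struct Interface;
struct InterfaceVtable {
    int (*Function1)(struct Interface*);
    int (*Function2)(struct Interface*, int);
};
struct Interface {
    struct InterfaceVtable* pTable;
};

// Hypothetical implementation. The Interface is the first member of a
// standard-layout struct, so a pointer to it can be converted back to a
// pointer to the whole object.
struct Implementation {
    Interface base;
    int value;
};

static int Impl_Function1(Interface* self) {
    return reinterpret_cast<Implementation*>(self)->value;
}

static int Impl_Function2(Interface* self, int i) {
    return reinterpret_cast<Implementation*>(self)->value + i;
}

// One table shared by every Implementation object: the hand-written vtable.
static InterfaceVtable g_implementationVtable = { &Impl_Function1, &Impl_Function2 };

// Sets the hand-written vptr and the object's state.
void InitImplementation(Implementation* impl, int value) {
    impl->base.pTable = &g_implementationVtable;
    impl->value = value;
}
```

After InitImplementation(&impl, 5), a caller that only holds &impl.base can call pTable->Function2(pInterface, 5) and get 10, without knowing anything about Implementation.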
Fortunately (and by design), Microsoft Visual C++ and most Windows C++ compilers will generate something compatible with the above from an abstract base class using pure virtual functions.
Interface using C++ (MSVC)
    struct InterfaceCpp{
        virtual int Function1() = 0;
        virtual int Function2(int) = 0;
    };
You can implement and use it like this:
Code Snippet
    #include <iostream>

    // Assumes InterfaceCpp from the previous snippet is visible here.
    struct InterfaceImplementation:public InterfaceCpp{
        virtual int Function1(){return 5;}
        virtual int Function2(int i){return 5 + i;}
    };

    int main(){
        InterfaceImplementation imp;
        InterfaceCpp* pInterfaceCpp = &imp;
        std::cout << pInterfaceCpp->Function2(5) << std::endl; // prints 10
    }
The reason this works is that the version with function pointers builds a vtable and a vptr by hand, while this version lets the compiler do it. For more information about vtables and vptrs, see the excellent article by Dan Saks in Dr. Dobb's.
While the above solution generally works on Windows, it is not guaranteed to always work. A more general cross-platform solution is presented in Matthew Wilson’s Imperfect C++ in chapters 7 and 8. He basically provides a technique and macros that allow you to define the above structure manually (i.e., define your own vtables).
By using either COM style interfaces with compilers that have a compatible vtable layout or rolling your own, you can have cross-compiler binary compatible interfaces. However, you do not have
  • Exceptions
  • Real return values (without exceptions, you often have to return error codes instead)
  • Standard C++ types such as vector and string (you use arrays and const char* instead)
In fact, in an article explaining why Microsoft created C++/CX, Jim Springfield stated that one of the problems with COM, even with libraries such as ATL, was
“There is no way to automatically map interfaces from low-level to a higher level (modern) form that throws exceptions and has real return values.”
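To make the limitation concrete, here is a small sketch of the low-level form and of the hand-written wrapper needed to get back to the higher-level form. The function names (CountCharactersRaw, CountCharacters) and the error codes are hypothetical and are not taken from COM or ATL.

```cpp
#include <stdexcept>
#include <string>

// Hypothetical COM-style function: the return value is an error code and the
// real result comes back through an out-parameter.
typedef int error_code;
const error_code kOk = 0;
const error_code kInvalidArg = 1;

error_code CountCharactersRaw(const char* s, int* result) {
    if (s == nullptr || result == nullptr) return kInvalidArg;
    int n = 0;
    while (s[n] != '\0') ++n;
    *result = n;
    return kOk;
}

// Hand-written higher-level wrapper: a real return value, and errors reported
// as exceptions. A wrapper like this has to be written for every function.
int CountCharacters(const char* s) {
    int result = 0;
    const error_code ec = CountCharactersRaw(s, &result);
    if (ec != kOk) {
        throw std::runtime_error("CountCharactersRaw failed with code " +
                                 std::to_string(ec));
    }
    return result;
}
```

The wrapper is trivial for one function, but writing and maintaining one per interface function by hand is exactly the kind of mapping the quote says cannot be automated.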
During this series of posts, I will discuss the development of a C++11 library that has the following benefits:
  • Able to use std::string and std::vector as function parameters and return values
  • Use exceptions for error handling
  • Compatible across compilers: able to use MSVC to create the .exe and g++ to create the .dll on Windows, and g++ to create the executable and clang++ to create the .so on Linux
  • Works on Linux and Windows
  • Written in Standard C++11
  • No Macro magic
  • Header only library
As we progress, we will talk about some of the disadvantages, areas for improvement, and possible alternatives.
Here is how we would define an interface DemoInterface. Note jrb_interface is the namespace of the library.
Code Snippet
    using namespace jrb_interface;

    template<bool b>
    struct DemoInterface
    :public define_interface<b,4>
    {
        cross_function<DemoInterface,0,int(int)> plus_5;

        cross_function<DemoInterface,1,int(std::string)> count_characters;

        cross_function<DemoInterface,2,std::string(std::string)> say_hello;

        cross_function<DemoInterface,3,std::vector<std::string>(std::string)>
            split_into_words;

        template<class T>
        DemoInterface(T t):DemoInterface<b>::base_t(t),
            plus_5(t), count_characters(t),say_hello(t),split_into_words(t){}
    };
In this library, all interfaces are actually templates that take a bool parameter. The reason for this will become clear as we discuss the implementation in later posts.
All interfaces inherit from define_interface, which takes a bool parameter (just use the bool passed in to the template) and an int parameter specifying how many functions are in the interface. If you pass in a number that is too small, you will get a static_assert telling you so.
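For readers who have not used static_assert this way, here is a minimal sketch of how a compile-time check on the function count can be written in general. It is not the library's implementation, which is left for later posts; sketch_interface_base, slot, and TwoFunctionInterface are hypothetical names used only for illustration.

```cpp
// Hypothetical stand-in for a base class that knows how many functions the
// interface declares.
template<int NumFunctions>
struct sketch_interface_base {
    enum { num_functions = NumFunctions };

    // Hypothetical stand-in for a function slot. It refuses to compile when
    // its position does not fit in the declared number of functions.
    template<int Id>
    struct slot {
        static_assert(Id >= 0 && Id < NumFunctions,
                      "function position is out of range: the number of "
                      "functions passed to the base class is too small");
        enum { id = Id };
    };
};

struct TwoFunctionInterface : sketch_interface_base<2> {
    slot<0> first;
    slot<1> second;
    // slot<2> third;  // uncommenting this line trips the static_assert
};
```

Because the check runs at compile time, a wrong count is reported when the interface is compiled rather than as a crash when a function is called.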
To define a function in the interface, use the cross_function template.
The first parameter is the interface, in this case DemoInterface. The second parameter is the 0-based position of the function: the first function is 0, the second is 1, the third is 2, etc. The third and final parameter of cross_function is the signature of the function, written in the same style as for std::function.
Finally, all interfaces need a templated constructor that takes a value t and passes it on to the base class as well as to each function. For convenience, the define_interface template defines a typedef base_t that you can use in your constructor initializer.
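If the std::function style of signature such as int(std::string) is unfamiliar, the sketch below shows the general C++11 technique for taking such a signature apart with a partial specialization. The name signature_traits is hypothetical, and this is only an illustration of the technique, not a claim about how cross_function is written.

```cpp
#include <string>
#include <tuple>
#include <type_traits>
#include <vector>

// The primary template is left undefined, so only function signatures work.
template<class Signature>
struct signature_traits;

// Partial specialization that pattern-matches a signature of the form
// R(Args...) and exposes its parts.
template<class R, class... Args>
struct signature_traits<R(Args...)> {
    typedef R return_type;
    typedef std::tuple<Args...> argument_types;
    enum { arity = sizeof...(Args) };
};

// The signature used for split_into_words above.
typedef signature_traits<std::vector<std::string>(std::string)> split_traits;

static_assert(std::is_same<split_traits::return_type,
                           std::vector<std::string>>::value,
              "the return type is recovered from the signature");
static_assert(split_traits::arity == 1,
              "the signature takes exactly one argument");
```

Once the return type and argument types are available separately, a template can decide per type how each one should be handled.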
To implement an interface, you would do this:
Code Snippet
    struct DemoInterfaceImplemention:
        public implement_interface<DemoInterface>{

        DemoInterfaceImplemention(){

            plus_5 = [](int i){
                return i+5;
            };

            say_hello = [](std::string name)->std::string{
                return "Hello " + name;
            };

            count_characters = [](std::string s)->int{
                return static_cast<int>(s.length());
            };

            split_into_words =
                [](std::string s)->std::vector<std::string>{
                    auto not_space = [](char c){return c != ' ';};
                    std::vector<std::string> ret;
                    // Start at the first non-space character so that leading
                    // spaces cannot stall the loop below.
                    auto wbegin = std::find_if(s.begin(),s.end(),not_space);
                    auto wend = wbegin;
                    for(;wbegin!= s.end();wend = std::find(wend,s.end(),' ')){
                        if(wbegin==wend)continue;
                        ret.push_back(std::string(wbegin,wend));
                        wbegin = std::find_if(wend,s.end(),not_space);
                        wend = wbegin;
                    }
                    return ret;
            };

        }
    };
To implement an interface, you derive from implement_interface specifying your Interface as the template parameter. Then in your constructor you assign a lambda with the same signature you specified in the definition of the interface to each of the cross_function variables.
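One general way an assigned lambda can end up callable through a compiler-neutral boundary is a "trampoline": a plain function pointer plus an opaque context pointer, both of which have a C-compatible layout. The sketch below illustrates that technique for a single int(int) function. It is not taken from the library's source, and raw_slot, sketch_function, and call_through_slot are hypothetical names.

```cpp
#include <functional>

// What crosses the boundary: a plain function pointer and an opaque pointer.
struct raw_slot {
    int (*call)(void* context, int arg);
    void* context;
};

// Hypothetical stand-in for a function member that accepts a lambda.
class sketch_function {
public:
    sketch_function() : impl_(), slot_() {
        slot_.call = &sketch_function::trampoline;
        slot_.context = this;  // the slot points back at this object
    }
    // Copying would leave slot_.context pointing at the wrong object.
    sketch_function(const sketch_function&) = delete;
    sketch_function& operator=(const sketch_function&) = delete;

    // Stores any callable with a compatible signature.
    template<class F>
    sketch_function& operator=(F f) {
        impl_ = f;
        return *this;
    }

    raw_slot* slot() { return &slot_; }

private:
    // Static function with a fixed C-style signature. It recovers the object
    // from the context pointer and forwards to the stored callable. A real
    // boundary would also need to keep exceptions from escaping here.
    static int trampoline(void* context, int arg) {
        return static_cast<sketch_function*>(context)->impl_(arg);
    }

    std::function<int(int)> impl_;
    raw_slot slot_;
};

// The calling side only ever sees the raw_slot.
int call_through_slot(raw_slot* s, int arg) {
    return s->call(s->context, arg);
}
```

With this in place, assigning [](int i){ return i + 5; } to a sketch_function and calling call_through_slot(f.slot(), 5) yields 10, even though the caller never sees a lambda or a std::function.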
To use an interface, you construct use_interface providing the Interface as the template parameter.
Code Snippet
    // Assume iDemo is defined as follows
    // use_interface<DemoInterface> iDemo = ...
    int i = iDemo.plus_5(5);

    int count = iDemo.count_characters("Hello World");

    std::string s =  iDemo.say_hello("John");

    std::vector<std::string> words = iDemo.split_into_words("This is a test");
You then call the functions just as you would with any class object. Note the use of . instead of ->.
Thank you for taking the time to read this post. I hope this has piqued your interest. In future posts we will explore how we create this library, and how we can extend this library to do more. I hope you will join me.
You can find compilable code at
https://github.com/jbandela/cross_compiler_call
The code has been tested on:
  • Windows, compiling the executable with MSVC 2012 Milan (Nov CTP) and the DLL with mingw g++ 4.7.2
  • Ubuntu 12.10, compiling the executable with g++ 4.7.2 and the .so file with clang++ 3.1
Instructions on how to compile are included in the README.txt file.
Please let me know what you think in the comments section.
- John Bandela

13 comments:

  1. Quite interesting, although very unfriendly syntax. Do you think it would be possible to automatically generate wrappers for existing classes? (based on maybe ctags output?)

  2. It would be possible to generate wrappers (using libclang probably). Which part of the syntax did you find particularly unfriendly?

  3. Very interesting project, and I'm eagerly waiting for the next post(s) :)
    What are the limitations with this technique? I'm thinking of a larger project with multiple inter-dependent shared libraries... If I could migrate selected shared dll's to another compiler, such as Visual Studio 2012 or mingw (currently using Visual Studio 2010), it would be fantastic! However, I'm uncertain if or how that would work...?

  4. Thanks for your kind words. Since this post, I have added a lot more features to the library. They are available at the github link above. In terms of limitations, the biggest is that the compiler has to have C++11 support, including support for variadic templates. Visual Studio 2012 has it with the November (codename Milan) CTP. Visual Studio 2010 does not. If you have Intel C++ 13, I believe you could use that with Visual Studio 2010 since it supports variadic templates. Also, there is some overhead involved since we are converting types back and forth at the boundaries.

    In terms of next post, I am going to give a talk at C++Now (formerly BoostCon) in May of this year about this. I have been busy improving the code and working on my presentation. If you are interested in where this code is, take a look at the demo code.

  5. Thanks for the quick answer! I was thinking of limitations with regards to compatibility; what types are one required to convert at the boundaries? std namespace? boost or external libraries?

    How can I (if possible) share custom data structures? What if I keep my (custom) class hierarchy in library A compiled with compiler A, and use library A in both library B (compiled with compiler B) and library C (compiled with compiler C)?

    Replies
    1. The library does not depend on any external library except standard C++ with 2 exceptions: the boost unit test framework for the unit test code, and Windows and linux system calls to load dynamic libraries and look up functions.

      In terms of conversions, the library supports char, std::int8_t/uint8_t through std::int64_t, and float and double, as well as pointers and references to the above.

      Also supports std::string,std::vector of anything supported, and std::pair of anything supported.

      In addition, you can define your own conversions. The rule is that what gets passed can't have anything a C struct couldn't. Take a look at cross_conversion and cross_conversion_return templates in cross_compiler_conversions.hpp (look in cross_compiler_interface/implementation)

      If you want I could try to see if I could help you make your custom data types be able to be used with this library. It would be good feedback to see someone else use this library.

    2. Am I correct to assume that vector and string is pretty easy to support, because the standard requires a certain layout? I'm guessing it might not be that computationally expensive either, because only two pointers are passed.

      Something like std::set or std::map is perhaps trickier?

      Thanks for helping me out :)

      I have a simplified example of my data model here : https://gist.github.com/meastp/5116333

      The example is simple, but if it is possible to adopt that model without too much performance penalty and work, I would be very happy to not depend on old compilers and have a modern, backwards-compatible solution. :)

    3. For your example take a look at
      https://github.com/jbandela/cross_compiler_examples

      look at example_1 - I used your data model (and added a few things)

      example_1_exe.cpp is the exe that uses the interface
      example_1_dll.cpp is the dll that implements the interface
      example_1_interface.h defines the interfaces that the exe and dll use.

      There is an MSVC solution as well if you want to play around.

      Make sure you git the latest from cross_compiler_call
      Make sure you use MSVC November CTP that has variadic template support

      In regards to your first question,
      with string we pass 2 pointers for parameters, for returning it gets trickier. With vector, we end up passing some function pointers and a pointer to void which we use to reconstruct vector on the other side.

      If you have any further questions please let me know

    4. This is great! :)

      In your example you redefine the data model.

      Is it possible to use a non-intrusive adaption as well, i.e. adding support without modifying the data model classes (if I have common header/source files with the data model that I can not modify, but need to be compatible with)?

      Kind of like boost::iterator_facade vs iterator_adaptor...

  6. Sorry so late to get back, take a look at my leveldb repository on github. I adapted leveldb to cross_compiler_interface. You build leveldb using visual c++. Then build the dll with visual c++. Then you can use visual c++ or g++ to build the exe. This is a rough first pass, so it probably still has some bugs

  7. Hi,
    No worries! I've been following your github activity in the cross_compiler repositories. :)

    I'm guessing there are *a lot* of developers held back because they have to be compatible with old compilers. So this is very useful.

    Do you think it is possible to gain VC10 compatibility by mimicking variadic templates with macros (a lot of work, obviously, but so is staying on an outdated compiler, not being able to use the new features ;) )? Once your library is complete, if we can use it without too much performance penalty, I think I would like to attempt that to be able to move to a more modern compiler, while still being VC10 compatible through the cross_compiler interface :)

    One question about the implementation:
    Does the interface support creating objects in both the exe and the dll, and passing it back and forth across boundaries? In your example with the leveldb:

    1. create a (native) leveldb object in the cpp/exe.
    2. manipulate that leveldb.
    3. pass it to the library/dll through the interface
    4. manipulate the same leveldb object, that was received through the interface

    Obviously, I would have to compile the leveldb twice (once for the dll, and once for the exe), but is this possible?

    The reason I ask is because that feature would make it possible to use a subset of a large codebase with the cross_compiler_interface.

    Perhaps this wasn't a very good explanation? I can try to make an illustration and/or code if you want.. :)

    Thanks,
    Mats

    Replies
    1. An initial attempt at vc10 backport. Unit tests pass but because now all faux variadics, need more coverage for functions of different arity. Has same interface as the variadic template version.
      It is at
      https://bitbucket.org/jbandela/cross_compiler_call_vc10

      Take a look at it. Developing it was no fun at all - just tediously cranking out code (trying to debug macros in VC++ would have been a nightmare).

    2. Sorry for the late reply.

      Wow! A VC10-compatible version, great! I understand that it is a rough version, but I think it proves that it is possible to "upgrade" some components of our code base to more modern compilers..

      Could you have a look at my question as well, because if passing objects created on either side(s) of the interface is supported, I might be able to try this out in production pretty soon. :)

      Thanks for your hard work! :)
