Friday, May 22, 2020

C++20 ❤️ SQL

cpp20_sql

A thin, typesafe SQL wrapper in C++20

For accessing databases, SQL is the champ. It is cross-platform and cross-database. There is a ton of information online, and you can probably find an SQL snippet to do almost anything.

In addition, SQL has a nice property: you can typically develop and debug it in an interactive manner by using the database environment. You can quickly run a query and check that you are getting the expected result. The REPL (read, evaluate, print loop) makes for quick development.

Now that we have a working SQL query, how do we get it into C++? There are a whole host of existing solutions. However, they often fall short in a few areas:

Typesafe SQL columns and parameters. We would like compile time typesafety when we are reading the results and when we bind parameters to our parameterized queries.
Be able to copy and paste working SQL without too much modification and with no external code generators.
No macros

C++20 provides us with improved compile time features that can allow us to accomplish these goals.

The code can be found here: https://github.com/google/cpp-from-the-sky-down/tree/master/cpp20_sql

tagged_sqlite.h is a single self-contained header with no dependencies other than the C++ standard library and SQLite.

example.cpp is an example program.

Note: This is not an officially supported Google library.

Also, it is only at a proof of concept state, and not ready for use in production.

Currently, only GCC 10 supports enough of C++20 to be able to use the library.

The library lives in the skydown namespace, with the user defined literals in skydown::literals.

To illustrate, let us do a quick customers and orders database.

Assume we have the following tables from SQL:

CREATE TABLE customers(
      id INTEGER NOT NULL PRIMARY KEY, 
      name TEXT NOT NULL
      );

CREATE TABLE orders(
    id INTEGER NOT NULL PRIMARY KEY,
    item TEXT NOT NULL, 
    customerid INTEGER NOT NULL,
    price REAL NOT NULL, 
    discount_code TEXT 
    );

Now we want to find all the orders that are above a certain price and join them with the customer's name.

SELECT orders.id, name, item, price, 
      discount_code
      FROM orders JOIN customers ON customers.id = customerid 
      WHERE price > 100;

This will give us a list of orders.

Now in a program, we want to be able to specify our minimum price at runtime, instead of hard coding.

The temptation would be to use string substitution. However, that is the wrong answer, as it opens you up to all sorts of SQL injection attacks. Instead you want to do parameterized queries.

SELECT orders.id, name, item, price, 
      discount_code
      FROM orders JOIN customers ON customers.id = customerid 
      WHERE price > ?;

This is the syntax for a parameterized query that we use with a prepared statement. Then we just bind the value we want to our parameter and execute the statement.

Now let us see what the code looks like to do this query and get the results.

skydown::prepared_statement<
      "SELECT orders.id:integer, name:text, item:text, price:real, "
      "discount_code:text? "
      "FROM orders JOIN customers ON customers.id = customerid "
      "WHERE price > ?min_price:real;"
      >select_orders{sqldb};

Here we are constructing an object of template class skydown::prepared_statement passing our SQL query as the template parameter.

However, if we look closely at the query string, we will notice it is slightly different from the SQL string.

Instead of orders.id in the SQL string, we have orders.id:integer. Also, instead of ? for the parameters, we have ?min_price:real.

What is going on here?

Turns out, if we can just annotate the resulting columns with types, and the parameters with names and types, we can have nicely named and typed input and output for the query, and treat the rest of the query as a black box.

The library uses those annotations to construct tags with types for parameters and fields.

Prior to sending the query to the SQL engine, it strips out the annotations.

This allows us to not have to care about the internals of what the SQL statement is doing. We just care about the inputs (which are the annotated parameters) and the outputs(which are the annotated selected columns).
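As a rough illustration of the stripping step, here is a minimal sketch of a compile-time function that removes the annotations. It is not the library's actual implementation: the names strip_annotations and is_name_char are made up for this example, and it assumes that no ':' or '?' appears inside a quoted SQL string literal. It only needs C++17.

```cpp
#include <array>
#include <cstddef>
#include <string_view>

// Characters allowed in the (hypothetical) parameter and type names.
constexpr bool is_name_char(char c) {
    return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') ||
           (c >= '0' && c <= '9') || c == '_';
}

// Removes ":type" and ":type?" suffixes and reduces "?name:type" to "?".
// The result has the same capacity as the input and is padded with '\0'.
template <std::size_t N>
constexpr std::array<char, N> strip_annotations(const char (&sql)[N]) {
    std::array<char, N> out{};
    std::size_t o = 0;
    std::size_t i = 0;
    while (i + 1 < N) {  // N counts the terminating '\0'
        const char c = sql[i];
        if (c == '?') {
            out[o++] = '?';  // keep the placeholder itself
            ++i;
            while (i + 1 < N && is_name_char(sql[i])) ++i;  // drop the parameter name
        } else if (c == ':') {
            ++i;
            while (i + 1 < N && is_name_char(sql[i])) ++i;  // drop the type name
            if (i + 1 < N && sql[i] == '?') ++i;            // drop the optional marker
        } else {
            out[o++] = c;
            ++i;
        }
    }
    return out;
}

constexpr auto stripped =
    strip_annotations("SELECT price:real FROM orders WHERE price > ?min_price:real;");
static_assert(std::string_view(stripped.data()) ==
              "SELECT price FROM orders WHERE price > ?;");
```

The stripped text is plain SQL that the engine can prepare as usual, while the removed names and types stay on the C++ side.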

Here is the list of currently supported types:

:text ==> std::string_view
:integer ==> std::int64_t
:real ==> double

You can add a ? to the end of the type to make it std::optional. For example, :real? would map to std::optional<double>.
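One way to picture this mapping is as a small type-level table. The sketch below is only an illustration under my own naming (sql_type, base_cpp_type and cpp_type are hypothetical and not taken from tagged_sqlite.h); it needs C++17 for std::optional and std::string_view.

```cpp
#include <cstdint>
#include <optional>
#include <string_view>
#include <type_traits>

// Hypothetical tags for the three supported annotations.
enum class sql_type { text, integer, real };

// Annotation -> C++ type, one specialization per row of the list above.
template <sql_type T> struct base_cpp_type;
template <> struct base_cpp_type<sql_type::text>    { using type = std::string_view; };
template <> struct base_cpp_type<sql_type::integer> { using type = std::int64_t; };
template <> struct base_cpp_type<sql_type::real>    { using type = double; };

// A trailing '?' on the annotation wraps the type in std::optional.
template <sql_type T, bool Nullable = false>
using cpp_type = std::conditional_t<Nullable,
                                    std::optional<typename base_cpp_type<T>::type>,
                                    typename base_cpp_type<T>::type>;

static_assert(std::is_same_v<cpp_type<sql_type::real>, double>);
static_assert(std::is_same_v<cpp_type<sql_type::real, true>, std::optional<double>>);
```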

Let us see how we use this in our C++ code.

for (auto &row :
        select_orders.execute_rows("min_price"_param = 100)) {
    std::cout << row["orders.id"_col] << " ";
    std::cout << row["price"_col] << " ";
    std::cout << row["name"_col] << " ";
    std::cout << row["item"_col] << " ";
    std::cout << row["discount_code"_col].value_or("<NO CODE>") << "\n";
}

Here we are calling the execute_rows member function of our select_orders object, passing in the parameters. _param is a user-defined string literal in namespace skydown::literals. If a parameter is not specified, an incorrect name is used, or an incorrect type is used, you will get a compile-time error.

In our query for select_orders we had ?min_price:real so if we assign a value "min_price"_param that cannot be converted to a double we will get a compile time error.

We use a range-based for loop to iterate over the returned rows. _col is another user-defined string literal, and we index the row object using the string literal. Again, if we have the wrong name, it is a compile-time error. In addition, the returned values have the correct type according to our annotated SQL statement.

orders.id:integer ==> std::int64_t
price:real ==> double
name:text ==> std::string_view
item:text ==> std::string_view
discount_code:text? ==> std::optional<std::string_view>

What happens if we mess up and type the wrong column name?

To see what happens, I changed discount_code to discount_core in the following expression:

    std::cout << row["discount_core"_col].value_or("<NO CODE>") << "\n";

I get a compiler error which includes the following output.

1>C:\Users\johnb\source\repos\cpp-from-the-sky-down\cpp20_sql\tagged_sqlite.h(220,16): message : template argument deduction/substitution failed:
1>C:\Users\johnb\source\repos\cpp-from-the-sky-down\cpp20_sql\tagged_sqlite.h(228,20): message : 'skydown::sqlite_experimental::fixed_string<13>{"discount_core"}' is not equivalent to 'skydown::sqlite_experimental::fixed_string<9>{"orders.id"}'
1>C:\Users\johnb\source\repos\cpp-from-the-sky-down\cpp20_sql\tagged_sqlite.h(228,20): message : 'skydown::sqlite_experimental::fixed_string<13>{"discount_core"}' is not equivalent to 'skydown::sqlite_experimental::fixed_string<4>{"name"}'
1>C:\Users\johnb\source\repos\cpp-from-the-sky-down\cpp20_sql\tagged_sqlite.h(228,20): message : 'skydown::sqlite_experimental::fixed_string<13>{"discount_core"}' is not equivalent to 'skydown::sqlite_experimental::fixed_string<4>{"item"}'
1>C:\Users\johnb\source\repos\cpp-from-the-sky-down\cpp20_sql\tagged_sqlite.h(228,20): message : 'skydown::sqlite_experimental::fixed_string<13>{"discount_core"}' is not equivalent to 'skydown::sqlite_experimental::fixed_string<5>{"price"}'
1>C:\Users\johnb\source\repos\cpp-from-the-sky-down\cpp20_sql\tagged_sqlite.h(228,20): message : 'skydown::sqlite_experimental::fixed_string<13>{"discount_core"}' is not equivalent to 'skydown::sqlite_experimental::fixed_string<13>{"discount_code"}'

Notice how it is showing us the typo, discount_core, and showing us the actual columns: orders.id, name, item, price, discount_code.

Next time, we will talk a bit more about using the library, and start looking at some of the techniques used in implementing it.

If you want to play with this code, you can just compile example.cpp with g++10 and link to sqlite3.

One of the nice things about this kind of approach is that the library can be agnostic to the actual SQL. This allows it to be relatively small (around 600 lines of code, about half of which is interfacing to SQLite3) and to support all the features of a database, because we are just passing in SQL to be executed by the engine.

In addition, because the SQL query is a compile time string, you cannot get runtime SQL injection attacks.

Right now, only SQLite is supported, but I plan on separating out SQLite from the library and supporting multiple databases such as MySQL and PostgreSQL.

Please let me know what you think in the comments.

Tuesday, February 2, 2016

Emulating C++17 Structured Bindings in C++14

TLDR

Dependencies: C++14, Boost Preprocessor

#include <auto_tie.hpp> in your file. This file is found in include/auto_tie.hpp at https://github.com/jbandela/auto_tie

// Set of student id, name, grade, gpa
std::set<std::tuple<int,std::string,char,double>> myset;

// AUTO_TIE copies/moves elements of the tuple/pair
auto r = AUTO_TIE(iterator, success) = myset.insert(std::make_tuple(2,"Raja",'B',3.1)); 
if (r.success) {
  // AUTO_TIE_REF has references to the tuple/pair elements
  auto s = AUTO_TIE_REF(id, name, grade, gpa) = *r.iterator;
  std::cout << "Successfully inserted " << s.id << " " << s.name << " " << s.grade << " " << s.gpa << "\n";
}

Introduction

Back in November, Bjarne Stroustrup wrote a nice progress report, available here, of the Kona meeting. One of the proposals considered is called structured binding. The proposal addresses one of the inconveniences of returning multiple values from a function using tuples. While it is very easy for a function to return multiple values, it is harder for the caller to use them. Here is an example from the write-up.

Consider the following function:

tuple<T1,T2,T3> f() { /*...*/ return make_tuple(a,b,c); }

If we want to split the tuple into variables without specifying the type, we have to do this:

auto t = f();
auto x = get<0>(t);
auto y = get<1>(t);
auto z = get<2>(t);

The proposal puts forth the following syntax instead

auto {x,y,z} = f();               // x has type T1, y has type T2, z has type T3
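For reference, the form of this feature that eventually shipped in C++17 uses square brackets rather than braces. A minimal sketch, where f and sum_of_first_two are made-up names for this example:

```cpp
#include <tuple>

constexpr std::tuple<int, double, char> f() {
    return std::make_tuple(1, 2.5, 'c');
}

constexpr double sum_of_first_two() {
    auto [x, y, z] = f();  // x is int, y is double, z is char
    (void)z;               // unused in this sketch
    return x + y;
}

static_assert(sum_of_first_two() == 3.5, "1 + 2.5");
```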

I am excited for this feature, and for C++17 in general. While waiting for C++17, I decided to see how close I could get with C++14. Here is the result.

auto r = AUTO_TIE(x,y,z) = f();               // x has type T1, y has type T2, z has type T3

// Unlike the C++17 feature, you need to use r.x instead of just x
std::cout << r.x << "," << r.y << "," << r.z << "\n";

Also, if I have an L-value tuple, and I just want convenient names for the members without moving/copying, I can use AUTO_TIE_REF, like this

auto t = f();

auto r = AUTO_TIE_REF(x,y,z) = t;

// Can access r.x,r.y,r.z but they are all references to t

Implementation

If you just wanted some background and how to use the library, you can stop reading here. I will now talk about how to implement it.

Let us say we have this function

template<class T1, class T2, class T3>
std::tuple<T1, T2, T3> f();

And we wanted to access the tuple elements as x,y,z.

Here is one way we could do this.

template<class T1, class T2, class T3>
struct xyz_elements{
 T1 x;
 T2 y;
 T3 z; 
};

Then we can use a helper class to fill it in with the tuple values:

struct auto_tie_helper{

 template<class Tuple>
 auto operator=(Tuple&& t){
   // Tuple may be deduced as a reference type, so decay it before asking for element types
   using D = std::decay_t<Tuple>;
   using T = xyz_elements<std::tuple_element_t<0,D>,
     std::tuple_element_t<1,D>,std::tuple_element_t<2,D>>;
   return T{std::get<0>(std::forward<Tuple>(t)),
     std::get<1>(std::forward<Tuple>(t)),std::get<2>(std::forward<Tuple>(t))};

 }

};

auto auto_tie(){return auto_tie_helper{};}

Then we can use the above like this

auto r = auto_tie() = f();

std::cout << r.x << "\n";

This is great... if we only ever wanted to use 3-element tuples and use x,y,z as the element names. Let us make the helper a template. But what should we take as the template parameter? We would need something like a template template parameter, because we do not know the types of the tuple elements when we instantiate the helper. However, taking a template template parameter will prove to be problematic for reasons that will be explained later. Instead, let us use decltype with a function object to figure out the types we need.

template<class F>
struct auto_tie_helper {

    template<class T, std::size_t... I>
    auto construct(T&& t, std::index_sequence<I...>) {
        using type = decltype(std::declval<F>()(std::get<I>(std::forward<T>(t))...));
        return type{ std::get<I>(std::forward<T>(t))... };
    }
    template<class T>
    auto operator=(T&& t) {
        return construct(std::forward<T>(t), std::make_index_sequence<std::tuple_size<std::decay_t<T>>::value>{});
    }

};

template<class F>
auto auto_tie(F f) {
    return auto_tie_helper<F>{};
}

Then we can use auto_tie like this.

auto r = auto_tie([](auto x, auto y, auto z){return xyz_elements<decltype(x),decltype(y),decltype(z)>{};}) = f();
std::cout << r.x << "\n";

The lambda we pass to auto_tie returns xyz_elements with the correct types. auto_tie_helper uses std::declval along with decltype to get the type that results from calling our lambda (which will be xyz_elements<decltype(x),decltype(y),decltype(z)>).

However, what if one of the elements of the tuple is not default constructible? We will get an error in our lambda. To fix this, let us have the lambda return a pointer to xyz_elements, and have auto_tie_helper use std::remove_pointer_t to get rid of the pointer. This way, we do not require default construction.

template<class F>
struct auto_tie_helper {

    template<class T, std::size_t... I>
    auto construct(T&& t, std::index_sequence<I...>) {
        using type = std::remove_pointer_t<decltype(std::declval<F>()(std::get<I>(std::forward<T>(t))...))>;
        return type{ std::get<I>(std::forward<T>(t))... };
    }
    template<class T>
    auto operator=(T&& t) {
        return construct(std::forward<T>(t), std::make_index_sequence<std::tuple_size<std::decay_t<T>>::value>{});
    }

};

template<class F>
auto auto_tie(F f) {
    return auto_tie_helper<F>{};
}

Then we can use auto_tie like this.


auto r = auto_tie([](auto x, auto y, auto z){return static_cast<xyz_elements<decltype(x),decltype(y),decltype(z)>*>(nullptr);}) = f();
std::cout << r.x << "\n";
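The pointer trick can be checked in isolation. The following sketch uses made-up names (no_default, one_element, make_ptr, pointee_t) to show that the struct type is recovered from the lambda's return type without ever constructing the struct, so a member without a default constructor is not a problem. It compiles as C++14.

```cpp
#include <type_traits>
#include <utility>

// A type that cannot be default constructed.
struct no_default {
    explicit no_default(int v) : value(v) {}
    int value;
};

template <class T>
struct one_element { T x; };

// Generic lambda that returns a null pointer purely to carry a type.
// It never creates a one_element object.
const auto make_ptr = [](auto x) {
    return static_cast<one_element<decltype(x)>*>(nullptr);
};

// The type the callable's result points to when called with an Arg.
template <class F, class Arg>
using pointee_t =
    std::remove_pointer_t<decltype(std::declval<F>()(std::declval<Arg>()))>;

static_assert(std::is_same<pointee_t<decltype(make_ptr), no_default>,
                           one_element<no_default>>::value,
              "the struct type is recovered without constructing it");
```

Writing one_element<no_default>{} instead would not compile, which is the error the value-returning lambda ran into.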

Now, we can define a template outside our function for the number of tuple elements we want with the names we want, and use auto_tie with that by providing the appropriate lambda function that returns a pointer to the type we want. However, it is still an inconvenience to have to define a template class outside the function where we are using auto_tie, and we cannot define a template class inside a function, as that is forbidden by C++. Instead, we define a class inside the generic lambda that we pass to auto_tie.


auto r = auto_tie([](auto x_, auto y_, auto z_){
  struct my_struct{
    decltype(x_) x; 
    decltype(y_) y; 
    decltype(z_) z; 

  };
  return static_cast<my_struct*>(nullptr);}) = f();
std::cout << r.x << "\n";

By the way, this is the reason that we used a decltype with a function object instead of a template template in auto_tie_helper. Now we are able to use auto_tie in a self-contained way. However, it is very verbose. Because it is self-contained, we can create a macro using Boost Preprocessor to make this all less verbose.


#define AUTO_TIE_HELPER1(r, data, i, elem) BOOST_PP_COMMA_IF(i) auto BOOST_PP_CAT(elem,_)
#define AUTO_TIE_HELPER2(r, data, elem) decltype( BOOST_PP_CAT(elem,_) ) elem ;

#define AUTO_TIE_IMPL(seq) auto_tie([]( BOOST_PP_SEQ_FOR_EACH_I(AUTO_TIE_HELPER1, _ , seq ) ) { \
    struct f1f067cb_03fe_47dc_a56d_93407b318d12_auto_tie_struct { BOOST_PP_SEQ_FOR_EACH(AUTO_TIE_HELPER2, _, seq) }; \
    return static_cast<f1f067cb_03fe_47dc_a56d_93407b318d12_auto_tie_struct*>(nullptr);\
})

#define AUTO_TIE(...) AUTO_TIE_IMPL(BOOST_PP_VARIADIC_TO_SEQ(__VA_ARGS__) )

AUTO_TIE takes the variadic macro arguments, converts them to a Boost Preprocessor sequence, and passes it to AUTO_TIE_IMPL. AUTO_TIE_IMPL uses AUTO_TIE_HELPER1 to create the lambda parameters. Then it defines a struct with a unique name so we don't have any accidental name collisions - f1f067cb_03fe_47dc_a56d_93407b318d12_auto_tie_struct. Then it uses AUTO_TIE_HELPER2 to define the members. Finally, as in the hand coded lambda above, it returns a pointer to the struct. AUTO_TIE_IMPL calls auto_tie with the above lambda. So finally we can write...

auto r = AUTO_TIE(x,y,z) = f();
std::cout << r.x << "\n";
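For readers who want to try the technique without Boost, here is a self-contained C++14 sketch that puts the pieces from this post together and writes out by hand roughly what the macro generates. The functions make_student and tie_student, and the struct name fields, are made up for this example.

```cpp
#include <cstddef>
#include <string>
#include <tuple>
#include <type_traits>
#include <utility>

template <class F>
struct auto_tie_helper {
    template <class T, std::size_t... I>
    auto construct(T&& t, std::index_sequence<I...>) {
        // F returns a null pointer to the struct we want; strip the pointer.
        using type = std::remove_pointer_t<
            decltype(std::declval<F>()(std::get<I>(std::forward<T>(t))...))>;
        return type{std::get<I>(std::forward<T>(t))...};
    }
    template <class T>
    auto operator=(T&& t) {
        return construct(
            std::forward<T>(t),
            std::make_index_sequence<std::tuple_size<std::decay_t<T>>::value>{});
    }
};

// The lambda is only needed for its type, so the argument itself is ignored.
template <class F>
auto auto_tie(F) { return auto_tie_helper<F>{}; }

inline std::tuple<int, std::string, double> make_student() {
    return std::make_tuple(2, std::string("Raja"), 3.1);
}

// Hand-written equivalent of: AUTO_TIE(id, name, gpa) = make_student()
inline auto tie_student() {
    return auto_tie([](auto id_, auto name_, auto gpa_) {
        struct fields {
            decltype(id_) id;
            decltype(name_) name;
            decltype(gpa_) gpa;
        };
        return static_cast<fields*>(nullptr);
    }) = make_student();
}
```

Calling tie_student() returns an object whose members id, name and gpa have the types int, std::string and double, moved out of the temporary tuple.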

Conclusion

I had a lot of fun writing this. I learned the following lessons while doing this.

C++14 generic lambdas are surprisingly powerful and enable stuff that could not be done before
By limiting macros to just dealing with names (which templates can't handle), and having templates deal with expressions (which macros are good at messing up), you can get some nice, safe, terse syntax.

I think this technique can also be extended to do some other cool stuff that I will discuss in the future.

Let me know what you think.

Wednesday, January 13, 2016

A Workaround for Lambda ODR Violations

As brought up in the post https://www.reddit.com/r/cpp/comments/40lm8o/lambdas_are_dangerous/ with lambdas in inline functions you can run into ODR violations and thus undefined behavior.

There is also a stack overflow discussion at http://stackoverflow.com/questions/34717823/does-using-lambda-in-header-file-violate-odr

While the ultimate fix may lie with the Core Working Group, I think there is a workaround.

The basis for the trick comes from Paul Fultz II in a post about constexpr lambda. You can find the post at http://pfultz2.com/blog/2014/09/02/static-lambda/

Here is some problematic code from the stackoverflow discussion. The lambda may have a different type across translation units and thus result in different specializations of for_each being called for different translation units resulting in ODR violations and thus undefined behavior.

    inline void g() {
        int arr[2] = {};
        std::for_each(arr, arr+2, [] (int i) {std::cout << i << ' ';});
    }

Here is a simple fix that will prevent the ODR violation.

    // Based on Richard Smith trick for constexpr lambda
    // via Paul Fultz II (http://pfultz2.com/blog/2014/09/02/static-lambda/)
    template<typename T>
    auto addr(T &&t)
    {
        return &t;
    }

    static const constexpr auto odr_helper = true ? nullptr : addr([](){});

    template <class T = decltype(odr_helper)>
    inline void g() {
        int arr[2] = {};
        std::for_each(arr, arr+2, [] (int i) {std::cout << i << ' ';});
    }

We create a static const constexpr null pointer with the type of a lambda. If lambdas are different types across different translation units, then odr_helper will have different types across different translation units. Because g is now a function template using the type of odr_helper, g will be a different specialization across different translation units and thus will not result in an ODR violation.

Also note that because T is defaulted, g can be used without any changes from before.
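The building blocks of the trick can be checked inside a single translation unit. The sketch below only demonstrates facts that hold there: the helper is a pointer to a closure type, and two lambda expressions never share a type even when their bodies are identical. The names helper_a, helper_b and g_tagged are made up for this example, and whether closure types really differ between translation units is the premise of the linked discussions rather than something this snippet proves.

```cpp
#include <type_traits>

template <typename T>
auto addr(T&& t) {
    return &t;
}

// The right-hand branch is never evaluated. It only supplies the type.
static constexpr auto helper_a = true ? nullptr : addr([] {});
static constexpr auto helper_b = true ? nullptr : addr([] {});

static_assert(std::is_pointer<decltype(helper_a)>::value,
              "helper_a is a null pointer to a closure type");
static_assert(!std::is_same<decltype(helper_a), decltype(helper_b)>::value,
              "each lambda expression has its own closure type");

// Defaulting T to the helper's type makes the specialization depend on it,
// while callers can still write g_tagged() with no template arguments.
template <class T = decltype(helper_a)>
constexpr int g_tagged() {
    return 42;
}
```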

ideone at https://ideone.com/NdBpXN

Thursday, December 19, 2013

A Workaround for Type Inference with Expression Templates and Proxies

Back in 2011 Motti Lanzkron wrote an article titled "Inferring Too Much"

The problem brought to light by the article is that C++11 auto interacts badly with expression templates and proxies. Just replacing the type with auto can cause undefined behavior, as shown by the following lines of code taken from the article above:

#include <vector>
#include <iostream>
#include <limits>
std::vector<bool> to_bits(unsigned int n) {
    const int bits = std::numeric_limits<unsigned int>::digits;
    std::vector<bool> ret(bits);
    for(int i = 0, mask = 1; i < bits; ++i, mask *= 2)
        ret[i] = (n &  mask) != 0;
    return ret;
}

int main()
{
    bool b = to_bits(42)[3];
    auto a = to_bits(42)[3];
    std::cout << std::boolalpha << b << std::endl;
    std::cout << std::boolalpha << a << std::endl;
}
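The root of the problem is visible in the types alone. As a small check using only the standard library (bits and element_expr_type are made-up aliases), indexing a std::vector<bool> does not produce a bool but a proxy object, and that proxy type is what auto deduces:

```cpp
#include <type_traits>
#include <utility>
#include <vector>

using bits = std::vector<bool>;

// The type of the expression v[0] for a non-const std::vector<bool> v.
using element_expr_type = decltype(std::declval<bits&>()[0]);

static_assert(std::is_same<element_expr_type, bits::reference>::value,
              "indexing vector<bool> yields a proxy object");
static_assert(!std::is_same<element_expr_type, bool>::value,
              "so auto does not deduce bool");

// For comparison, an ordinary vector hands back a real reference.
static_assert(std::is_same<decltype(std::declval<std::vector<int>&>()[0]), int&>::value,
              "vector<int> yields int&");
```

In main above, a therefore holds a proxy that refers into the temporary vector returned by to_bits, which is destroyed at the end of that statement.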

So how do we fix it?

There has been some talk about adding an operator auto that you could define in your class. However, it might be some time before we get something like that.

Herb Sutter in his "Almost Always Auto" says this is a feature and not a bug, "because you have a convenient way to spell both 'capture the list or proxy' and 'resolve the computation' depending which you mean".

Here is some code discussing this

auto a = matrix{...}, b = matrix{...}; // some type that does lazy eval
auto ab = a * b;                       // to capture the lazy-eval proxy
auto c = matrix{ a * b };              // to force computation

Unfortunately, not only is this potentially dangerous but it can be tedious. What if matrix takes some template parameters such as dimensions and type? Now you have

auto a = matrix<2,3,double>{...}, b = matrix<3,2,double>{...}; // some type that does lazy eval
auto ab = a * b;                       // to capture the lazy-eval proxy
auto c = matrix<2,2,double>{ a * b };              // to force computation

In this scenario we are fast losing the benefits of auto. Is there some way that we can have our auto and our expression templates? Here is a workaround, which admittedly is not perfect, but I think it is the best we can do without changing the language.

We are going to simulate operator auto

namespace operator_auto {
    template <class T> struct operator_auto_type {
        using type = T;
    };

    struct operator_auto_imp {
        template <class T> typename operator_auto_type<T>::type operator=(T &&t){
            return std::forward<T>(t);
        }
    };

    namespace {
        operator_auto_imp _auto;
    }
}

All this does is create a variable _auto whose assignment operator returns whatever was assigned to it, converted to another type, which in the default case is the same type.

Then we specialize operator_auto_type like this

// For my::string for Motti's example
namespace operator_auto {

    template <class T> 
    struct operator_auto_type<my::string::concat<T> > 
    {
       using type = my::string;
    };
}

// For vector bool
namespace operator_auto {

    template <> 
    struct operator_auto_type<std::vector<bool>::reference>
    {
        using type = bool;
    };
}

Now to use it, whenever we use auto with an expression that might yield a proxy, we just include an additional assignment to _auto. Here is how we would use it with my::string:

    using operator_auto::_auto;
    my::string a("hello"), b(" "), c("world"), d("!");
    auto s = _auto = a + b + c + d;
    auto a1 = _auto = a;
    std::cout << s << std::endl;

Notice that for a1 we are actually assigning a my::string. In this case the assignment to _auto becomes a no-op.
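Since my::string is defined in the linked gist rather than here, the same idea can be tried with only the standard library by applying it to the vector<bool> example. This is a condensed sketch of the code above; bit3_of_42 is a made-up name, and to_bits is rewritten with shifts for brevity.

```cpp
#include <limits>
#include <type_traits>
#include <utility>
#include <vector>

namespace operator_auto {
template <class T>
struct operator_auto_type { using type = T; };

// Collapse the vector<bool> proxy to a plain bool.
template <>
struct operator_auto_type<std::vector<bool>::reference> { using type = bool; };

struct operator_auto_imp {
    template <class T>
    typename operator_auto_type<T>::type operator=(T&& t) {
        return std::forward<T>(t);
    }
};

namespace {
operator_auto_imp _auto;
}
}  // namespace operator_auto

inline std::vector<bool> to_bits(unsigned int n) {
    const int bits = std::numeric_limits<unsigned int>::digits;
    std::vector<bool> ret(bits);
    for (int i = 0; i < bits; ++i) ret[i] = ((n >> i) & 1u) != 0;
    return ret;
}

inline bool bit3_of_42() {
    using operator_auto::_auto;
    // The proxy is converted to bool inside operator=, while the temporary
    // vector returned by to_bits is still alive.
    auto a = _auto = to_bits(42)[3];
    static_assert(std::is_same<decltype(a), bool>::value, "a is a plain bool");
    return a;
}
```

Note that the specialization only matches when the proxy is passed as an rvalue, as it is here. A named proxy variable would deduce T as a reference type and fall through to the default case.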

For full source code for this, take a look at https://gist.github.com/jbandela/8042689. For a runnable version, look at http://ideone.com/eLyg7T

As for the name _auto, I chose it because it was short and the underscore kind of suggested "flatten" or "collapse" leading to a mnemonic of "collapse auto" which is kind of suggestive what you want. However, you can easily change it if you wish.

Let me know what you think in the comments. I welcome your comments, suggestions, and ideas.

John Bandela

Tuesday, April 30, 2013

C# style async/await in C++ - Part 2 Using with Microsoft PPL/PPLX

Last time we talked a little about asynchrony and about the cpp_async_await project. The previous article is located at http://jrb-programming.blogspot.com/2013/04/c-style-asyncawait-in-c-part-1.html. All code for the project is located at https://github.com/jbandela/cpp_async_await/. We talked about how to use the library with Boost.Asio.

As mentioned before, the other major C++ library is Microsoft PPL/PPLX (PPLX is the cross-platform port of PPL by the Microsoft Casablanca project). You can obtain PPLX and the documentation at http://casablanca.codeplex.com/ along with a host of other really neat stuff such as an http client, json library, etc. From here on out, unless specified otherwise, you can take what I say about PPL and assume that it applies to PPLX.

While Boost.Asio uses a callback model, PPL/PPLX uses a continuation model. The key class is

template < typename _Type>
class task;

_Type specifies the type of value produced by the task and it can be void. Task is very similar to std::future with the addition of the .then method. Whereas std::future has a .get method which blocks until the future is complete, the .then method allows a lambda to be specified which will be called when the task is complete. You can read more about PPL tasks at http://msdn.microsoft.com/en-us/library/dd492427(v=vs.110).aspx.

Here is an example of how to use tasks and continuations taken from the above link

// basic-continuation.cpp 
// compile with: /EHsc
#include <ppltasks.h>
#include <iostream>

using namespace concurrency;
using namespace std;

int wmain()
{
    auto t = create_task([]() -> int
    {
        return 42;
    });

    t.then([](int result)
    {
        wcout << result << endl;
    }).wait();

    // Alternatively, you can chain the tasks directly and 
    // eliminate the local variable. 
    /*create_task([]() -> int
    {
        return 42;
    }).then([](int result)
    {
        wcout << result << endl;
    }).wait();*/
}

/* Output:
    42
*/

This is actually pretty neat and it is easier to chain tasks than in Boost.Asio.
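To see the shape of the continuation model without any threading, here is a deliberately tiny stand-in. It is not PPL and does nothing asynchronous: ready_task, add_one and twice are made-up names, the task is always already complete, and continuations returning void are not handled. It only shows how each .then wraps the continuation's result in a new task so that calls can be chained.

```cpp
#include <utility>

// Toy stand-in for a task that has already completed with a value.
template <class T>
class ready_task {
public:
    constexpr explicit ready_task(T value) : value_(std::move(value)) {}

    // Blocking-style access, in the spirit of std::future::get().
    constexpr T get() const { return value_; }

    // Continuation-style access: run f on the value and wrap its result
    // in a new task, which is what makes chaining possible.
    template <class F>
    constexpr auto then(F f) const
        -> ready_task<decltype(f(std::declval<T>()))> {
        return ready_task<decltype(f(std::declval<T>()))>(f(value_));
    }

private:
    T value_;
};

struct add_one {
    constexpr int operator()(int v) const { return v + 1; }
};
struct twice {
    constexpr int operator()(int v) const { return v * 2; }
};

static_assert(ready_task<int>(20).then(add_one{}).then(twice{}).get() == 42,
              "(20 + 1) * 2");
```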

There is currently a proposal to add .then to std::future. You can find the proposal at http://isocpp.org/files/papers/N3558.pdf

However, it gets hard to use once you need to do anything in a loop. Due to this, along with other reasons, there is a proposal to add resumable functions to the C++ standard. You can find the paper at http://isocpp.org/files/papers/N3564.pdf

Here is one of the motivating examples from that paper. Note that they are using a future with .then continuations, just like a PPL task currently.

auto write =
    [&buf](future<int> size) -> future<bool> 
{ 
    return streamW.write(size.get(), buf).then(
        [](future<int> op){ return op.get() > 0; });
};
auto flse = [](future<int> op){ return 
    future::make_ready_future(false);};
auto copy = do_while(
    [&buf]() -> future<bool> 
{ 
    return streamR.read(512, buf)
        .choice(
        [](future<int> op){ return op.get() > 0; }, write, flse);
});

The code asynchronously reads a stream 512 bytes at a time until no more bytes are read, while asynchronously writing what was read. Here is how the code looks with the proposed C++ language additions. Note that resumable marks a function as resumable and await suspends the function and then resumes the function when the awaited future(task) is complete returning the value generated by the task that was awaited.

int cnt = 0;
do 
{
cnt = await streamR.read(512, buf);
if ( cnt == 0 ) break;
cnt = await streamW.write(cnt, buf);
} while (cnt > 0);

Notice how much easier to follow the code is with the language additions. The downside is you will have to wait for the proposal to be approved, become part of a standard, and for your compiler to implement it.

The good news is you can have much of the same convenience using cpp_async_await now. First a motivating example. In the Casablanca REST SDK that provides PPLX, there is an example of asynchronously searching a file for lines which contain some string and writing them asynchronously to another file. You can find the code at http://casablanca.codeplex.com/SourceControl/changeset/view/040c323727ca7747beb254ecf2b8eac73632f3be#Release/collateral/Samples/SearchFile/searchfile.cpp. We are using PPLX because it is a bit easier to have a real example with a command-line app. You would use PPL tasks in the same way as PPLX tasks.

#include <filestream.h>
#include <containerstream.h>
#include <producerconsumerstream.h>

using namespace utility;
using namespace concurrency::streams;

/// <summary>
/// A convenient helper function to loop asynchronously until a condition is met.
/// </summary>
pplx::task<bool> _do_while_iteration(std::function<pplx::task<bool>(void)> func)
{
    pplx::task_completion_event<bool> ev;
    func().then([=](bool guard)
    {
        ev.set(guard);
    });
    return pplx::create_task(ev);
}
pplx::task<bool> _do_while_impl(std::function<pplx::task<bool>(void)> func)
{
    return _do_while_iteration(func).then([=](bool guard) -> pplx::task<bool>
    {
        if(guard)
        {
            return ::_do_while_impl(func);
        }
        else
        {
            return pplx::task_from_result(false);
        }
    });
}
pplx::task<void> do_while(std::function<pplx::task<bool>(void)> func)
{
    return _do_while_impl(func).then([](bool){});
}

/// <summary>
/// Structure used to store individual line results.
/// </summary>
typedef std::vector<std::string> matched_lines;
namespace Concurrency { namespace streams {
/// <summary>
/// Parser implementation for 'matched_lines' type.
/// </summary>
template <typename CharType>
class _type_parser<CharType, matched_lines>
{
public:
    static pplx::task<matched_lines> parse(streambuf<CharType> buffer)
    {
        basic_istream<CharType> in(buffer);
        auto lines = std::make_shared<matched_lines>();
        return do_while([=]()
        {
            container_buffer<std::string> line;
            return in.read_line(line).then([=](const size_t bytesRead)
            {
                if(bytesRead == 0 && in.is_eof())
                {
                    return false;
                }
                else
                {
                    lines->push_back(std::move(line.collection()));
                    return true;
                }
            });
        }).then([=]()
        {
            return matched_lines(std::move(*lines));
        });
    }
};
}}
/// <summary>
/// Function to create in data from a file and search for a given string writing all lines containing the string to memory_buffer.
/// </summary>
static pplx::task<void> find_matches_in_file(const string_t &fileName, const std::string &searchString, basic_ostream<char> results)
{
    return file_stream<char>::open_istream(fileName).then([=](basic_istream<char> inFile)
    {           
        auto lineNumber = std::make_shared<int>(1);
        return ::do_while([=]()
        {
            container_buffer<std::string> inLine;
            return inFile.read_line(inLine).then([=](size_t bytesRead)
            {
                if(bytesRead == 0 && inFile.is_eof())
                {
                    return pplx::task_from_result(false);
                }

                else if(inLine.collection().find(searchString) != std::string::npos)
                {
                    results.print("line ");
                    results.print((*lineNumber)++);
                    return results.print(":").then([=](size_t)
                    {
                        container_buffer<std::string> outLine(std::move(inLine.collection()));
                        return results.write(outLine, outLine.collection().size());
                    }).then([=](size_t)
                    {
                        return results.print("\r\n");
                    }).then([=](size_t)
                    {
                        return true;
                    });
                }

                else
                {
                    ++(*lineNumber);
                    return pplx::task_from_result(true);
                }
            });
        }).then([=]()
        {
            // Close the file and results stream.
            return inFile.close() && results.close();
        });
    })

    // Continuation to erase the bool and return task of void.
    .then([](std::vector<bool>) {});
}

/// <summary>
/// Function to write out results from matched_lines type to file
/// </summary>
static pplx::task<void> write_matches_to_file(const string_t &fileName, matched_lines results)
{
    // Create a shared pointer to the matched_lines structure to avoid copying repeatedly.
    auto sharedResults = std::make_shared<matched_lines>(std::move(results));

    return file_stream<char>::open_ostream(fileName, std::ios::trunc).then([=](basic_ostream<char> outFile)
    {
        auto currentIndex = std::make_shared<size_t>(0);
        return ::do_while([=]()
        {
            if(*currentIndex >= sharedResults->size())
            {
                return pplx::task_from_result(false);
            }

            container_buffer<std::string> lineData((*sharedResults)[(*currentIndex)++]);
            outFile.write(lineData, lineData.collection().size());
            return outFile.print("\r\n").then([](size_t)
            {
                return true;
            });
        }).then([=]()
        {
            return outFile.close();
        });
    })

    // Continuation to erase the bool and return task of void.
    .then([](bool) {});
}

#ifdef _MS_WINDOWS
int wmain(int argc, wchar_t *args[])
#else
int main(int argc, char *args[])
#endif
{
    if(argc != 4)
    {
        printf("Usage: SearchFile.exe input_file search_string output_file\n");
        return -1;
    }
    const string_t inFileName = args[1];
    const std::string searchString = utility::conversions::to_utf8string(args[2]);
    const string_t outFileName = args[3];
    producer_consumer_buffer<char> lineResultsBuffer;

    // Find all matches in file.
    basic_ostream<char> outLineResults(lineResultsBuffer);
    find_matches_in_file(inFileName, searchString, outLineResults)

    // Write matches into custom data structure.
    .then([&]()
    {
        basic_istream<char> inLineResults(lineResultsBuffer);
        return inLineResults.extract<matched_lines>();
    })

    // Write out stored match data to a new file.
    .then([&](matched_lines lines)
    {
        return write_matches_to_file(outFileName, std::move(lines));
    })

    // Wait for everything to complete.
    .wait();

    return 0;
}

Notice how painful iteration is. Now here is the same program written with pplx_helper from cpp_async_await. One quick note: the code above first copies the matching lines into a producer_consumer_buffer, then into a vector, and then to the output file. My code copies them into the producer_consumer_buffer and then writes to the output file directly from that buffer. I think my code achieves the same level of concurrency as the example program. If I am incorrect in this, please let me know in the comments below. You can find the whole file at https://github.com/jbandela/cpp_async_await/blob/master/PplxExample2.cpp

#include "pplx_helper.hpp"
#include <filestream.h>
#include <containerstream.h>
#include <producerconsumerstream.h>
#include <cstdio>    // printf
#include <iostream>  // std::cerr

using namespace utility;
using namespace concurrency::streams;



#ifdef _MS_WINDOWS
int wmain(int argc, wchar_t *args[])
#else
int main(int argc, char *args[])
#endif
{
    if(argc != 4)
    {
        printf("Usage: PplxExample2 input_file search_string output_file\n");
        return -1;
    }
    const string_t inFileName = args[1];
    const std::string searchString = utility::conversions::to_utf8string(args[2]);
    const string_t outFileName = args[3];
    producer_consumer_buffer<char> lineResultsBuffer;

    // Find all matches in file.
    basic_ostream<char> outLineResults(lineResultsBuffer);

    auto reader = pplx_helper::do_async([&](pplx_helper::async_helper<void> helper){
        auto inFile = helper.await(file_stream<char>::open_istream(inFileName));
        int lineNumber = 1;
        bool done = false;
        while(!done){
            container_buffer<std::string> inLine;
            auto bytesRead = helper.await(inFile.read_line(inLine));
            if(bytesRead==0 && inFile.is_eof()){
                done = true;
            }
            else if(inLine.collection().find(searchString) != std::string::npos){
                helper.await(outLineResults.print("line "));
                helper.await(outLineResults.print(lineNumber++));
                helper.await(outLineResults.print(":"));
                container_buffer<std::string> outLine(std::move(inLine.collection()));
                helper.await(outLineResults.write(outLine,outLine.collection().size()));
                helper.await(outLineResults.print("\r\n"));
            }
            else{
                ++lineNumber;
            }

        }
        helper.await(inFile.close() && outLineResults.close());
    });

    auto writer = pplx_helper::do_async([&](pplx_helper::async_helper<void> helper){
        basic_istream<char> inLineResults(lineResultsBuffer);
        auto outFile = helper.await(file_stream<char>::open_ostream(outFileName,std::ios::trunc));
        auto currentIndex = 0;
        bool done = false;
        while(!done){
            container_buffer<std::string> lineData;
            auto bytesRead = helper.await(inLineResults.read_line(lineData));
            if(bytesRead==0 && inLineResults.is_eof()){
                done = true;
            }
            else{
                container_buffer<std::string> lineDataOut(std::move(lineData.collection()));
                helper.await(outFile.write(lineDataOut,lineDataOut.collection().size()));
                helper.await(outFile.print("\r\n"));
            }
        }
        helper.await(inLineResults.close() && outFile.close());

    });


    try{
    // Wait for everything to complete and catch any exceptions
    (reader && writer).wait();

    }
    catch(std::exception& e){
        std::cerr << e.what();
    }

    return 0;
}

Notice how we can easily do iteration. The library is pretty similar to what can be achieved with the language additions. Instead of using resumable to mark a function as resumable, we use pplx_helper::do_async, which takes a lambda. The lambda takes a single parameter of type pplx_helper::async_helper<void>. If the lambda were to return an int, for example, it would take pplx_helper::async_helper<int>. In general, a lambda with return type T takes pplx_helper::async_helper<T>. In the example code the parameter is named helper. In the language proposal you use the unary await keyword to suspend the function until a task is complete and then resume the function, returning the value generated by the task we were awaiting. In our code we call helper.await on the task we want to await. helper.await provides pretty much the same convenience as the language keyword await.

You can use the same syntax to work with PPL tasks by using namespace ppl_helper. In summary, for PPLx (Project Casablanca) use namespace pplx_helper, and for PPL (shipped with Visual C++ on Windows) use namespace ppl_helper.

This functionality is packaged up for you at https://github.com/jbandela/cpp_async_await. It is licensed under the Boost Software License which allows usage for both open source and commercial applications. It is a header only library and does not need to be built, but it does depend on Boost.Coroutine and needs to be linked to the boost_context library. The library has been tested with Visual C++ 2012 on Windows, and G++ 4.7.2 on Fedora Linux.

I hope you have enjoyed this discussion. Download the code and try it out, and let me know what you think. If people are interested, I will talk in a future post about how the library actually works.

Thanks,

John Bandela

Friday, April 26, 2013

C# style async/await in C++ - Part 1: Introduction and use with Boost.Asio

Asynchronous Programming

Asynchronous programming has become more and more important recently as a way to efficiently use the resources available with multicore processors yet at the same time avoid dealing with locking primitives.
In C++, two important libraries for this type of programming are Boost.Asio and Microsoft's Parallel Patterns Library (PPL) Task Library. Boost.Asio provides asynchronous operations with callback handlers. You can learn more about Boost.Asio here. The PPL Task Library provides asynchronous operations using continuations. You can learn more about PPL here.

Async/Await

The problem with using these libraries is that they operate differently from synchronous programming. Your logic ends up being in either multiple callback handlers or in multiple lambda continuations. C# recently added async/await to make it easier to write asynchronous code. You can find out more about them here and watch a presentation here.
There is even a proposal to add this to C++. You can see the proposal here.

Async/Await using Boost.Coroutine

However, you don't want to wait for a language proposal to be approved and then implemented by compilers to make your programming easier. In fact, you can have a lot of the benefit now. The key ingredient is Boost.Coroutine, which is in the 1.53 release of Boost. You can read about Boost.Coroutine here. Using Boost.Coroutine, I wrote cpp_async_await, an open source library under the Boost Software License that allows (as much as possible with a library-only solution) async/await style programming in C++ with Boost.Asio and Microsoft PPL/PPLx.

Motivating example

Go take a look at a simple async http client using raw Boost.Asio http://www.boost.org/doc/libs/1_53_0/doc/html/boost_asio/example/http/client/async_client.cpp

Welcome back. Boost.Asio is very powerful, but the callbacks make the logic hard to follow. In contrast here is our version of the same code. You can find the full code at

https://github.com/jbandela/cpp_async_await/blob/master/Example2.cpp

void get_http(boost::asio::io_service& io,std::string server, std::string path){

    using namespace asio_helper::handlers;
    // This allows us to do await
    asio_helper::do_async(io,[=,&io](asio_helper::async_helper helper){

        using boost::asio::ip::tcp;

        // This allows us to use the predefined handlers
        // such as read_handler, write_handler, etc
        using namespace asio_helper::handlers;

        tcp::resolver resolver_(io);
        tcp::socket socket_(io);
        boost::asio::streambuf request_;
        boost::asio::streambuf response_;

        // Form the request. We specify the "Connection: close" header so that the
        // server will close the socket after transmitting the response. This will
        // allow us to treat all data up until the EOF as the content.
        std::ostream request_stream(&request_);
        request_stream << "GET " << path << " HTTP/1.0\r\n";
        request_stream << "Host: " << server << "\r\n";
        request_stream << "Accept: */*\r\n";
        request_stream << "Connection: close\r\n\r\n";

        // Start an asynchronous resolve to translate the server and service names
        // into a list of endpoints.
        tcp::resolver::query query(server, "http");

        // Do async resolve
        tcp::resolver::iterator endpoint_iterator;
        boost::system::error_code ec;
        std::tie(ec,endpoint_iterator) =  helper.await<resolve_handler>(
            [&](resolve_handler::callback_type cb){
                resolver_.async_resolve(query,cb);
        });
        if(ec) {throw boost::system::system_error(ec);}

        // Do async connect
        std::tie(ec,std::ignore) = helper.await<composed_connect_handler>(
            [&](composed_connect_handler::callback_type cb){
                boost::asio::async_connect(socket_,endpoint_iterator,cb);    
        });
        if(ec){throw boost::system::system_error(ec);}

        // Connection was successful, send request
        std::tie(ec,std::ignore) = helper.await<write_handler>(
            [&](write_handler::callback_type cb){
                boost::asio::async_write(socket_,request_,cb);
        });
        if(ec){throw boost::system::system_error(ec);}

        // Read the response status line
        std::tie(ec,std::ignore) = helper.await<read_handler>(
            [&](read_handler::callback_type cb){
                boost::asio::async_read_until(socket_,response_,"\r\n",cb);
        });
        if(ec){throw boost::system::system_error(ec);}

        // Check that the response is OK
        std::istream response_stream(&response_);
        std::string http_version;
        response_stream >> http_version;
        unsigned int status_code;
        response_stream >> status_code;
        std::string status_message;
        std::getline(response_stream, status_message);
        if (!response_stream || http_version.substr(0, 5) != "HTTP/")
        {
            std::cout << "Invalid response\n";
            return;
        }
        if (status_code != 200)
        {
            std::cout << "Response returned with status code ";
            std::cout << status_code << "\n";
            return;
        }

        // Read the response headers, which are terminated by a blank line.
        std::tie(ec,std::ignore) = helper.await<read_handler>(
           [&](read_handler::callback_type cb){
                boost::asio::async_read_until(socket_, response_, "\r\n\r\n",cb);
        });
        if(ec){throw boost::system::system_error(ec);}

        // Process the response headers.
        std::istream response_stream2(&response_);
        std::string header;
        while (std::getline(response_stream2, header) && header != "\r")
            std::cout << header << "\n";
        std::cout << "\n";

        // Write whatever content we already have to output.
        if (response_.size() > 0)
            std::cout << &response_;

        // Continue reading remaining data until EOF.
        bool done = false;
        while(!done){

            std::tie(ec,std::ignore) = helper.await<read_handler>(
                [&](read_handler::callback_type cb){ 
                    boost::asio::async_read(socket_, response_,
                        boost::asio::transfer_at_least(1), cb);         
            });
            if(ec && ec != boost::asio::error::eof){
                throw boost::system::system_error(ec);
            }
            done = (ec == boost::asio::error::eof);
            // Write all of the data so far
            std::cout << &response_;
        }
   });
}

Discussion

Notice how we can keep the code all in one function instead of spreading it out, and can read it in a single scan instead of jumping to a handler and then back. The magic happens in await. Let's look at a single call:

// Connection was successful, send request
std::tie(ec,std::ignore) = helper.await<write_handler>(
        [&](write_handler::callback_type cb){
            boost::asio::async_write(socket_,request_,cb);
});

helper.await takes a template parameter to specify what handler to use. Handlers are defined in namespace asio_helper::handlers. Await takes a single function parameter that consists of a lambda. The lambda takes a parameter of write_handler::callback_type. If we were using a read_handler, it would be read_handler::callback_type and so on.

helper.await returns whatever parameters were passed into the callback handler as a single value, pair, or tuple, depending on the number of parameters in the handler. A read_handler has boost::system::error_code ec and std::size_t bytes_transferred as parameters, so it returns a std::pair. We can then use std::tie to get the error code and ignore the bytes transferred.
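The std::tie idiom described above can be tried without Boost. The following is only a sketch: fake_read_result is a hypothetical stand-in for the pair that helper.await<read_handler> hands back, and std::error_code is used in place of boost::system::error_code so that the snippet depends only on the standard library.

```cpp
#include <cassert>
#include <cstddef>
#include <system_error>
#include <tuple>
#include <utility>

// Hypothetical stand-in for the value returned by helper.await<read_handler>(...):
// an error code plus the number of bytes transferred.
std::pair<std::error_code, std::size_t> fake_read_result() {
    return {std::error_code{}, 42};
}

// Keep only the error code; std::ignore discards the byte count.
bool read_succeeded() {
    std::error_code ec;
    std::tie(ec, std::ignore) = fake_read_result();
    return !ec;
}

// Keep both values when the byte count is needed.
std::size_t bytes_from_read() {
    std::error_code ec;
    std::size_t bytes_transferred = 0;
    std::tie(ec, bytes_transferred) = fake_read_result();
    assert(!ec);  // the stand-in never reports an error
    return bytes_transferred;
}
```

Calling read_succeeded() exercises the same pattern the HTTP example uses after every await, while bytes_from_read() shows what to write when the second value matters.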

The await function calls the asynchronous Boost.Asio function and then uses Boost.Coroutine to suspend our function and "return" to the calling function. Meanwhile the callback_type is a special function object that when called by Boost.Asio uses Boost.Coroutine to resume our function.
The cpp_async_await library defines handlers for the following Boost.Asio handler types in namespace asio_helper::handlers:

read_handler for ReadHandler
write_handler for WriteHandler
completion_handler for CompletionHandler
accept_handler for AcceptHandler
composed_connect_handler for ComposedConnectHandler
connect_handler for ConnectHandler
resolve_handler for ResolveHandler
wait_handler for WaitHandler
signal_handler for SignalHandler
ssl_handshake_handler for HandshakeHandler
ssl_shutdown_handler for ShutdownHandler

The handlers allow async_helper::await to return, as a value, pair, or tuple, whatever values are passed to the callback function.
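One way to picture the "value, pair, or tuple" rule is a small trait that maps a callback's parameter types to a return type. This is a hypothetical sketch, not the library's actual implementation: the name await_result is invented here, and std::error_code stands in for boost::system::error_code.

```cpp
#include <cassert>
#include <cstddef>
#include <system_error>
#include <tuple>
#include <type_traits>
#include <utility>

// Hypothetical trait: general case, any other number of parameters gives a tuple.
template <class... Args>
struct await_result { using type = std::tuple<Args...>; };

// Exactly one parameter: return the value itself.
template <class A>
struct await_result<A> { using type = A; };

// Exactly two parameters: return a pair.
template <class A, class B>
struct await_result<A, B> { using type = std::pair<A, B>; };

// A wait-style callback takes only an error code.
static_assert(std::is_same<await_result<std::error_code>::type,
                           std::error_code>::value,
              "one parameter maps to a plain value");

// A read-style callback takes an error code and a byte count.
static_assert(std::is_same<await_result<std::error_code, std::size_t>::type,
                           std::pair<std::error_code, std::size_t>>::value,
              "two parameters map to a pair");

// Three parameters fall through to the tuple case.
static_assert(std::is_same<await_result<std::error_code, int, int>::type,
                           std::tuple<std::error_code, int, int>>::value,
              "three parameters map to a tuple");
```

Because std::tie can be assigned from both std::pair and std::tuple, calling code looks the same in the two-parameter and many-parameter cases.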

The code is at https://github.com/jbandela/cpp_async_await/

It is a header-only library. For Boost.Asio you need to include asio_helper.hpp. You will need to link to the boost_system and boost_context libraries. The code will compile on Windows with MSVC 2012 and on Linux with gcc 4.7.2. You need Boost version 1.53, as that is the release that introduced Boost.Coroutine.

There is also support for Microsoft PPL and PPLx. Include ppl_helper.hpp and pplx_helper.hpp. Due to PPL and PPLx being different from Boost.Asio, there are a few minor changes in how you use the library with PPL and PPLx.

Thanks for taking the time to read this. Download the code, take a look at it, and play around with it. Let me know what you think. Next time we will talk about using this library with PPL and PPLx.

-John Bandela

Saturday, February 2, 2013

Digit Separators in C++

Problem

Here is the statement of the problem as described by Lawrence Crowl

http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2013/n3499.html#Problem

Pronounce 7237498123.
Compare 237498123 with 237499123 for equality.
Decide whether 237499123 or 20249472 is larger.

The paper then goes on to describe a proposal to add digit separators to the C++ standard. In this blog post, we will look at what we can do about this problem in C++ as it is currently.

Doing this involves (ab)using the preprocessor ## operator which concatenates 2 tokens. Using this yields two different options. We will use the number 1234567 as an example for these 2 options.

Option 1 – Self-contained but not terse

#define n 1 ## 234 ## 567

int j = n;

#undef n

This option has the advantage that anyone that knows C would be able to figure out what is going on without looking at any other code. The disadvantage is that this is definitely not terse.

Option 2 – Terse but requires macro

Given the macro below

#define NUM_HELPER(a,b,c,d,e,f,g,...) a##b##c##d##e##f##g

#define NUM(...) NUM_HELPER(__VA_ARGS__,,,,,,,,,)

We can write the example number 1234567 as below

int j = NUM( 1,234,567 );

The advantage is that this option is terse. It also has the same format that is commonly used outside of programming. The disadvantage is that the code looks pretty confusing unless you know what NUM does. The other disadvantage is that the macro also pollutes the namespace.
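To convince yourself that the token pasting does what is claimed, the macro can be checked at compile time. This is a small sketch that repeats the two macros from Option 2. The constant names are made up for illustration, and static_assert assumes a C++11 compiler.

```cpp
#include <cassert>

// Same macros as in Option 2: paste up to seven comma-separated groups.
#define NUM_HELPER(a,b,c,d,e,f,g,...) a##b##c##d##e##f##g
#define NUM(...) NUM_HELPER(__VA_ARGS__,,,,,,,,,)

// Hypothetical names, for illustration only.
constexpr long long kSmall = NUM(1,234,567);
constexpr long long kLarge = NUM(7,237,498,123);

static_assert(kSmall == 1234567, "groups are pasted into one literal");
static_assert(kLarge == 7237498123LL, "also works beyond 32 bits");

// Caveat: a first group that starts with 0 produces an octal literal,
// so NUM(0,100) is the token 0100, which is 64 in decimal.
constexpr int kOctal = NUM(0,100);
static_assert(kOctal == 64, "leading zero means octal");
```

The octal caveat is one more reason to treat NUM as a stopgap: the macro only glues tokens together and knows nothing about what the resulting literal means.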

While both of these approaches are inferior to C++ native support for digit separators, the two options described above work with C++ now.

Please let me know what you think and which option you like better.

- John Bandela