ptypes vs boost variant vs boost any


This was going to be a FAQ in my book but I think this is better as a blog post.

Let's talk about boost variants. The interesting part of a boost variant is that you must specify the types that it will use. For example:

boost::variant< int, std::string > v;

That's fine - though if you want to store a vector of variants then you must specify the types when defining the vector. For example:

vector<boost::variant< int, std::string > > values;

That's not terrible but the issue comes in when you want to get data out of the variant. Without using "visitors" the best you can do is guess or use if statements to check the cast like so:

void times_two( boost::variant< int, std::string > & operand )
{
    if ( int* pi = boost::get<int>( &operand ) )
        *pi *= 2;
    else if ( std::string* pstr = boost::get<std::string>( &operand ) )
        *pstr += *pstr;
}

You could also use the .type() method on the variant to get the type. All that does is return the typeid of the value it currently holds, and the names typeid reports are not the same across platforms, so you would have to compare type_info objects directly, like this:

if (typeid(int) == v.type())

It works but usually not for the stuff you want.
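
For comparison, here is a rough sketch of the "visitor" approach mentioned above. To keep it self-contained it uses C++17's std::variant and std::visit as a stand-in for boost::variant (that swap is my assumption, as are the IntOrString and TimesTwo names); Boost's own visitor interface looks a little different.

```cpp
#include <string>
#include <variant>

// Assumption: std::variant (C++17) stands in for boost::variant here.
using IntOrString = std::variant<int, std::string>;

// Hypothetical visitor with one overload per type the variant can hold.
// If an overload is missing, the call to std::visit fails to compile.
struct TimesTwo {
    void operator()(int& i) const { i *= 2; }
    void operator()(std::string& s) const { s += s; }
};

// Dispatches to the overload that matches whatever the variant holds now.
void times_two(IntOrString& operand) {
    std::visit(TimesTwo{}, operand);
}
```

The type check still happens, but the library does it, and forgetting to handle one of the types becomes a compile error rather than a silent no-op.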

Now onto boost's any. It is what its name suggests - it will accept any type. However, this has the same drawback as variant - you have to figure out the type on your own - it doesn't provide any built-in type hints.

Let's say for example - I have two any objects:

boost::any o1 = 1;
boost::any o2 = "2";
boost::any o3 = o1 + o2;

Or, as more valid code:

boost::any o1 = 1;
boost::any o2 = "2";
boost::any o3 = any_cast<int>(o1) + any_cast<int>(o2);

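The compiler says you are not allowed to do that: casting from char[2] to int is illegal. any_cast never converts between types; it only hands back the exact type that was stored, and in current Boost releases a mismatch is reported at run time as boost::bad_any_cast. It kind of makes sense really - are you adding them as strings or as numbers? Still, I would have expected the cast to make that clear.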

With either of them - you have to write extra code to determine if it is a string and convert it to an int.
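
As a rough idea of what that extra code looks like, here is a sketch using C++17's std::any as a stand-in for boost::any (my assumption, to keep the example free of dependencies). The to_int helper is hypothetical, not part of either library.

```cpp
#include <any>
#include <stdexcept>
#include <string>
#include <typeinfo>

// Hypothetical helper: returns the int stored in `value`, converting from
// a string literal or std::string when that is what the any holds.
int to_int(const std::any& value) {
    if (value.type() == typeid(int))
        return std::any_cast<int>(value);
    // A string literal is stored as a const char* after array decay.
    if (value.type() == typeid(const char*))
        return std::stoi(std::any_cast<const char*>(value));
    if (value.type() == typeid(std::string))
        return std::stoi(std::any_cast<const std::string&>(value));
    throw std::invalid_argument("to_int: unsupported type");
}
```

With this in place, to_int(o1) + to_int(o2) gives the sum you wanted, but every type you care about needs its own branch.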

Now let's talk about ptypes. ptypes is a cross-platform library providing easy to use threading, networking, and replacement types like a string class, a vector replacement, and a variant type.

ptypes implements its variant using a union. It covers most of the more common types, including the ability for a variant to turn into an array. It can also hold your own classes: if your class inherits from pt::component you can store it inside of a variant - except you are on your own to determine what the exact class is. It is important to note that a variant can be changed to hold any of these datatypes and it knows what type it currently is.

So let's take the example from earlier:

variant o1 = 1;
variant o2 = "2";
variant o3 = int(o1) + int(o2);

This gives 3 - much as int(1) + int("2") would in a dynamically typed language like Python. The benefit here is that the casting/conversion is done in the backend so you don't have to worry about whether it is a string or an int - it just works.

I'm not saying the boost provided types are useless - however - in certain applications you may just want a simpler, more loosely typed approach.

I have personally done some research into seeing if I can get a variant data type without the use of a union - and so far not much luck without some wild C++ hacking.

My point is that the automatic (intelligent) casting makes quite a difference.

MySQL queries


Recently I decided to do some work on a sqlsnapshot utility and tried to figure out the best way to do a bulk update on the database.

Here are the test scripts I used to generate the SQL:

for i in {1..9000}; do echo "UPDATE test SET name = 'test$i' WHERE id = $i;"; done > update.sql
for i in {1..9000}; do echo "($i,'test$i'), "; done > replace.sql

SQL Schema:

	`name` VARCHAR(50) NOT NULL DEFAULT 'None',

System specs:

mysqld Ver 5.5.31-0ubuntu0.12.10.1 for debian-linux-gnu on i686 ((Ubuntu))
Ubuntu 12.10
Running on a VM in VMWare Player

First off: running one SQL statement per row is bad.

For example:

UPDATE test SET name = 'test1' WHERE id = 1;
UPDATE test SET name = 'test2' WHERE id = 2;
UPDATE test SET name = 'test3' WHERE id = 3;
UPDATE test SET name = 'test4' WHERE id = 4;
UPDATE test SET name = 'test5' WHERE id = 5;
UPDATE test SET name = 'test6' WHERE id = 6;
UPDATE test SET name = 'test7' WHERE id = 7;
UPDATE test SET name = 'test8' WHERE id = 8;
UPDATE test SET name = 'test9' WHERE id = 9;
-- ...

This SQL should instead be a single multi-row statement like this:

REPLACE INTO test VALUES (1,'test1'),
-- ....
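
If you are generating the statement from code rather than from a shell loop, a minimal C++ sketch could look like the following. The build_replace function is hypothetical, and it assumes the table has exactly the two columns (id, name) that appear in the rows above.

```cpp
#include <sstream>
#include <string>

// Hypothetical helper: builds one multi-row REPLACE statement covering
// ids 1..count, with the same (id,'testN') rows as the test script.
std::string build_replace(int count) {
    std::ostringstream sql;
    sql << "REPLACE INTO test VALUES ";
    for (int i = 1; i <= count; ++i) {
        if (i > 1) sql << ", ";  // separator between rows, none after the last
        sql << "(" << i << ",'test" << i << "')";
    }
    sql << ";";
    return sql.str();
}
```

The values here are plain integers, so nothing needs escaping; real data should go through the client library's escaping or prepared statements.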

On average the mass UPDATE took anywhere from 20-30 seconds to execute! The replace only took 0.12-0.30 seconds!

Now of course, different schemas may yield different results - but even the delete/insert was 0.12-0.20 seconds (i.e. DELETE FROM test, INSERT INTO...). From what I see, it appears that MySQL commits each statement as it arrives rather than waiting for the user to stop entering SQL queries. There really wouldn't be a good way for it to detect this though, besides assuming that if a user hasn't entered a SQL statement for the past second it is time to run the transaction.

Wait, transactions... let's take a look at those. In a transaction you can run a bunch of queries, the SQL engine applies them provisionally (think of a shadow copy of the table), and when you tell it "COMMIT" it commits all of the changes you just made.

So I modified the update SQL script.

START TRANSACTION;
UPDATE test SET name = 'test1' WHERE id = 1;
UPDATE test SET name = 'test2' WHERE id = 2;
-- ...
COMMIT;

Running it now takes about 0.70 seconds instead of 20 seconds! What a difference. Adding a transaction to the replace script turned it into a consistent 0.12 seconds. However, it didn't seem to do anything for deleting/inserting or for plain inserts.

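Transactions are good for:

- Bulk UPDATE statements
- REPLACE statements

Transactions do not improve:

- DELETE followed by INSERT
- Plain INSERT statements

And when you can, use REPLACE over UPDATE.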

C++ parameter passing


C++ parameter passing is a very subtle idea but an important one.

There are three types of passing in C++:

- Value (or copy)
- Reference (or passing the address of the variable)
- Const-reference (same as above except the object can't be modified)

Passing by value means you want C++ to make a copy of the variable you send in:

void f(int x) { x = 5; }
int z = 4;
f(z);
cout << z << endl;

The value of z will be 4. This is because in f you are only working with a copied local version of z. However, if we wanted to modify the variable we could change the signature of the function like so:

void f(int & x) { x = 5; }
int z = 4;
f(z);
cout << z << endl;

z will now be 5. This is because we are telling C++ to send in the address of z rather than a copy. The difference is subtle but it matters. Let's say you had the following:

void f(vector<LARGE_CLASS> x) { cout << x.size() << endl; }
vector<LARGE_CLASS> x;
f(x);

A copy of the vector will be sent to the function (really, the copy constructor gets called, and the end result is a full copy of the vector and every element in it).

What should be done is the following:

void f(vector<LARGE_CLASS> & x) { cout << x.size() << endl; }
vector<LARGE_CLASS> x;
f(x);

However, there may be instances in which you want to send the address but don't want to allow the function to modify the vector. You could use the const keyword for that like:

void f(const vector<LARGE_CLASS> & x) { cout << x.size() << endl; }
vector<LARGE_CLASS> x;
f(x);

By adding const the following code will not work/compile:

void f(const vector<LARGE_CLASS> & x) { x.push_back(LARGE_CLASS()); }
vector<LARGE_CLASS> x;
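
To see the cost of passing by value, here is a small sketch that counts copies. Tracked is a hypothetical stand-in for LARGE_CLASS, and the two function names are made up for the example.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for LARGE_CLASS that counts how often it is copied.
struct Tracked {
    static int copies;
    Tracked() = default;
    Tracked(const Tracked&) { ++copies; }
    Tracked& operator=(const Tracked&) { ++copies; return *this; }
};
int Tracked::copies = 0;

// By value: the vector and every element in it are copied for the call.
std::size_t size_by_value(std::vector<Tracked> v) { return v.size(); }

// By const reference: nothing is copied and the vector cannot be modified.
std::size_t size_by_const_ref(const std::vector<Tracked>& v) { return v.size(); }
```

Calling size_by_value on a vector of three elements bumps Tracked::copies by three, while size_by_const_ref leaves it untouched.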

There is one more way to pass parameters, which allows a user of your function/method to pass in a NULL/nullptr value: passing a pointer. The * in the signature marks the parameter as a pointer, which is fine, but you call the function slightly differently, by passing the address of the variable:

void f(int * x) { *x = 5; }
int z = 4;
f(&z);
cout << z << endl;
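
Since a pointer parameter can be null, the function should check before dereferencing. A minimal sketch follows; set_to_five is a hypothetical name, and returning a bool to report whether anything was written is just one possible design.

```cpp
// Writes 5 through the pointer if a target was supplied.
// Returns false (and touches nothing) when the caller passed nullptr.
bool set_to_five(int* x) {
    if (x == nullptr) {
        return false;
    }
    *x = 5;
    return true;
}
```

A caller with a variable passes its address, as in set_to_five(&z), and a caller with nothing to update passes nullptr.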