May 02

SSH Key Length Error

I wanted to SSH into a server of mine earlier today, but I was met with this error when I tried to SSH in using the key:

$ ssh -i private-key username@host
Load key "private-key": Invalid key length
username@host's password:

There is some information on this about the exact SSH version that I was using, but one thing that was not clear was if this was a server error or a client error.

As it turns out, this is a client error; the server itself is still fine. There’s a lot of information on the internet, but suffice it to say that as of OpenSSH 7.9, you need a key at least 2048 bits long. So the quickest way around this issue is to use an older version of ssh; the longer-term fix is to generate a new, longer key and install it on the server.

If you don’t have an older version of the SSH client available, you could probably get around the issue by installing a virtual machine with an older version of the SSH client. Debian 9 (stretch) comes with a version of openssh-client that will work.

Aug 19

C++17 and Fold Expressions

So I’ve just come across this very cool feature in C++17 that I plan on using called ‘fold expressions’. This is basically a way of expanding a bunch of operations. It is also very hard for me to wrap my head around, so we’re going to go through a (very simple) example here, since people don’t seem to explain what exactly happens.

This simple example prints out everything that is passed into the method to stdout:

#include <iostream>

template <typename... argn>
void fold_test_types( argn... params ){
    (std::cout << ... << params);
}

int main(){
    fold_test_types( 3, 4, "hi", 8);
}

The output is 34hi8, because we forgot to add any newlines or spaces, but that’s fine for this example.

Now what really tripped me up here is the fact that in the fold expression, we’re using the << operator to distinguish things, and it makes it very confusing as to what is actually happening. With fold expressions, the data between the parentheses is expanded, kinda like a template. Let’s look at the rules real quick:

( pack op ... ) (1)
( ... op pack ) (2)
( pack op ... op init ) (3)
( init op ... op pack ) (4)

1) unary right fold
2) unary left fold
3) binary right fold
4) binary left fold

op – any of the following 32 binary operators: + - * / % ^ & | = < > << >> += -= *= /= %= ^= &= |= <<= >>= == != <= >= && || , .* ->*. In a binary fold, both ops must be the same.
pack – an expression that contains an unexpanded parameter pack and does not contain an operator with precedence lower than cast at the top level (formally, a cast-expression)
init – an expression that does not contain an unexpanded parameter pack and does not contain an operator with precedence lower than cast at the top level (formally, a cast-expression)
Note that the open and closing parentheses are part of the fold expression.

This is a bit hard to understand, but the … is basically a keyword when it is in the parentheses like this. ‘pack’ is the name of the parameters, e.g. in our example it is ‘params’.

Looking at these rules, the rule that we are following in this example is #4, the binary left fold. Let’s look at the fold expression again, with some comments:

(std::cout << ... << params);
|    |      |  |   |    |  ^----ending paren
|    |      |  |   |    ^---- pack(params) - see definition above
|    |      |  |   ^---- operator(<<)
|    |      |  ^---- ...(keyword?)
|    |      ^---- operator(<<)
|    ^--- init(std::cout) - see definition above
^----Starting paren   

With these broken out, it should now be clear why this is #4, the binary left fold. The rules on how it folds are as follows:

1) Unary right fold (E op …) becomes (E1 op (… op (EN-1 op EN)))
2) Unary left fold (… op E) becomes (((E1 op E2) op …) op EN)
3) Binary right fold (E op … op I) becomes (E1 op (… op (EN−1 op (EN op I))))
4) Binary left fold (I op … op E) becomes ((((I op E1) op E2) op …) op EN)

Since we are using #4, apply the rule for #4. When the compiler runs, the code effectively becomes the following with the template substitution and the fold expression expansion:

void fold_test_types( int arg1, int arg2, const char* arg3, int arg4 ){
    ((((std::cout << arg1) << arg2) << arg3) << arg4);
}

int main(){
    fold_test_types( 3, 4, "hi", 8);
}

Apr 25

Fun with templates

As you may be aware, I maintain dbus-cxx, and I’ve been working on it lately to get it ready for a new release. Most of that work is not adding new features, but updating the code generator to work correctly. However, this post is not about that work; it is about the work that I am doing on the next major version of dbus-cxx (2.0). Part of this work involves using new C++ features, due to libsigc++ needing C++17 to compile now. With the variadic templates that C++ has had since C++11, we can have more than 7 template parameters to a function. (This limit of 7 is arbitrarily chosen by libsigc++; it’s not a C++ limitation.)

Because of this however, some of the code in dbus-cxx needs to change in order to work correctly. The main portion that I’m working on right now has to do with getting the DBus signatures to work properly. Here’s a small piece of code that is currently in dbus_signal.h(after running dbus_signal.h.m4 through m4):

  /** Returns a DBus XML description of this interface */
  virtual std::string introspect(int space_depth=0) const
  {
    std::ostringstream sout;
    std::string spaces;
    for (int i=0; i < space_depth; i++ ) spaces += " ";
    sout << spaces << "<signal name=\"" << name() << "\">\n";

    T_arg1 arg1;
    sout << spaces << "  <arg name=\"" << m_arg_names[1-1] << "\" type=\"" << signature(arg1) << "\"/>\n";
    T_arg2 arg2;
    sout << spaces << "  <arg name=\"" << m_arg_names[2-1] << "\" type=\"" << signature(arg2) << "\"/>\n";
    sout << spaces << "</signal>\n";
    return sout.str();
  }

This method is created once for each overloaded type that we have. The important part is that T_arg is created once for each argument that we have. With variadic templates, this is impossible to do. The way to get the template arguments out of the variadic call is to do recursion.

Recursion + templates is not something that I’m very familiar with, so this took me a while to figure out. However, I present the following sample code for getting the signature of a DBus method:

  #include <cstdint>
  #include <iostream>
  #include <string>

  inline std::string signature( uint8_t )     { return "y"; }
  inline std::string signature( bool )        { return "b"; }
  inline std::string signature( int16_t )     { return "n"; }
  inline std::string signature( uint16_t )    { return "q"; }
  inline std::string signature( int32_t )     { return "i"; }
  inline std::string signature( uint32_t )    { return "u"; }
  // Needed because main() below uses int64_t; "x" is the DBus code for INT64
  inline std::string signature( int64_t )     { return "x"; }

  template<typename... argn> class sig;

  // Base case: no template arguments left, so the signature is empty
  template<> class sig<>{
  public:
    std::string sigg() const {
      return "";
    }
  };

  // Recursive case: handle the first type, then recurse on the rest
  template<typename arg1, typename... argn>
  class sig<arg1, argn...> : public sig<argn...> {
  public:
    std::string sigg() const{
      arg1 arg{};
      return signature(arg) + sig<argn...>::sigg();
    }
  };

int main(int argc, char** argv){
  std::cout << sig<uint32_t,uint32_t,bool,int64_t>().sigg() << std::endl;
}


This took me a few hours to figure out exactly how to do it, so I’m at least a little proud of it! The other confusing part that I had to work out was how to use a recursive call(with signature()) also with the recursive call for the templates, which leads us to the following observation:

It’s recursion all the way down.

Mar 15

One large program or many little programs?

Earlier this week, I came across this question online.  This has some relevance for me, since at work two of our main projects follow the two sides of this design philosophy: one project is a more monolithic application, and the other follows more of a microservices model(i.e. many applications).  There are some reasons for this which I will now attempt to explain.

Option 1: Monolithic Application

The first project that I will explain here is our project that has a more monolithic application.  First of all, a brief overview of how this project works.  Custom hardware(running Linux) collects information from sensors(both built-in and often third-party over Modbus) and aggregates it and classifies it.  The classification is due to the nature of the project – sensor data falls into one of several classes(temperature, voltage, etc.).  This information is saved off periodically to a local database, and then synchronized with a custom website for viewing.  The project, for the most part, can be split into these three main parts:

  • Data collector
  • Web viewer
  • Local viewer(separate system, talks over ethernet)

Due to the nature of the hardware, there is no web interface on the hardware directly; it is on a cloud server.

Now, the data collector application is a mostly monolithic application.  However, it is structured similarly to the Linux kernel in that we have specific ‘drivers’ that talk with different pieces of equipment, so the core parts of the application don’t know what hardware they are talking to, they are just talking with specific interfaces that we have defined.

In this case, why did we choose to go with a monolithic application?  Well, there are a few reasons and advantages.

Reason 1: As primarily a data collector device, there’s no real need to have different applications send data to each other.

Reason 2: The development of the system is much easier, since you don’t need to debug interactions between different programs.

Reason 3: Following from the first two, we often have a need to talk with multiple devices on the same serial link using Modbus.  This has to be siphoned in through a single point of entry to avoid contention on the bus, since you can only have one modbus message in-flight at a time.

Reason 4: All of the data comes in on one processor, there is no need to talk with another processor.  Note that this is not the same as talking with other devices.

Reason 5: It’s a lot simpler to pass data around and think about it conceptually when it is all in the same process.

Now that we have some reasons, what are some disadvantages to this scheme?

Disadvantage 1: Bugs.  Since our application is in C++(the ability to use C libraries is important), a single segfault can crash the entire application.

Disadvantage 2: The build can take a long time; the incremental build and linking isn’t bad, but a clean build can take a few minutes.  A test build on Jenkins will take >10 minutes, and it can still take several minutes to compile on a dev machine if you don’t do parallel make.

Overall, the disadvantages are not show-stoppers(except for number 1, there is some bad memory management happening somewhere but I haven’t figured out where yet).  The separation into three basic parts(data collection, local GUI, web GUI) gives us a good separation of concerns.  We do blend in a little bit of option 2 with multiple applications, but that is to allow certain core functionality to function even if the main application is down – what we use that for is to talk with our local cell modem.  Given that the data collection hardware may not be easily accessible, ensuring that the cellular communications are free from bugs in our main application is important.

Option 2: Multiple Applications

If you don’t want to make a monolithic application, you may decide to do a lot of small applications.  One of my other primary projects uses this approach, and the reason is due to the nature of the hardware and how things need to interact.

In our project with multiple applications, we have both multiple compute units and very disparate sensor readings that we are taking in.  Unlike the monolithic application where data is easily classified into categories, this project has even more disparate data.  Moreover, we take in a lot of different kinds of data.  This data can come in on any processor, so there is no ‘master’ application per se.  This data also needs to be replicated to all displays, which may(or may not) be smart displays.  We also want to insulate ourselves from failure in any one application.  A single bug should not take down the entire system.

To handle this, we essentially have a common data bus that connects all of the processors together.  We don’t use RabbitMQ, but the concept is similar to their federation plugin, in that you can publish a message on any processor and it will be replicated to all connected processors.  This makes adding new processors extremely easy.  All of the data is basically made on a producer/consumer model.

Advantage 1: Program resiliency.  With multiple applications running, a bug in one application will not cause the others to exit.

Advantage 2: You can easily add more processors.  This is not really a problem for us, but since data is automatically synchronized between processors, adding a new consumer of data becomes very simple.

Advantage 3: Data can come and go from any connected system, you need not know in advance which processor is giving out information.

This design is not without some caveats though.

Disadvantage 1: Debugging becomes much harder.  Since you can have more than one processor in the system, your producer and your consumer can be on different processors, or you could have multiple consumers.

Disadvantage 2: Because this is a producer/consumer system(it’s the only way that I can see to effectively scale), there’s no way to get data directly from an application(i.e. there’s no remote procedure call easily possible over the network).


There are two very different use cases for these two designs.  From my experience, here’s a quick rundown:

Monolithic Application

  • Generally easier to develop, since you don’t have to figure out program<->program interactions
  • Often needed if you need to control access to a resource(e.g. physical serial port)
  • Works best if you only have to run on one computer at a time

Multiple Applications

  • Harder to develop due to program<->program interactions
  • Better at scaling across multiple computers
  • Individual applications are generally simpler

Due to the nature of engineering, there’s no one way to do this that is best.  There are often multiple ways to solve a given problem, and very rarely is one of them unequivocally the best solution.

Jan 31

Counting lines of code

So a few years ago, a coworker and I had to figure out how many lines of code we had. This was either for metrics or because we were curious; I can’t really remember why. I came across the script again today while going through some old code. Here it is in all its glory:


let n=0; for x in "$@"; do temp1=`find $x | grep '\\.cpp$\|\\.c$\|\\.java$\|\\.cc$\|\\.h$\|\\.xml$\|\\.sh$\|\\.pl$\|\\.bash$\|\\.proto$'`; temp=`cat /dev/null $temp1 | grep -c [.]*`; let n=$n+$temp; if [ $temp -gt 0 ]; then printf "%s: " $x ; echo $temp; fi; done ; echo Total: $n

This took us at least an hour (perhaps more; I’m not really sure).

Anyway, it was after we had done all of that work that we realized that wc exists.

Apr 09

Bitcoin Mining for a Useful Purpose?

So I was thinking about this the other day when I came across this article on Slashdot that points out that GPU prices are high due to the demand for Bitcoin(and other cryptocurrencies) mining.  This got me thinking, what’s the point of this?  What if we could do something more useful than mining for virtual currency?  I’ve been running BOINC for probably about 12+ years now, doing calculations for SETI@Home.  Originally, I wasn’t even using the BOINC client; SETI@Home had their own standalone software that has now been superseded by BOINC.  Given that the original software was used until 2005, that means I have probably actually been doing this for 15+ years at this point(logging into the SETI website indicates I have been a member since December 2005)…

But I digress.  The question really is, could we mine cryptocurrency as part of the normal BOINC process?  It seems like this would have a number of benefits for people:

  • For mining people, they can still mine coins
  • For projects on BOINC, they gain computing power from people wanting to mine coins at the same time
  • This could result in more computing power for a “good” cause as well, instead of what is(in my mind at least) a rather useless endeavor

I’m not exactly sure how this would work, as I don’t really know anything about blockchain.  Could perhaps Ethereum be used to provide people with “compute credits” that would allow this scheme to work?  It could also provide a good way of storing the calculation results, and have them verifiable.

Feb 19

Intergalactics Source Code

Does anybody out there have the original source code to the Java game Intergalactics?  I was able to pull the (compiled) client off of SourceForge, but without a server it’s not too useful.  I did start updating the client to work properly over the summer along with a new server implementation, but it would still be interesting to get all of the original source code.

Anyway, if you do happen to have the original code, I would be grateful.  Intergalactics was always a nice fun timewaster.  It wasn’t too complicated but it did require a certain amount of strategy.

Jul 21

NTP Woes

For the past year or so, I’ve been battling trying to get NTP fed from GPSD.  This has proven to be somewhat harder than I imagined.  GPSD is supposed to feed NTP directly from shared memory, however this was not happening.  I haven’t been trying to do this the entire time, but it’s been a problem for at least a year!

The simple situation is as follows: I have a device that has a GPS receiver on it in order to get the time.  NTP is also running to sync the time whenever it connects to the internet(for most of the time, we are not on the network).  This is also a very vanilla Debian 8 (Jessie) system on an ARM processor, so we’re not custom-compiling any of the standard packages.  The GPS is also a hotplug, so we have a udev rule that calls gpsdctl with the proper TTY when the USB is activated:

(udev match options here) SYMLINK+="gps%n", TAG+="systemd", ENV{SYSTEMD_WANTS}="gpsdctl@%k.service"

According to all of the literature on the web, we should just be able to do this by adding the following to /etc/ntp.conf:

#gpsd SHM
server prefer
fudge refid GPS flag1 1

(note that we are setting flag1 here, as the system that this is running on has no clock backup).

This allows NTP to read from a shared memory address that is fed by GPSD.  You can check this with ntpq -p:

     remote           refid      st t when poll reach   delay   offset  jitter
 SHM(0)          .GPS.            0 l    -   64    0    0.000    0.000   0.000
-altus.ip-connec    2 u    7   64    1  189.620    6.525   5.288
+ks3370497.kimsu    2 u    5   64    1  186.252   17.282  14.504
    2 u    5   64    1  186.503   15.792  14.691
*   2 u    4   64    1  155.953    8.786  13.366

As you can see, in this situation there is no connection to the SHM segment, but we are synced with another NTP server.  However, we can verify that GPSD is running and getting GPS data by running the ‘gpsmon’ program(in the ‘gpsd-clients’ package).

The other important thing to note about the SHM segment is that the ‘reach’ value is always 0, and will never increase.  You can also check the output of NTP trying to reach it with:

 ntpq -p

This was very confusing to me: how are we not talking with the SHM segment?  This would also seem to work on some systems, but not other systems.  No matter how long I left it running, NTP would never get any useful data from the SHM segment.  Finally, I stumbled across a link, which I can’t find now, that said that GPSD must be started before NTP.  I then tried that on my system, by stopping my software(which is always reading from GPSD), stopping NTP and then stopping GPSD.  I then started GPSD, NTP, and my software(which will also initialize the GPS system to start sending commands).  This causes NTP to consistently sync with GPSD.

Do this like the following:

$ systemctl stop your-software
$ systemctl stop ntp
$ systemctl stop gpsd
$ # .. make sure everything is dead ..
$ systemctl start gpsd
$ systemctl start ntp
$ systemctl start your-software

For reference, here’s the ntp.conf that I am using(we only added the SHM driver from the standard Debian install):

# /etc/ntp.conf, configuration for ntpd; see ntp.conf(5) for help

driftfile /var/lib/ntp/ntp.drift

# Enable this if you want statistics to be logged.
#statsdir /var/log/ntpstats/

statistics loopstats peerstats clockstats
filegen loopstats file loopstats type day enable
filegen peerstats file peerstats type day enable
filegen clockstats file clockstats type day enable

# You do need to talk to an NTP server or two (or three).
#server ntp.your-provider.example

#gpsd SHM
server prefer
fudge refid GPS flag1 1

# maps to about 1000 low-stratum NTP servers.  Your server will
# pick a different set every time it starts up.  Please consider joining the
# pool: <>
server iburst
server iburst
server iburst
server iburst

# Access control configuration; see /usr/share/doc/ntp-doc/html/accopt.html for
# details.  The web page <>
# might also be helpful.
# Note that "restrict" applies to both servers and clients, so a configuration
# that might be intended to block requests from certain clients could also end
# up blocking replies from your own upstream servers.

# By default, exchange time with everybody, but don't allow configuration.
restrict -4 default kod notrap nomodify nopeer noquery
restrict -6 default kod notrap nomodify nopeer noquery

# Local users may interrogate the ntp server more closely.
restrict ::1

# Clients from this (example!) subnet have unlimited access, but only if
# cryptographically authenticated.
#restrict mask notrust

# If you want to provide time to your local subnet, change the next line.
# (Again, the address is an example only.)

# If you want to listen to time broadcasts on your local subnet, de-comment the
# next lines.  Please do this only if you trust everybody on the network!
#disable auth

TL;DR Having trouble feeding NTP from GPSD when you have a good GPS lock?  Make sure that GPSD starts before NTP.

Apr 09

QTest Project Setup

Recently, I was looking into how to set up a Qt project with unit tests.  The Qt-esque way to do this is to use the QTest framework within Qt.  While not a bad framework, the problem is that there isn’t a good way to integrate it within your project.  What I mean by that is that you must link with the QTest library and have a special main() method in order to run the tests.  This brings up two problems:

  1. How do we run the tests if we have only one executable?
  2. How do we only link with QTest at certain times(e.g. we don’t want to link with QTest when we send the executable out)?

Searching for a solution to the problem, one thing that some people did was to create a .pri file with the source files to compile into two projects: one for the main application, and one for the unit tests.  The disadvantage of this is that it makes it hard to work with Qt Creator to add in the files, as it can’t add them in automatically to the SOURCES and HEADERS list.  So that’s not good.

The other solution that I came across was to add in the files to each project.  However, this adds in a lot of overhead, and you must be sure to add in the files to both projects when something changes.  So that’s not good either.

The solution that I came up with was to have three separate projects:

  • One project for the main code of our project.  This gets built as a static library.
  • One project to run the main code of our project.  This is the normal executable.
  • One project to run the unit tests.  This links with QTest, and doesn’t get installed by default.

This is basically the same project setup as described here, although it wasn’t until after I had created my project that I realized it was the exact same thing(I did come across the link at first, but didn’t understand it at the time since it wasn’t using QTest).

With this project setup, you can edit the main code in Qt Creator, do anything that you want, and still have everything link and test in a clean manner(no need to link with QTest in your main executable!).

The code for a sample project is up on GitHub, to give you a general overview of how it all fits together.