Friday, November 23, 2012

Almost Asynchronous DB Loading in Dart


I had originally planned to keep improving the performance of my persistent dart-dirty datastore, but you know what? Reading 100,000 records into memory in 10 seconds is already pretty darn good. I only need this to serve as a dirt-simple datastore for a demo app in Dart for Hipsters. Combined with a web server written in Dart, it will let me present a Dart-only solution, free of the current node.js dependency on the backend. Since readers will have records numbering in the dozens, a load rate of 10k records/sec ought to be more than sufficient for my purposes. But...

I am currently using the blocking readAsLinesSync to load records back into memory:
library dart_dirty;

import 'dart:io';
import 'dart:json';

class Dirty implements HashMap<String, Object> {
  // ...
  _load() {
    // ...
    // Blocks until the entire file has been read into memory.
    var lines = db.readAsLinesSync();
    lines.forEach((line) {
      var rec = JSON.parse(line);
      _docs[rec['key']] = rec['val'];
    });

    onLoad(this);
  }
  // ...
}
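For anyone following along on a current Dart SDK, where `dart:json` has given way to `dart:convert`, here is a minimal sketch of the same blocking load. It assumes the one-JSON-object-per-line format with `key` and `val` fields that `_load()` parses above; the function name `loadDocsSync` and the temp-file setup are hypothetical, not part of dart-dirty.

```dart
import 'dart:convert';
import 'dart:io';

/// Hypothetical helper: reads a dart-dirty style file (one JSON object per
/// line, each with 'key' and 'val') and returns the resulting documents.
/// Like readAsLinesSync itself, this blocks until the whole file is parsed.
Map<String, Object?> loadDocsSync(File db) {
  final docs = <String, Object?>{};
  for (final line in db.readAsLinesSync()) {
    if (line.trim().isEmpty) continue; // tolerate a trailing blank line
    final rec = jsonDecode(line) as Map<String, dynamic>;
    // Later lines overwrite earlier ones, so a rewritten key keeps its
    // newest value.
    docs[rec['key'] as String] = rec['val'];
  }
  return docs;
}

void main() {
  final dir = Directory.systemTemp.createTempSync('dirty_sketch');
  final db = File('${dir.path}/test.db')
    ..writeAsStringSync([
      jsonEncode({'key': 'everything', 'val': {'answer': 41}}),
      jsonEncode({'key': 'everything', 'val': {'answer': 42}}),
    ].join('\n'));

  print(loadDocsSync(db)); // {everything: {answer: 42}}
  dir.deleteSync(recursive: true);
}
```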
It would be more Darty of me to use the non-blocking version of the method: readAsLines (sans Sync). Besides being more Darty, it would also alleviate the startup problem. While waiting for the datastore to load, my sample web server is needlessly blocked from responding:
➜  dart-dirty git:(perf) curl -i http://localhost:8000/json
curl: (7) couldn't connect to host
➜  dart-dirty git:(perf) curl -i http://localhost:8000/json
curl: (7) couldn't connect to host
➜  dart-dirty git:(perf) curl -i http://localhost:8000/json
HTTP/1.1 200 OK
content-length: 264
content-type: application/json

{"value":99999,"noise":"          xxxxx xxxxx\n          xxxxx xxxxx\n          xxxxx xxxxx\n          xxxxx xxxxx\n          xxxxx xxxxx\n          xxxxx xxxxx\n          xxxxx xxxxx\n          xxxxx xxxxx\n          xxxxx xxxxx\n          xxxxx xxxxx\n        "}%
One could argue that this is the desired behavior: the web server should not respond until all of the data is loaded. Even if that is true, the homepage and static files should not be blocked. Regardless, my dart-dirty datastore should not force a blocking scheme on the calling context. Rather, it should work asynchronously and leave the decision of how to deal with data loading to the calling context.

So the question becomes: how do I switch to the asynchronous version without breaking all of my tests? The first thing to try is simply replacing readAsLinesSync with readAsLines and using the Future that the asynchronous version returns:
class Dirty implements HashMap<String, Object> {
  // ...
  _load() {
    // ...
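    // readAsLines() returns a Future right away; this callback runs once
    // the file contents are available.
    db.readAsLines().then((lines) {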
      lines.forEach((line) {
        var rec = JSON.parse(line);
        _docs[rec['key']] = rec['val'];
      });

      onLoad(this);
    });
  }
  // ...
}
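Here is the asynchronous counterpart as a minimal sketch in current Dart. The class name `DirtySketch`, its `load()` method, and the constructor shape are hypothetical stand-ins rather than dart-dirty's actual API. Besides firing an `onLoad` callback like the `_load()` above, it returns the `Future`, so the calling context can decide for itself whether to wait for the data.

```dart
import 'dart:async';
import 'dart:convert';
import 'dart:io';

/// Hypothetical, stripped-down stand-in for the Dirty class.
class DirtySketch {
  final File _db;
  final void Function(DirtySketch)? onLoad;
  final Map<String, Object?> _docs = {};

  DirtySketch(String path, {this.onLoad}) : _db = File(path);

  Object? operator [](String key) => _docs[key];

  /// Starts reading without blocking. The returned Future completes, and
  /// onLoad fires, only after every line has been parsed.
  Future<DirtySketch> load() {
    return _db.readAsLines().then((lines) {
      for (final line in lines) {
        if (line.trim().isEmpty) continue;
        final rec = jsonDecode(line) as Map<String, dynamic>;
        _docs[rec['key'] as String] = rec['val'];
      }
      onLoad?.call(this);
      return this;
    });
  }
}

Future<void> main() async {
  final dir = await Directory.systemTemp.createTemp('dirty_async');
  final path = '${dir.path}/test.db';
  await File(path).writeAsString('{"key":"everything","val":{"answer":42}}\n');

  final db = DirtySketch(path, onLoad: (_) => print('onLoad fired'));
  final pending = db.load(); // returns immediately; nothing parsed yet
  print('load() returned; data present yet? ${db['everything'] != null}');
  await pending; // this particular caller chooses to wait
  print(db['everything']);
  await dir.delete(recursive: true);
}
```

A caller that does not care about startup order can simply ignore the returned Future and rely on `onLoad`, which keeps the blocking decision out of the datastore.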
When I run my test suite, I get a single failure. Reading from the filesystem no longer works as expected:
FAIL: reading can read a record from the DB stored on the filesystem
  Expected: <{answer: 42}>
       but: expected a map.
Looking at that test, the actual failure is in the local expectStorage function:
    test("can read a record from the DB stored on the filesystem", () {
      expectStorage() {
        var db = new Dirty('test/test.db');
        expect(
          db['everything'],
          equals({'answer': 42})
        );
      }

      var db = new Dirty('test/test.db');
      db['everything'] = {'answer': 42};
      db.close(expectAsync0(
        expectStorage
      ));
    });
The code after the expectStorage definition performs the initial write. It then closes the DB, which ensures that all data is flushed to the filesystem. Once the DB is closed, the supplied expectStorage callback is invoked. Wrapping it in expectAsync0 tells the test runner to wait until the callback has been called, and ensures that, should it fail, the error is caught so that the remainder of the test suite can run.
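The "wait until the callback runs" half of expectAsync0 can be imitated with a `Completer`: wrap the callback so that calling it completes a Future, then keep the test alive until that Future resolves. `CallTracker` is a hypothetical helper sketching the idea in current Dart; it is not how the unittest package implements expectAsync0.

```dart
import 'dart:async';

/// Hypothetical helper: wraps a zero-argument callback so that [done]
/// completes once the callback has run, or fails if the callback throws.
class CallTracker {
  final _done = Completer<void>();

  /// A test runner would await this before declaring the test finished.
  Future<void> get done => _done.future;

  void Function() wrap(void Function() callback) {
    return () {
      try {
        callback();
        if (!_done.isCompleted) _done.complete();
      } catch (error, stack) {
        // Route the failure to whoever awaits [done] instead of letting it
        // escape as an uncaught error.
        if (!_done.isCompleted) _done.completeError(error, stack);
      }
    };
  }
}

Future<void> main() async {
  final tracker = CallTracker();
  // Stand-in for db.close(callback): the callback runs some time later.
  Timer(const Duration(milliseconds: 10),
      tracker.wrap(() => print('expectStorage-style callback ran')));
  await tracker.done;
  print('only now may the test be reported as finished');
}
```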

The fix for the failing test should be to expect another asynchronous call in expectStorage(), this time when the data has finished loading:
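    test("can read a record from the DB stored on the filesystem", () {
      expectStorage() {
        // Wait for the load to finish before checking the stored record.
        var db = new Dirty('test/test.db', onLoad: expectAsync1((db) {
          expect(
            db['everything'],
            equals({'answer': 42})
          );
        }));
      }

      // The write-and-close part of the test is unchanged.
      var db = new Dirty('test/test.db');
      db['everything'] = {'answer': 42};
      db.close(expectAsync0(
        expectStorage
      ));
    });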
Dirty already supports an onLoad callback, so I ought to be good to go. And, indeed, it works: my entire test suite is passing again:
➜  dart-dirty git:(perf) ✗ dart test/dirty_test.dart
unittest-suite-wait-for-done
PASS: new DBs creates a new DB
PASS: writing can write a record to the DB
PASS: reading can read a record from the DB
PASS: reading can read a record from the DB stored on the filesystem
PASS: removing can remove a record from the DB
PASS: removing can remove keys from the DB
PASS: removing can remove a record from the filesystem store

All 7 tests passed.
To be honest, I am a little surprised that this actually worked. As I found yesterday, when asynchronous actions are in flight, a method can return well before all of the async work completes. When this happens in a unit test, the test is terminated and considered a failure.

And yet the exact opposite seems to be happening here. The test waits for the "asynchronous" Future to complete and for all of the processing to finish, as indicated by the onLoad callback being called:
  _load() {
    // ...
    print("1");
    db.readAsLines().then((lines) {
      print("2");
      lines.forEach((line) {
        var rec = JSON.parse(line);
        _docs[rec['key']] = rec['val'];
      });
      print("3");

      onLoad(this);
    });
  }
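A self-contained way to check the ordering of those print statements on a current Dart SDK: code placed after the `then()` call runs before the callback, because a `then` callback is always scheduled for later, even when the Future already holds its value. The hard-coded list of lines is a stand-in for a real file read, and `traceLoad` is a hypothetical name.

```dart
import 'dart:async';

/// Records the order in which each step of a _load()-like method runs.
Future<List<String>> traceLoad() {
  final trace = <String>[];
  trace.add('1');
  // Stand-in for db.readAsLines(): a Future that already has its value.
  final done = Future.value(['{"key":"a","val":1}']).then((lines) {
    trace.add('2');
    // ... parse lines here ...
    trace.add('3');
  });
  trace.add('after then()'); // still inside the synchronous call
  return done.then((_) => trace);
}

Future<void> main() async {
  print(await traceLoad()); // [1, after then(), 2, 3]
}
```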
I add those print statements to test a theory. If this really is asynchronous, then I ought to be able to get a null response back from the web server while the data is loading. But I do not. Instead, the request hangs until dart-dirty finishes reading from the persistent store and then replies with the desired response:
➜  dart-dirty git:(perf) ✗ curl -i http://localhost:8000/json
HTTP/1.1 200 OK
content-length: 264
content-type: application/json

{"value":99999,"noise":"          xxxxx xxxxx\n          xxxxx xxxxx\n          xxxxx xxxxx\n          xxxxx xxxxx\n          xxxxx xxxxx\n          xxxxx xxxxx\n          xxxxx xxxxx\n          xxxxx xxxxx\n          xxxxx xxxxx\n          xxxxx xxxxx\n        "}%
I see this behavior regardless of how many curl commands I issue before the first response, and regardless of whether the second or third print statement has been reached. Since I am able to connect to the server, even if it is not responding, this seems an improvement over the previous behavior.

My guess is that this is another example of Dart Futures not being (automatically) asynchronous. I find the behavior unexpected, but I cannot decide whether it is a bad thing. After all, the response does come back eventually, and all of my tests pass. I will sleep on this behavior and possibly revisit it tomorrow. But since everything is working, I may just ignore it and move on to something else.
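One factor that may be worth ruling out is the parsing loop itself. Each Dart isolate runs on a single event loop, so once the `then()` callback starts, its 100,000-iteration `forEach` runs to completion before any other event, such as an incoming HTTP request, gets handled. The sketch below, written for a current Dart SDK with integer inserts standing in for `JSON.parse` and with hypothetical function names, measures how far a load has progressed when a zero-delay timer scheduled just before it finally fires, with and without yielding between chunks. It does not settle what the 2012 VM was doing.

```dart
import 'dart:async';

/// Stand-in for parsing [total] records in one synchronous loop, like the
/// forEach inside the readAsLines().then() callback.
void loadAllAtOnce(Map<int, int> docs, int total) {
  for (var i = 0; i < total; i++) {
    docs[i] = i; // placeholder for JSON parsing + insert
  }
}

/// The same work, but yielding to the event loop after every [chunkSize]
/// records so that timers, I/O and incoming requests can be handled.
Future<void> loadInChunks(Map<int, int> docs, int total, int chunkSize) async {
  for (var i = 0; i < total; i++) {
    docs[i] = i;
    if ((i + 1) % chunkSize == 0) {
      await Future.delayed(Duration.zero);
    }
  }
}

/// Schedules a zero-delay timer, runs [load], and reports how many records
/// were already in the map when the timer finally got to run.
Future<int> recordsSeenByTimer(
    Future<void> Function(Map<int, int> docs) load) async {
  final docs = <int, int>{};
  final seen = Completer<int>();
  Timer(Duration.zero, () => seen.complete(docs.length));
  await load(docs);
  return seen.future;
}

Future<void> main() async {
  const total = 100000;
  final blocking =
      await recordsSeenByTimer((docs) async => loadAllAtOnce(docs, total));
  final chunked =
      await recordsSeenByTimer((docs) => loadInChunks(docs, total, 10000));
  print('one big loop: timer saw $blocking of $total records');
  print('chunked loop: timer saw $chunked of $total records');
}
```

In the single-loop case the timer cannot run until every record is in; in the chunked case it runs partway through. The chunk size trades total load time against how quickly other events get a turn.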


Day #578
