Hi, I'm Mariano Guerra, below is my blog, if you want to learn more about me and what I do check a summary here: marianoguerra.github.io or find me on twitter @warianoguerra or Mastodon @marianoguerra@hachyderm.io

Papers of the Week VII

Because nothing lasts forever, and after a week spent half traveling followed by a busy one, I managed to read 4 papers this week.

The first one was interesting, but it comes from an area I would describe as "let's bend relational databases to fit Event Stream Processing". That is not bad per se, but features like joins and the ability to remember past events make scalability (at least in terms of memory) quite hard. It also never discusses distribution, which is fine for the field but not what I'm looking for.

The interesting part of this one is where it introduces Visibly Pushdown Languages, something that looks really promising, but I couldn't find an introduction for mere mortals. The descriptions are really dense and mathematical, which is fine but hard to learn from for outsiders like me.

Another interesting point is that it uses the XML Schema to optimize the generated VPA (Visibly Pushdown Automaton), and that the implementation applies not only to XML but to any nested semistructured data.
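To make the Visibly Pushdown idea a bit more concrete for outsiders like me, here is a minimal sketch in Python. It is not the paper's implementation, and it is a simplification: a real VPA also has states and transitions, while this one only shows the defining property, namely that the input symbol alone decides the stack action ("call" symbols push, "return" symbols pop, "internal" symbols leave the stack alone). The event format and the function name `accepts_well_nested` are my own assumptions for illustration.

```python
from typing import Iterable, List, Tuple

# Hypothetical event format: (kind, name), where kind is one of
#   "call"     -> an opening tag, always pushes
#   "return"   -> a closing tag, always pops
#   "internal" -> text or any other non-nesting symbol, never touches the stack
Event = Tuple[str, str]


def accepts_well_nested(events: Iterable[Event]) -> bool:
    """Return True if every call is closed by a matching return, in order."""
    stack: List[str] = []
    for kind, name in events:
        if kind == "call":
            stack.append(name)
        elif kind == "return":
            # A return with nothing open, or closing the wrong name, is rejected.
            if not stack or stack.pop() != name:
                return False
        elif kind != "internal":
            raise ValueError(f"unknown event kind: {kind!r}")
    # Accept only if nothing is left open at the end of the stream.
    return not stack


if __name__ == "__main__":
    stream = [
        ("call", "order"),
        ("internal", "some text"),
        ("call", "item"),
        ("return", "item"),
        ("return", "order"),
    ]
    print(accepts_well_nested(stream))
```

In this sketch the stack only grows with the nesting depth of the data, not with the length of the stream, and nothing in it is specific to XML: any nested format that can be turned into call/return/internal events would work the same way.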

The review of the next one will seem to conflict with my previous reviews, but this one put too much emphasis on low-level implementation details. Not novel things or optimizations, just a lot of details, as if the authors found the implementation really cool and wanted to share it with the world. Not a bad thing per se, but in this batch I was looking for abstractions, optimizations and distribution characteristics of stream processing, ideally focused on distributed systems, and this one talked mainly about the DSL they built that compiles to C. It also sorts the streams, does multiple passes over the data, does lookahead in the stream and does a kind of "micro batches", which isn't what I was looking for.

As for the last one, I found the approach interesting. They seem to push the purity of the approach (everything is a regular expression), which may have ended up producing a nice model (a thing I like), but reading the code, it doesn't seem very clear, at least for someone with an OO/functional background, and I think even less so for non-programmers. Maybe the syntax doesn't help and some other syntax would make things clearer, I don't know.

Other than that, the approach is interesting, and it made me think of some ways to define a stream processing language using mainly pattern matching.

Papers this week: 4

Papers so far: 33

Papers in queue: 76