
This is my blog, more about me at marianoguerra.github.io

🦋 @marianoguerra.org 🐘 @marianoguerra@hachyderm.io 🐦 @warianoguerra

Papers of the LargeSpanOfTime I

Welp, some day the experiment had to end: I stopped reading 5 papers a week because some books arrived and I read those instead, and also because I was busy at work.

But that doesn't mean I didn't read papers at all, so here's a list of the ones I did read.

Cuneiform: A Functional Language for Large Scale Scientific Data Analysis

Seems useful in practice; I was expecting something else from the title.

The Stratosphere platform for big data analytics

I remember reading a paper from what later became Apache Flink that I liked a lot. I was looking for that one and found this one instead (Stratosphere became Flink). It was an interesting overview; I would like to know how much of it is still in Flink.

Orleans: Distributed Virtual Actors for Programmability and Scalability

Really good paper; I like how it's written, as well as the idea and the implementation.

HyParView: a membership protocol for reliable gossip-based broadcast

Epidemic Broadcast Trees

These two are reviewed together because they go together like bread and butter. I love both of them; highly recommended.

Large-Scale Peer-to-Peer Autonomic Monitoring

I won't lie to you, I don't remember much about this one, but given the authors it must be good :)

Stream Processing with a Spreadsheet

Object Spreadsheets: A New Computational Model for End-User Development of Data-Centric Web Applications

I was looking for ideas and inspiration when I read these two. I liked both, Object Spreadsheets being the more interesting approach.

A Layered Grammar of Graphics

Great paper, one of my favorites, maybe because I love the topic :)

Virtual Time and Global States of Distributed Systems

A must-read if you are interested in vector clocks. The non-math parts are good; I don't enjoy reading theorems much (not their fault).

Papers this looong week: 10

Papers so far: 43

Papers in queue: don't want to count anymore

Improving Official Erlang Documentation

Many times I've heard people complain about different aspects of the official Erlang documentation. One thing I find interesting is that its content is really complete and detailed, so I decided to dedicate some time to its other aspects. To get familiar with it, I decided to start with an "easy" one: its presentation.

So I downloaded erlang/otp:

git clone https://github.com/erlang/otp.git

And did a build:

# avoid having dates formatted in your local format
export LC_ALL="en_US.utf-8"
cd otp
./otp_build setup
make docs

Then I installed the generated docs in another folder to see the result:

mkdir ../erl-docs
make release_docs RELEASE_ROOT=../erl-docs

And served them so I could navigate them:

cd ../erl-docs
python3 -m http.server

If you want to give it a try, you need to install the following dependencies on Debian-based systems:

sudo apt install build-essential fop xsltproc autoconf libncurses5-dev

With the docs available I started looking around. The main files to modify are:

lib/erl_docgen/priv/css/otp_doc.css

The stylesheet for the docs

lib/erl_docgen/priv/xsl/db_html.xsl

An XSLT file that transforms the XML docs into HTML

The problem I found at first was that to see the results of my changes to db_html.xsl I had to do a clean and build from scratch, which involved recompiling erlang itself, taking a lot of time.

Later I found a way to only build the docs again by forcing a rebuild:

make -B docs

But this still builds the PDF files, which is the part that takes the most time. I haven't found a target that only builds the HTML files; if you know how, or want to try adding one to the Makefile, that would be great.
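Until there is an HTML-only target, the edit, rebuild and preview loop can at least be wrapped in one command. Below is a minimal sketch that only chains the two make invocations used in this post; the function name rebuild_docs and its optional argument are my own, and it assumes you run it from the root of the otp checkout and that release_docs copies the files again on every run.

```shell
# Hypothetical helper: rebuild the OTP docs after editing db_html.xsl or
# otp_doc.css and copy them into the folder being served.
# Run it from the root of the otp checkout.
rebuild_docs() {
  # release folder; defaults to the ../erl-docs folder created earlier
  local release_root="${1:-../erl-docs}"

  # -B forces make to rebuild the docs even if it considers them up to date
  # (this still builds the PDFs, which is the slow part)
  LC_ALL="en_US.utf-8" make -B docs || return 1

  # install the regenerated docs into the release folder
  LC_ALL="en_US.utf-8" make release_docs RELEASE_ROOT="$release_root"
}
```

Run rebuild_docs after each change and reload the page served by python3 -m http.server.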

With this knowledge I started improving the docs. Below I cover the main things I changed.

You can see all my changes in the improve-docs-style branch.

Small styling changes

  • Don't use full black and white

  • Set font to sans-serif

  • Use mono as code font

  • Improve link colors

  • Improve title and description markup on landing page

  • Update menu icons (the folder and document icons)

  • Improve panel and horizontal separator styles

  • Align left panel's links to the left

Improve code box color, border and spacing

/galleries/misc/otp-old-2.png

Old Code Examples

/galleries/misc/otp-new-2.png

New Code Examples

Improve warning and info boxes' color, border and spacing

/galleries/misc/otp-old-3.png

Old Warning Dialog

/galleries/misc/otp-new-3.png

New Warning Dialog

/galleries/misc/otp-old-4.png

Old Info Dialog

/galleries/misc/otp-new-4.png

New Info Dialog

Logo Improvements

  • Remove drop shadows from logo

  • Center Erlang logo on left panel

  • Erlang logo is a link to the docs' main page

  • Put section description after logo and before links in left panel

/galleries/misc/otp-old-1.png

Old Landing Page

/galleries/misc/otp-new-1.png

New Landing Page

Semantic Improvements

  • Use title tags for titles

  • Remove usage of <br/> and empty <p></p> to add vertical spacing

  • Use lists for link lists

  • Title case section titles instead of uppercase

  • Add semantic markup and classes to section titles and bodies

  • Add classes to all generated markup

    • For the ones where I couldn't figure out a semantic class, I added a generic one to help people locate them in the XSL document by inspecting the generated files

  • Clickable titles for standard sections, with anchors for better linking

Improve table styling

/galleries/misc/otp-old-5.png

Old Tables

/galleries/misc/otp-new-5.png

New Tables

Improve applications page

/galleries/misc/otp-old-7.png

Old Applications List

/galleries/misc/otp-new-7.png

New Applications List

Improve modules page

/galleries/misc/otp-old-8.png

Old Modules List

/galleries/misc/otp-new-8.png

New Modules List

Add "progressively enhanced" syntax highlighting

A JavaScript file is loaded at the bottom of the page. If it loads successfully, it loads the syntax highlighter module and its CSS and then styles all the code blocks on the page. If it fails to load, is blocked, or JavaScript is disabled, the code blocks keep a default style provided by CSS.

The markup was not modified in any way to add this feature.

Make code tokens easier to differentiate from standard text

The previous style for inline code was a really light italic font. I changed it to monospace, but it was still hard to distinguish from regular text, so I took some inspiration from Slack and surrounded inline code in a light box to make it stand out.

Indent Exports and Data Types' section bodies

/galleries/misc/otp-old-6.png

Old Data Types and Exports Sections

/galleries/misc/otp-new-6.png

New Data Types and Exports Sections

This is all for now. I have some other ideas for future improvements, but they involve changes to the documentation itself, so I will submit them separately.

If you have any feedback please let me know!

Software That Doesn't Fail

I'm reproducing here a post I made on Facebook after seeing the following transcript:

/galleries/misc/software-no-falla.jpg

Someone please tell Mr. Tonelli that on the very same day he said that, the European Space Agency lost contact with a probe it sent to Mars, which it had been developing for the last 7 years. The project cost 870 million euros and is held to the highest quality-control standards of any industry.

One day after that, services like Twitter, Netflix, GitHub and PayPal were down for more than two hours because someone hacked webcams and other "smart" devices and used them to launch a denial-of-service attack against a service that translates what you type in your browser's address bar into addresses that computers can understand.

Anyone who says software won't fail is being irresponsible and shouldn't be entrusted with legislating on even a single line of code.

Then I started adding the following comments:

1) More news of the day: a bug was found today in the operating system that the electronic voting machines are going to use, one that lets anyone gain full control of the system. I know they won't read it, but here it is:

“Most serious” Linux privilege-escalation bug ever is under active exploit

2) Today it was reported that a company that issues SSL certificates (what puts the little green padlock next to your bank's address and makes the connection secure; they are also used to transmit results from the voting machines to the central server) allowed people to obtain certificates for domains that did not belong to the people requesting them.

Incident Report - OCR

3) Some "fun" ones from history: Stanislav Yevgrafovich Petrov (Станислав Евграфович Петров in Russian, born 7 September 1939) is a retired lieutenant colonel of the Soviet military from the Cold War era. He is remembered for correctly identifying a missile attack warning as a false alarm in 1983, thereby preventing what could have escalated into a nuclear war between the Soviet Union and the United States.

4) One from 1998: the Mars Climate Orbiter was destroyed because of a navigation error: the ground control team used US customary (imperial) units to compute the insertion parameters and sent the data to the spacecraft, which performed its calculations in metric units. As a result, each engine firing changed the probe's velocity in an unexpected way, and over months of flight the error kept accumulating.

5) In 2003, 50 million people in the United States and Canada lost power because of a software bug: https://en.wikipedia.org/wiki/Northeast_blackout_of_2003

6) The Therac-25 was a radiation therapy machine produced by AECL, successor to the Therac-6 and Therac-20 models (the earlier units were produced in partnership with CGR). The machine was involved in at least six accidents between 1985 and 1987 in which patients received radiation overdoses. Three of the patients died as a direct result. These accidents called into question the reliability of software control of safety-critical systems and became a case study in medical informatics and software engineering.

7) In 1996 a rocket (Ariane 5) that cost 7 billion dollars to develop and carried a payload valued at 500 million dollars exploded because a number type that was "too small" was used to hold the horizontal velocity.

8) Knight Capital lost 440 million dollars in 45 minutes and was pushed to the brink of bankruptcy by a software bug that sold shares at the wrong price.

9) In 2004 the Los Angeles air traffic control system stopped working because it used a counter that was "too small". The fun part is that the backup system failed within minutes of being switched on.

10) In 1979 a nuclear plant in the United States "suffered a partial meltdown of the reactor core". The cause: "The valve was supposed to close when the pressure dropped, but due to a fault it didn't. The signals reaching the operator did not indicate that the valve was still open, although they should have."

https://es.wikipedia.org/wiki/Accidente_de_Three_Mile_Island

11) Other times the causes are political: "...failures in communication... resulted in a decision to launch 51-L based on incomplete and sometimes misleading information, a conflict between engineering data and management judgments, and a NASA management structure that permitted internal flight safety problems to bypass key Shuttle managers."

https://es.wikipedia.org/wiki/Siniestro_del_transbordador_espacial_Challenger

This Week in WebAssembly III


Binaryen

binaryen repository

Design

design repository

Spec

spec repository

The most important "change" is that PR #323, which introduces the stack machine semantics, was opened but is still not merged.

Website

webassembly.github.io repository

Resources

This Week in WebAssembly II

Second update on #webassembly

Binaryen

binaryen repository

Design

design repository

Spec

spec repository

No Changes

Website

webassembly.github.io repository

Resources

This Week in WebAssembly I

(Hopefully) weekly update on WebAssembly and WebAssembly-related projects

Binaryen

binaryen repository

Design

design repository

Spec

spec repository

Website

webassembly.github.io repository

No changes

Resources

Ricardo Forth: a Forth implemented in C, JS, WebAssembly and compiled from C to asm.js and WebAssembly

There comes a time in everyone's life when you implement a Forth.

The time has come for me.

Presenting Ricardo Forth:

A Forth dialect implemented in C, Javascript, WebAssembly and compiled from C to asm.js and WebAssembly.

This project is based on the 1992 IOCCC entry buzzard.2 (design notes: buzzard.2.design), prettified and then compiled to asm.js and WebAssembly.

It was also reimplemented by translating the C code into JavaScript and WebAssembly.

Go check it out if you are curious about asm.js, WebAssembly, Forth or Emscripten/Binaryen.

Papers of the Week VII

Because nothing lasts forever, and after a week spent half traveling and busy the rest of the time, I managed to read only 4 papers this week.

The first one was interesting but comes from an area I would describe as "let's bend relational databases to fit Event Stream Processing". That's not bad per se, but it has things like joins and the ability to remember past events, which make scaling it (at least in terms of memory) quite hard. It also never discusses distribution, which is fine for the field but not what I'm looking for.

The interesting part of this one is where it introduces Visibly Pushdown Languages, something that looks really interesting, but I couldn't find an introduction for mere mortals: the descriptions are really dense and mathematical, which is fine but hard to learn from for outsiders like me.

Another interesting point is that it uses the XML Schema to optimize the generated VPA (Visibly Pushdown Automaton), and that the implementation applies not only to XML but to any nested semistructured data.

My review of the next one may seem to conflict with my previous reviews, but this one put too much emphasis on low-level implementation details: not novel ideas or optimizations, just a lot of details, as if the authors found the implementation really cool and wanted to share it with the world. That's not a bad thing per se, but in this batch I was looking for abstractions, optimizations and the distribution characteristics of stream processing, ideally focused on distributed systems, and this one talked mainly about the DSL they built that compiles to C. It also sorts the streams, does multiple passes over the data, does lookahead in the stream and does a kind of "micro batches", which isn't what I was looking for.

As for the last one, I found the approach interesting. They seemed to push the purity of the approach (everything is a regular expression), which may have ended up as a nice model (a thing I like), but reading the code it doesn't seem really clear, at least to someone with an OO/functional background, and I think even less so to non-programmers. Maybe the syntax doesn't help and some other syntax would make things clearer, I don't know.

Other than that, the approach is interesting, and it made me think about ways to define a stream processing language using mainly pattern matching.

Papers this week: 4

Papers so far: 33

Papers in queue: 76

How to build Riak TS (Time Series Database) from Source

To build Riak TS we need some basic build tools installed, like compilers and related tools.

On Ubuntu/Debian and derivatives:

sudo apt-get update
sudo apt-get install build-essential autoconf git libncurses5-dev libssl-dev libpam0g-dev

On RHEL, CentOS, Oracle Linux and derivatives:

sudo yum update -y
sudo yum groupinstall "Development Tools" -y
sudo yum install openssl-devel ncurses-devel git autoconf pam-devel -y

A quick description of each, so you can map them to your OS:

  • build-essential: a group of tools to build stuff (duh!)

  • autoconf: needed to build Basho's Erlang/OTP version

  • git: to fetch repos

  • libncurses and libssl: for curses and SSL support in Erlang

  • libpam0g-dev: required to compile a riak module (canola)

    • not sure about the RHEL equivalent, try pam-devel
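If you set up several machines, the two package lists above can be folded into one function that prints the right command for the package manager it finds. This is a sketch under my own assumptions: the function names riak_deps_cmd and detect_pkg_manager are hypothetical, they only know the apt and yum commands from this post, and the command is printed instead of run so you can review it first.

```shell
# Hypothetical helper: print (not run) the dependency install command for
# building Riak TS, using the package lists from this post.
riak_deps_cmd() {
  case "$1" in
    apt)
      echo "sudo apt-get update && sudo apt-get install build-essential autoconf git libncurses5-dev libssl-dev libpam0g-dev"
      ;;
    yum)
      echo 'sudo yum update -y && sudo yum groupinstall "Development Tools" -y && sudo yum install openssl-devel ncurses-devel git autoconf pam-devel -y'
      ;;
    *)
      echo "unsupported package manager: $1" >&2
      return 1
      ;;
  esac
}

# Guess which package manager this machine uses: apt-get first, then yum
detect_pkg_manager() {
  if command -v apt-get >/dev/null 2>&1; then
    echo apt
  elif command -v yum >/dev/null 2>&1; then
    echo yum
  else
    echo unknown
  fi
}
```

Usage: riak_deps_cmd "$(detect_pkg_manager)" prints the command for the current machine, which you can then copy and run.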

Now clone the riak repo:

git clone https://github.com/basho/riak.git
cd riak

Checkout the Riak TS tag:

git checkout riak_ts-1.3.0

Download and install kerl to build the correct erlang OTP version:

mkdir -p ~/bin
wget https://raw.githubusercontent.com/kerl/kerl/master/kerl -O ~/bin/kerl
chmod u+x ~/bin/kerl
export PATH=$PATH:$HOME/bin

Build the OTP_R16B02_basho10 Erlang version (note that this won't interfere with your local Erlang installation; see the kerl README for details):

kerl build git https://github.com/basho/otp.git OTP_R16B02_basho10 R16B02-basho10
mkdir -p ~/soft/erlang-releases/R16B02-basho10
kerl install R16B02-basho10 ~/soft/erlang-releases/R16B02-basho10
. ~/soft/erlang-releases/R16B02-basho10/activate
export PATH=$HOME/soft/erlang-releases/R16B02-basho10/bin:$PATH
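Before building, it's worth checking that the erl now first on your PATH is the Basho build and not your system Erlang. A minimal sketch; the helper name expect_otp_release is hypothetical, and it assumes the Basho fork still reports R16B02 as its OTP release, since it is built from that version.

```shell
# Hypothetical helper: succeed only if the erl first on PATH reports the
# expected OTP release string (what erlang:system_info(otp_release) returns).
expect_otp_release() {
  local expected="$1"
  local actual
  # -noshell -eval evaluates the expression without starting a shell; halt() exits
  actual=$(erl -noshell -eval 'io:format("~s", [erlang:system_info(otp_release)]), halt().' 2>/dev/null) || return 1
  [ "$actual" = "$expected" ]
}
```

Usage: expect_otp_release R16B02 && echo "using R16B02" || echo "another erl is first on PATH"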

Now build Riak TS:

make locked-deps
make rel

And run it:

cd rel/riak
./bin/riak console

Papers of the Week VI

Better late than never (even though I read all the papers last week), here is the sixth installment of Papers of the Week.

Starting next week I will try to write the reviews right after reading the papers, not almost a week later when my memories are fuzzy :)

The first one describes an implementation of out-of-order processing using punctuation. It's interesting in that it "applies" the concept of punctuation to building a streaming system and analyzes the result.

The next one describes an implementation of a storage engine using LSM trees and a compression technique.

You can read an overview of the next paper, Holistic Configuration Management at Facebook, and find the link to it in acolyer's paper of the day. I copy the first paragraph here:

This paper gives a comprehensive description of the use cases, design,
implementation, and usage statistics of a suite of tools that manage
Facebook’s configuration end-to-end, including the frontend products,
backend systems, and mobile apps.

It's a good overview of tools and techniques used to scale and standardize configuration management and how to avoid problems introduced by sloppy configuration management.

The next one is my favorite of the week. It takes solutions from other papers that introduce some parallelization strategy, implements them in a simple single-threaded way to define a baseline, and benchmarks that baseline against the original solutions. It then defines a "metric" that describes how many cores a parallel system needs to match the single-threaded implementation and, as many sites would tell you, "the result will amaze you".
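The metric is easy to compute once you have the numbers. Below is a small awk-based sketch: it takes the single-threaded baseline time and a parallel system's runtimes at increasing core counts, and prints the first core count that matches the baseline. The function name cores_to_match and the input format are my own, and the numbers in the example are made up for illustration, not results from the paper.

```shell
# Hypothetical helper: print the smallest core count whose runtime is at most
# the single-threaded baseline (given in seconds as the first argument).
# Input on stdin: one "cores seconds" pair per line, sorted by increasing cores.
# Prints "unbounded" if no configuration matches the baseline.
cores_to_match() {
  awk -v base="$1" '
    $2 + 0 <= base + 0 { print $1; found = 1; exit }
    END { if (!found) print "unbounded" }
  '
}

# Made-up numbers: a 40s single-threaded baseline vs a parallel system
printf '1 300\n4 120\n16 45\n64 20\n' | cores_to_match 40   # prints 64
```

A system that never reaches the baseline at any measured core count gets "unbounded", which is exactly the kind of result that makes the metric surprising.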

The last one for this week surprisingly brought me to CRDT/Lasp/@cmeik land, even though the title didn't seem to imply that. The crazy part is that I saw a talk about this paper at RICON 2015 and didn't remember the title :)

Some parts were hard for me since it's the first paper I've read about CRDTs, so I don't have the vocabulary and basic theory in place yet, but it made me think of some interesting applications in the IoT and monitoring spaces.

Papers this week: 5

Papers so far: 29

Papers in queue: 82