Text editor wars

While reading about RMS stepping down as Emacs maintainer, I found some comments that caught my attention, especially the ones about him using vi for text editing.

One thing I've never understood, and which I still find everywhere, is this 'emacs vs. vi' cold war, full of religious arguments defending or attacking each editor. The battle automatically excludes every other text editor, using ready-made phrases like "your text editor cannot erase interlaced lines!" or "you cannot debug your programs with this!", which mislead people into thinking a text editor really needs these features.

In my modest experience of almost 10 years using GNU/Linux, I've tried different text editors, both console and X: vim, jed, joe, emacs, pico, nano, mcedit, gedit, kwrite, kate, notepad, and others. In all these cases, vim and emacs didn't even come close to the best ones I've used: emacs was complicated even in the basic open/edit/save loop, and vi needed too many keystrokes (which became even worse on a Dvorak keyboard). Even trying hard, I never really found what was so special about these programs.

Perhaps their merit is not so much technical as archaeological: these programs have a strong history in the development of UNIX-like systems, and many people learned those systems through them, until the system itself and these applications became the same thing in their minds. Maybe 30 years from now they'll be like COBOL: something nobody wants to use anymore, just waiting to be left behind.

And that's it.


C vs. C++

Some time ago, Linus Torvalds wrote some comments about the C++ language. It's hard to disagree with Linus that C++ is a horrible language, but I find it much harder to agree that C is a better language than C++, even counting some of the "non-technical advantages" cited in the article.

C++ is about structure and abstraction. Big projects like GNU or Linux tend to recreate that kind of structure on top of C when good programming techniques are involved, running into things like generic containers that cannot be type-checked and the inability to use some very useful paradigms, to name just a few disadvantages. Reducing the language's benefits to things like "nice object-oriented libraries" or "object model crap" is just buying into the OO hype, which represents not what C++ can be, but what it becomes in the hands of Java programmers. You can come to dislike virtually any language because of the bad programs and bad programmers that use it, even C, so this should not count when judging the merits of a language.
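To make the point about type checking concrete, here is a minimal sketch contrasting a C-style generic container, which stores untyped `void*` pointers, with a C++ template. Both are written in C++ syntax so they fit in one block, and the names `VoidStack`, `void_push`, `void_pop` and `Stack` are hypothetical, chosen only for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// C-style generic stack: it stores untyped pointers, so the compiler
// cannot tell whether an int* or a double* comes back out.
struct VoidStack {
    void* items[16];
    std::size_t size;
};

inline void void_push(VoidStack* s, void* p) {
    assert(s->size < 16);  // the caller must not overflow the fixed array
    s->items[s->size++] = p;
}

inline void* void_pop(VoidStack* s) {
    assert(s->size > 0);
    return s->items[--s->size];
}

// C++ generic stack: the element type is part of the stack's type,
// so pushing an unrelated kind of value is rejected at compile time.
template <class T>
class Stack {
public:
    void push(const T& value) { items_.push_back(value); }
    T pop() {
        assert(!items_.empty());
        T top = items_.back();
        items_.pop_back();
        return top;
    }
    bool empty() const { return items_.empty(); }
private:
    std::vector<T> items_;  // storage is owned and freed by std::vector
};
```

With the template, `Stack<int> s; s.push("text");` does not compile, while `static_cast<double*>(void_pop(&vs))` on a stack that actually holds `int*` values compiles silently and misbehaves at runtime.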

When it comes to the real advantages of C++ over C, a few essential points come to mind:

  • You can define behaviour inside structures (or classes, if you prefer the sound of the name), and encapsulate which data will be accessible and which will be opaque to the rest of the program. This lets you write a library for each feature/resource your program needs (converting text encodings, manipulating strings, processing config files, etc.) and combine them easily, reusing code as needed. You can even get these advantages without objects, by defining structures with only static methods;

  • Memory management will never keep you awake at night again, thanks to references. You can always assume that a reference exists and is initialized, even though a reference is essentially a pointer. That gives you the advantage of passing mutable values to functions without needing to know where they should be deallocated. Manual allocation/deallocation with new/delete can become so rare that you can count the manual allocations in your program on your fingers, even in projects bigger than 20,000 LOC.
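A minimal sketch of both points, assuming hypothetical `TextUtil` and `ConfigFile` types: a struct used as a library of static functions, a class that keeps its data private, and a function that takes a reference. Strictly speaking, the memory here is released by the destructors of `std::string` and `std::vector` (RAII); the reference parameter lets `load_defaults` work on the caller's object without owning it, which is why no `new`/`delete` appears anywhere.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// A "library" made only of static methods: no objects involved.
struct TextUtil {
    static std::string to_upper(const std::string& s) {
        std::string out = s;  // std::string allocates and frees its own buffer
        for (char& c : out)
            c = static_cast<char>(std::toupper(static_cast<unsigned char>(c)));
        return out;
    }
};

// Encapsulation: callers see add() and lines(), never the storage itself.
class ConfigFile {
public:
    void add(const std::string& line) { lines_.push_back(line); }
    std::size_t lines() const { return lines_.size(); }
private:
    std::vector<std::string> lines_;  // opaque to the rest of the program
};

// Reference parameter: the callee mutates the caller's object but neither
// allocates nor frees it, so there is no ownership question to answer.
inline void load_defaults(ConfigFile& cfg) {
    cfg.add("encoding=utf-8");
    cfg.add(TextUtil::to_upper("mode=fast"));
}
```

A caller simply writes `ConfigFile cfg; load_defaults(cfg);` and everything is released when `cfg` goes out of scope.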

Linus's criticisms are actually very interesting, as they fit C like a glove. Let's look at each one individually:

> infinite amounts of pain when they don't work (and anybody who tells me that STL and especially Boost are stable and portable is just so full of BS that it's not even funny)

This one is real: STL and Boost produce absurd template errors when you make coding mistakes, and they may also throw unwanted runtime exceptions if you use something incorrectly. The only real solution for the first is to fix the language (maybe 10 years from now) or to gain experience with C++: once you are an experienced C++ programmer, you learn to deduce what is wrong in your code from the error messages. The second may be solved by documentation, or by searching the library source code for "throw" statements.

These two problems come from bad design decisions made when the language gained those features, and unfortunately they still remain. But the main point here (or, to be more dramatic, the really important point) is that the first problem (template error messages) is a compile-time error, while the second can be solved at compile time if checked exceptions are taken into account.
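As an illustration of the second problem, `std::vector::at` is documented to throw `std::out_of_range` for an invalid index, while `operator[]` performs no check at all. The helper `checked_lookup` below is hypothetical; it only shows where such an exception surfaces and one way to catch it.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Returns the element at index i as text, or "out of range".
inline std::string checked_lookup(const std::vector<int>& v, std::size_t i) {
    try {
        return std::to_string(v.at(i));  // at() checks i < v.size() and throws
    } catch (const std::out_of_range&) {
        // Nothing in the call's signature warns about this exception;
        // only the documentation (or the headers) does.
        return "out of range";
    }
}
```

Writing `v[i]` instead would compile just as happily, but an invalid index would then be undefined behaviour rather than an exception.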

In C, when your program does not work, you get a nice "Segmentation fault" on the screen, which leaves you totally clueless, with infinite piles of pain. I prefer that my programs only compile if they are right in the sense of "not crashing", and that the compiler, not the user, verifies that. The funniest thing is that you can have the same problems as C in C++, but only because you can program in C while using C++.

Finally, in my modest experience with portability, I used both STL and Boost mainly on two operating systems (a little time on Windows XP, and the rest on tons of GNU/Linux distributions), and never had ANY problem compiling code on either, even with different compilers and compiler versions. C++ compilers have probably evolved in the meantime, as they became mature projects.

> inefficient abstracted programming models where two years down the road you notice that some abstraction wasn't very efficient, but now all your code depends on all the nice object models around it, and you cannot fix it without rewriting your app.

Sorry, but this argument makes no sense: you can abstract anything incorrectly in ANY language. And as far as my programming experience goes, the pain of rewriting an application is much more severe in C than in any other language that can be called "high-level".

Another common argument against C++ that I hear often (and that I've used myself, while programming in an embedded environment) is that it is not as efficient as C, mostly because of the abstractions in the language's libraries. That is actually a fact: conventional C++ libraries are usually bigger than your own set of C functions written for your specific problem, and memory consumption will be much higher than allocating everything manually and statically.

However, good abstractions need good compilers, capable of good code optimizations. Recent GNU C++ compilers have shown how much a compiler can affect a language's performance once new optimization algorithms started being used. It is also worth mentioning that if you have a library written in C++ and you want to write the same thing in C, it will be bigger to write, harder to maintain, and (especially if the library is some kind of container) impossible to check for safety to the same extent C++ can.
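A commonly cited illustration of abstraction paying off through the compiler is sorting: C's `qsort` receives the comparison as a function pointer, while `std::sort` is a template whose comparator type is part of the instantiation, which makes inlining straightforward. The sketch below sorts the same data both ways; the helper names `cmp_int`, `sort_c`, `Less` and `sort_cpp` are hypothetical, and whether the comparator actually gets inlined depends on the compiler and its flags, so any speed difference is something to measure rather than assume.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdlib>
#include <vector>

// C style: qsort only sees the comparison through a function pointer.
static int cmp_int(const void* a, const void* b) {
    int x = *static_cast<const int*>(a);
    int y = *static_cast<const int*>(b);
    return (x > y) - (x < y);  // -1, 0 or 1, without the overflow risk of x - y
}

inline void sort_c(std::vector<int>& v) {
    if (v.empty()) return;  // qsort requires a valid pointer even for 0 elements
    std::qsort(v.data(), v.size(), sizeof(int), cmp_int);
}

// C++ style: the comparator is a template argument of std::sort, so its
// call is visible to the optimizer where std::sort is instantiated.
struct Less {
    bool operator()(int a, int b) const { return a < b; }
};

inline void sort_cpp(std::vector<int>& v) {
    std::sort(v.begin(), v.end(), Less());
}
```

Note also that `cmp_int` has to cast away the type information with `static_cast<const int*>`, while `Less` is checked by the compiler against the element type.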

For embedded environments, or anything that needs critical per-instruction optimization, the best choice is always C or assembly: you can guarantee that the compiler won't get in your way, and that the code won't run for an unpredictable amount of time. But for application programming, the only excuse for not using C++ is using a better language, a group to which C does NOT belong.

And that's it.


Ring buffers and anonymous functions

Yesterday I was working on a project I started some time ago, for sending and receiving GSM audio over a socket (a client/server Skype of sorts). I was implementing a ring buffer, to be shared by the encoder/decoder, the audio library (PortAudio) and the socket reading/writing functions, when I started to wonder how I could make it truly concurrent, and easy to understand.

The first ring buffer was a C++ struct which simply copied values into and out of a pre-allocated memory area, in a thread-safe way. The smallest input/output unit for the GSM codec is 33 bytes / 160 shorts (320 bytes), so I started with a templated buffer that stored N values of type T. There were two functions for accessing the buffer:

void consume(value_type &buffer);
void produce(const value_type &buffer);

These functions consume/produce one value_type per call: 'produce' copies the value in 'buffer' to the write position of the ring buffer, and 'consume' copies the value at the read position of the ring buffer into 'buffer'. But that was not good enough.
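A minimal sketch of this copying interface, assuming a mutex plus two condition variables for thread safety (the post does not say which primitives the original used). The class name `CopyingRingbuffer` and the `PcmFrame` alias are hypothetical; a `PcmFrame` holds the 160 shorts of one decoded GSM frame.

```cpp
#include <array>
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <mutex>

// One decoded GSM frame: 160 samples (320 bytes with a 16-bit short).
typedef std::array<short, 160> PcmFrame;

// Copying ring buffer: N slots of type T, shared between threads.
template <class T, std::size_t N>
class CopyingRingbuffer {
public:
    typedef T value_type;

    // Copies 'buffer' into the write position; blocks while the buffer is full.
    void produce(const value_type& buffer) {
        std::unique_lock<std::mutex> lock(mutex_);
        while (count_ == N)
            not_full_.wait(lock);
        slots_[write_] = buffer;
        write_ = (write_ + 1) % N;
        ++count_;
        not_empty_.notify_one();  // wake a consumer sleeping on an empty buffer
    }

    // Copies the read position into 'buffer'; blocks while the buffer is empty.
    void consume(value_type& buffer) {
        std::unique_lock<std::mutex> lock(mutex_);
        while (count_ == 0)
            not_empty_.wait(lock);
        buffer = slots_[read_];
        read_ = (read_ + 1) % N;
        --count_;
        not_full_.notify_one();   // wake a producer sleeping on a full buffer
    }

private:
    value_type slots_[N];
    std::size_t read_ = 0, write_ = 0, count_ = 0;
    std::mutex mutex_;
    std::condition_variable not_full_, not_empty_;
};
```

Because every call copies a whole `value_type`, each 320-byte frame is copied twice on its way through the buffer (once in, once out), which is the cost a zero-copy interface tries to avoid.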

The second attempt involved a way to use the ring buffer directly, without copying buffers around. For that, these functions arose:

void consume_ptr(const value_type **buf);
void consume_commit();

void produce_ptr(value_type **buf);
void produce_commit();

The functions ending in '_ptr(..)' have a special meaning: they do not copy buffers, but return a pointer to the current position in the ring buffer, according to the requested operation. If you want to write to the ring buffer, 'produce_ptr(..)' returns a pointer to a buffer which must be filled with values. When done, the user must call 'produce_commit()' so that the write position is advanced and sleeping threads are woken up. The 'consume_ptr(..)' function does the same, but the other way around: the buffer must be read, and after that, 'consume_commit()' must be called.
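A minimal sketch of the zero-copy variant, again assuming a mutex and condition variables, and additionally assuming exactly one producer thread and one consumer thread (with several producers, two of them could be handed the same slot). `ZeroCopyRingbuffer` is a hypothetical name; the four member functions follow the signatures listed above.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <mutex>

// Zero-copy ring buffer: callers read and write the slots in place.
// Assumes a single producer thread and a single consumer thread.
template <class T, std::size_t N>
class ZeroCopyRingbuffer {
public:
    typedef T value_type;

    // Blocks until a free slot exists, then hands out its address.
    void produce_ptr(value_type** buf) {
        std::unique_lock<std::mutex> lock(mutex_);
        while (count_ == N)
            not_full_.wait(lock);
        *buf = &slots_[write_];
    }

    // Publishes the slot filled after produce_ptr() and wakes a consumer.
    void produce_commit() {
        std::lock_guard<std::mutex> lock(mutex_);
        assert(count_ < N);  // must follow a successful produce_ptr()
        write_ = (write_ + 1) % N;
        ++count_;
        not_empty_.notify_one();
    }

    // Blocks until a filled slot exists, then hands out its address.
    void consume_ptr(const value_type** buf) {
        std::unique_lock<std::mutex> lock(mutex_);
        while (count_ == 0)
            not_empty_.wait(lock);
        *buf = &slots_[read_];
    }

    // Releases the slot read after consume_ptr() and wakes a producer.
    void consume_commit() {
        std::lock_guard<std::mutex> lock(mutex_);
        assert(count_ > 0);  // must follow a successful consume_ptr()
        read_ = (read_ + 1) % N;
        --count_;
        not_full_.notify_one();
    }

private:
    value_type slots_[N];
    std::size_t read_ = 0, write_ = 0, count_ = 0;
    std::mutex mutex_;
    std::condition_variable not_full_, not_empty_;
};
```

The slot contents are accessed outside the lock; this is safe only because the single producer's slot is never one the consumer can see until `produce_commit()`, and vice versa.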

But what if I want to pass a function to the "produce" and "consume" of my ring buffer, which would be called with the needed pointer as its argument, and whose return would automagically trigger the respective commit function?

You can do this in C++ with function objects. You'll get an interface like this:

template <class T>
struct Ringbuffer
{
    typedef T value_type;

    template <class F>
    void produce(F f);

    template <class F>
    void consume(F f);
};

To use this feature, you'll need code like this:

[...]
struct ring_read_abc
{
    typedef Ringbuffer<short> R;

    void operator()(const R::value_type *v)
    { /* send v via socket */ }
};
[...]
int tmp1, tmp2;
ring_read_abc::R rb;
rb.consume(ring_read_abc());
[...]

The problem with this approach is simple: looking at the example above, you can see that operator() has no access to the context where consume was called. It cannot access tmp1 or tmp2, for instance. For that, you'll need to pass the values to the constructor of the ring_read_abc struct. This glue code can seriously undermine separation of concerns when you want to nest two or more calls to function objects using the interface above.
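A single-threaded sketch of the function-object interface: `produce(F)` and `consume(F)` pick a slot, call `f` with its pointer, then commit, so the caller can no longer forget the commit step. Locking is left out for brevity, and a full or empty buffer is treated as a programming error via `assert`; the function objects `ring_write_value` and `ring_read_sum` are hypothetical. Note how `ring_read_sum` needs a constructor just to reach the caller's `total` variable, which is exactly the glue code the paragraph above complains about.

```cpp
#include <cassert>
#include <cstddef>

// Function-object interface over an in-place slot array (no locking).
template <class T, std::size_t N>
class Ringbuffer {
public:
    typedef T value_type;

    // Hands f a pointer to the next free slot, then commits it.
    template <class F>
    void produce(F f) {
        assert(count_ < N);          // full buffer: caller error in this sketch
        f(&slots_[write_]);
        write_ = (write_ + 1) % N;   // commit: advance the write position
        ++count_;
    }

    // Hands f a read-only pointer to the oldest filled slot, then releases it.
    template <class F>
    void consume(F f) {
        assert(count_ > 0);          // empty buffer: caller error in this sketch
        const value_type* slot = &slots_[read_];
        f(slot);
        read_ = (read_ + 1) % N;     // commit: advance the read position
        --count_;
    }

private:
    value_type slots_[N];
    std::size_t read_ = 0, write_ = 0, count_ = 0;
};

// Hypothetical producer: the value to write must be passed in explicitly.
struct ring_write_value {
    explicit ring_write_value(int value) : value_(value) {}
    void operator()(int* slot) const { *slot = value_; }
    int value_;
};

// Hypothetical consumer: it cannot see the caller's locals, so a
// reference to the caller's running total goes through the constructor.
struct ring_read_sum {
    explicit ring_read_sum(int& total) : total_(total) {}
    void operator()(const int* v) const { total_ += *v; }
    int& total_;
};
```

If `f` throws, the commit is skipped and the slot is simply not published; a production version would have to decide whether that is the desired behaviour.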

The fourth interface goes beyond C++, and uses some features present only in functional languages. To that end, the Ringbuffer could have the same "produce" and "consume" functions, each one receiving a function as argument, which makes the interface look like this (using Objective Caml syntax):

type buffer = int array

val consume : (buffer -> unit) -> unit
val produce : (buffer -> unit) -> unit

The great advantage of such an interface is that nested functions do not lose context. That is, you keep the current bindings when you call another anonymous function. For instance:

[..]
consume
  begin fun inp ->
    produce
      begin fun out ->
        write inp out
      end
  end
[..]

Here, 'consume' is called with the outer "begin fun ... end" block, which is a function, as its argument. It calls that function, which in turn calls 'produce' with the inner block, which eventually calls 'write'.

The reader should notice that 'inp' is being passed through the context, just as any other value in scope could have been. Function objects, on the other hand, do not have this flexibility.

As a final note, I would say that programming in a language every day (in this case, C++) gives you a great deal of confidence in it, but once you gain some knowledge of other forms of programming (in this case, functional programming), sometimes you want a bit more than the language can give you.



I am seriously thinking about starting to post in English here.

Although I may not have the best skills to do so, and posts will come with some more delay (and more typos), I'm still very inclined to embrace the wider audience this might bring to the blog. There aren't many Portuguese-speaking geeks out there who share similar views about programming languages (especially ML or Haskell variants), who want to research the same fields or share my topics of interest, and there are even fewer who want to exchange personal experiences in the area.

It's not very rewarding to open this page every day and see little (or no) feedback. So, from now on, I will bring some posts translated into English, to check whether it's worth the effort. Some Portuguese posts might still appear in the near future, but that's far from clear right now.

Viewers are welcome to leave comments.

See ya!


After graduation: SML's turn.

One of the best things about graduating is the free time that starts to appear on weekends: old readings and research start moving again, projects get off the ground, the blog starts working again, and so on.

At the top of the queue right now is learning SML'97, especially using MLton as the compiler. I've started studying the language, and fortunately it's similar enough to O'Caml that I don't need to worry: a few new keywords and syntactic constructs, and most of the way is covered. There are several advantages: besides the language itself, and some interesting techniques implemented in this compiler, there is also the chance to study and contribute to a truly open source compiler. The compiler's performance is another important factor in the choice.

There's still a pile of things in the queue, but most of it depends on the impressions MLton leaves. As soon as I have some time, I'll come back here to continue the list.

And that's it for now.



Dusting off the blog: I found an interview with the creator of the STL, talking about how he conceived it, about programming paradigms, and about programming languages themselves.

As a preview, the best parts are when he says that:

a) he conceived the STL during a bacterial infection;
b) object orientation is a hoax;
c) of all the languages he has learned, Java is one of the few that added nothing new to what he already knew.

In other words: highly recommended reading.

And that's it.


Modern (functional) programming.

I started this post a while ago, but hadn't found the inspiration to finish it. The previous post, comparing the features of modern "functional" languages with those of object-oriented languages, was the trigger to finish writing it.

For those who program in ML and/or Miranda variants, the fateful question "which language(s) do you use?" is usually a shot in the dark: in most cases, the answer is unfamiliar to whoever asked. O'Caml and Haskell, even though they are the best known, are rarely used outside academia.

So you're met with the classic question mark on the other person's face. To clarify what you're talking about, the initial temptation is to classify them as functional languages, preferably adding the adjective "modern" as well, while keeping an answer ready for whoever says "sure, I know, like LISP". Calling it a functional language is indeed very common: if you ask any Haskell programmer what kind of language they use, that's the answer. Some will go a bit further and say it's "strongly typed", and others will even mention "lazy".

However, even though the main point of Haskell or O'Caml is the functional paradigm, what makes them flexible or modular is not the fact that they are functional, but the combination of features they offer: strong static typing, tagged unions, pattern matching, type inference, type classes (in Haskell's case), functors (in O'Caml's case), algebraic data types, and many other features that are equally important for modularity and for real-world use of the language.

Some time ago I read a post not directly related to this subject, but it led me to think about the synergy in these programming languages, and to see their great advantages as a direct result of combining their features. Deep down, this is their great advantage: how easily the programmer can express ideas by combining the many features they offer.

Perhaps it's time to redefine the adjective "modern" in the field of programming languages, or to find a new one to describe this family of truly modern languages, which is light-years ahead of the classic imperative/OO ones.

And that's it.