Note: This is an experimental site. If you have a feed you would like to add, or would like your feed to be removed, please contact me.

Planet 5

October 10, 2012

jra's thoughts

Sugru

I had a great first Sugru experience! I scratched my finger on the way to work on my unicycle. I thought, dammit, I’m gonna fix that! So I bought some from my phone on the train. Cool!

A few days later, it arrived in the post. I was having a hard time explaining why I was so excited about it to my wife. So I sat down, cut open my first packet and started fixing. My wife got interested, fast! She asked, would this stick to the door handle to make a bumper? I said, go try! And off she ran with my Sugru, leaving me grinning and cutting open a new packet…

Thanks, Sugru, for our new door bumper and unicycle seat bolt covers! You made two new happy fixers.

Correction: Three new fixers, including Elio, age 2 and a half:

<iframe allowfullscreen="allowfullscreen" frameborder="0" height="480" src="http://www.youtube.com/embed/oGW0qZUuowA?rel=0" width="640"></iframe>

by jra at October 10, 2012 04:36 PM

October 03, 2012

jra's thoughts

Two more stories

Here are two stories that worked their charm on a not-very-sleepy little boy, turning him into a sleepy one.

Bep and the Giant Pumpkin: It is October, the official start of pumpkin season in our house. That means it is time to start talking about pumpkins and where we get them. So mommy, daddy, Elio and Emma got in Bep and went for a drive. They drove up up up to the top of Lausanne and then up up up to Mont sur Lausanne, then up up up more to the little self-service vegetable stand above it. Everyone went in and chose their pumpkins one by one. Emma got a little pumpkin because she needed to hold it between her feet in her maxi-cosi. Elio got a bigger pumpkin because he’s strong (Elio e forte!). Mami chose a good pumpkin for soup because she makes the best pumpkin soup. And daddy chose one to make a yummy pumpkin curry. Daddy was about to pay and go, but Elio asked, “What about Bep?” So they looked but none of the pumpkins were big enough for Bep. They went out back and found a perfect pumpkin, 1 meter across, 3.14 meters around and weighing 200 kg! They went to get Bep and asked, “can you carry it?” Bep said “beep beep vroom vroom, no problem” and drove the whole family home, each with their pumpkin on their lap (except mami, who was driving) and Bep with his pumpkin right in the middle.

The Little Red Robot: Once upon a time, there was a little red robot. (Elio was wearing his red robot pj’s.) This little red robot could do everything the master programmer taught him. He could dress himself, go poo poo, feed himself, read books, etc etc. He even knew how to initiate the food processing unit decontamination procedure (teeth brushing) and the light delumination protocol (turning off the bathroom light by head butting it). But each night, when the little red robot went into sleep mode, something magical happened. He got to do things he never did in the day. He ran programs for flying on clouds, climbing in trees, (around this point Elio interrupted and asked that the story should include a “motoschlitta”: snowmobile), and riding lunar snowmobiles across the moon through craters and over the horizon, far from the robot base. That night, the little red robot ran the snowmobile program in his processing unit all night. In the morning when he woke up, he went to Central Dispatch and found that his job that day was to go drive lunar snowmobiles for real! And so he did.

by jra at October 03, 2012 07:07 PM

October 02, 2012

Sonia Hamilton

Golang – setup and producing .debs

Here’s the setup I use for compiling Go binaries, as well as for building .debs to package them and writing READMEs in Markdown – notes to self.

Install the build prerequisites:

sudo aptitude install gcc libc6-dev libc6-dev-i386 make \
  markdown build-essential debhelper dh-make fakeroot devscripts

Install Go from source (so it can cross-compile). Download a source tarball from the Go downloads page, e.g. go1.0.2.src.tar.gz:

sudo tar -C /usr/local -xzf go1.0.2.src.tar.gz ; cd /usr/local/go/src
sudo GOARCH=amd64 ./all.bash ; sudo GOARCH=386 ./all.bash

Add the following to /etc/profile, then source it or log in again:

export GOROOT="/usr/local/go"
export PATH="$GOROOT/bin:$PATH"

by Sonia Hamilton at October 02, 2012 11:29 PM

embrace change

Happy to announce Tideland CGL Release 2.1.0


I'm happy to announce the new Release 2.1.0 of the Tideland Common Go Library. Besides some fixes, especially for the Redis client, it contains new packages for building Atom and RSS clients and for supervising goroutines almost like in Erlang/OTP.

General


  • Changed import path prefixes from code.google.com/p/tcgl to cgl.tideland.biz

Cells


  • Added EventCollector behavior

Net


  • Added new top-level package net for network related packages

Net/Atom


  • Added package net/atom for reading Atom feeds

Net/RSS


  • Added package net/rss for reading RSS feeds

Redis


  • Changed connection pooling and error handling
  • Errors are now detected and returned more reliably, and the pool is more flexible and faster
  • Removed a deadlock error when the number of used connections is larger than the pool size

Supervisor


  • Added new package supervisor inspired by Erlang/OTP supervisors
  • Supervisors run goroutines and other supervisors and restart them after errors or panics
  • Hierarchical supervisor trees are possible
  • Depending on strategy and restart frequency all supervised goroutines may be restarted or stopped after an error
  • As Go lacks Erlang's monitoring/killing abilities, the goroutines have to implement a defined signature and stop when a signal from their supervisor tells them to

Web


  • Changed content type of gob-encoded data to application/vnd.tideland.gob

As usual, you can find it at http://cgl.tideland.biz.

by Frank Müller (noreply@blogger.com) at October 02, 2012 08:44 PM

September 24, 2012

Savoury Morsels

rjson – readable JSON

[Edited: after some feedback, I have renamed this project to rjson (it's really not at all Go-specific) and changed the specification so that unquoted strings are accepted only as object keys]

JSON is a fine encoding. It has a very simple data model; it’s easy to understand and to write parsers for.  But personally, I find it a bit awkward to read and to edit. All those quotes add noise, and I’m always forgetting to remove the final comma at the end of an array or an object. I’m not the only one either. I’ve chatted to people about why they are using YAML, and the reason is usually not because of the zillion features that YAML offers, but because the format is more human-friendly to edit.

A while ago I had an idea of how I might help this, and yesterday I had a free day to do it. I forked the Go json package to make rjson. It uses an idea taken from the Go syntax rules to make commas optional, and quotes become optional around object keys that look like identifiers. The rjson syntax rules are summarised in the rjson package documentation.

To make it easy to experiment with and use this format, I created an rjson command that can read and write both formats.

Here is a transcript showing how the command can be used:

% cat config.json
{
  "ui" : {
    "global" : {
      "ui-type" : "default",
      "show-indicator" : true
    }
  }
}% rjson < config.json
{
	ui: {
		global: {
			show-indicator: true
			ui-type: "default"
		}
	}
}
% rjson -indent '' < config.json
{ui:{global:{show-indicator:true,ui-type:"default"}}}
% rjson -indent '' -j < config.json
{"ui":{"global":{"show-indicator":true,"ui-type":"default"}}}
%

You might notice that the compact version of the rjson format is smaller than the equivalent JSON. On a random selection of JSON files (all that I could find in my home directory), I measured that they were about 7% smaller on average when encoded with rjson -indent ''. This was a nice bonus that I had not considered.

To use rjson, you’ll need a working Go installation; then you can fetch and install the command into your go tree thus:

go get launchpad.net/rjson/cmd/rjson

Enjoy!


by rogpeppe at September 24, 2012 10:20 AM

Stan Steel

DarkEdit (for lack of a better name)

I've resurrected an idea I had a while back regarding a simplified development environment. I've gotten tired of Eclipse and large IDEs over the years. The obvious benefit is that it can be built to suit my needs. Here is a picture of the first display prototype:

In the above prototype, the editor is very much like a traditional editor.  This isn't what I want, but I think it is important to meet common expectations.
What follows is a screenshot of my current development effort.  I think there is enough novelty in this new vision to warrant continuing the development effort.


by Stan Steel (steel@kryas.com) at September 24, 2012 05:14 AM

September 23, 2012

Command Center

Thank you Apple


Some days, things just don't work out. Or don't work.

Earlier

I wanted to upgrade (their term, not mine) my iMac from Snow Leopard (10.6) to Lion (10.7). I even had the little USB stick version of the installer, to make it easy. But after spending some time attempting the installation, the Lion installer "app" failed, complaining about SMART errors on the disk.

Disk Utility indeed reported there were SMART errors, and that the disk hardware needed to be replaced. An ugly start.

The good news is that in some places, including where I live, Apple will do a house call for service, so I didn't have to haul the computer to an Apple store on public transit.

Thank you Apple.

I called them, scheduled the service for a few days later, and as instructed by Apple (I hardly needed prompting) prepped a backup using Time Machine.

The day before the repairman was to come to give me a new disk, I made sure the system was fully backed up, for security reasons started a complete erasure of the bad disk (using Disk Utility in target mode from another machine, about which more later), and went to bed.

The day

When I got up, I checked that the disk had been erased and headed off to work. As I left the apartment, the ceiling lights in the entryway flickered and then went out: a new bulb was needed. On the way out of the building, I asked the doorman for a replacement bulb. He offered just to replace it for us. We have a good doorman.

Once at work, things were normal until my cell phone rang about 2pm. It was the Apple repairman, Twinkletoes (some names and details have been changed), calling to tell me he'd be at my place within the hour. Actually, he wasn't an Apple employee, but a contractor working for Unisys, a name I hadn't heard in a long time. (Twinkletoes was a name I hadn't heard for a while either, but that's another story.) At least here, Apple uses Unisys contractors to do their house calls.

So I headed home, arriving before Twinkletoes. At the front door, the doorman stopped me. He reported that the problem with the lights was not the bulb, but the wiring. He'd called in an electrician, who had found a problem in the breaker box and fixed it. Everything was good now.

When I got up to the apartment, I found chaos: the cleaners were mid-job, with carpets rolled up, vacuum cleaners running, and general craziness. Not conducive to work. So I went back down to the lobby with my laptop and sat on the couch, surfing on the free WiFi from the café next door, and waited for Twinkletoes.

Half an hour later, he arrived and we returned to the apartment. The cleaners were still there but the chaos level had dropped and it wasn't too hard to work around them. I saw what the inside of an iMac looks like as Twinkletoes swapped out the drive. By the time he was done, the cleaners had left and things had settled down.

The innards of my 27" iMac


I had assumed that the replacement drive would come with an installed operating system, but I assumed wrong. (When you assume, you put plum paste on your ass.) I had a Snow Leopard installation DVD, but I was worried: it had failed to work for me a few days earlier when I wanted to boot from it to run fsck on the broken drive. Twinkletoes noticed it had a scratch. I needed another way to boot the machine.

It had surprised me when Lion came out that the installation was done by an "app", not as a bootable image. This is an unnecessary complication for those of us that need to maintain machines. Earlier, when updating a different machine, I had learned how painful this could be when the installation app destroyed the boot sector and I needed to reinstall Snow Leopard from DVD, and then upgrade that to a version of the system recent enough to run the Lion installer app. As will become apparent, had Lion come as a bootable image things might have gone more smoothly.

Thank you Apple.

[Note added in post: Several people have told me there's a bootable image inside the installer. I forgot to mention that I knew that, and there wasn't. For some reason, the version on the USB stick I have looks different from the downloaded one I checked out a day or two later, and even Twinkletoes couldn't figure out how to unpack it. Weird.]

Twinkletoes had an OS image he was willing to let me copy, but I needed to make a bootable drive from it. I had no sufficiently large USB stick—you need a 4GB one you can wipe. However I did have a free, big enough CompactFlash card and a USB reader, so that should do, right? Twinkletoes was unsure but believed it would.

Using my laptop, I used Disk Utility to create a bootable image on the CF card from Twinkletoes's disk image. We were ready.

Plug in the machine, push down the Option key, power on.

Nothing.

Turn on the light.

Nothing.

No power.

The cleaners must have tripped a breaker.

I went to the breaker box and found that all the breakers looked OK. We now had a mystery, because the cleaners had had lights on and were using electric appliances—I saw a vacuum cleaner running—but now there was no power. Was the power off to the building? No: the lights still worked in the kitchen and the oven clock was lit. I called the doorman and asked him to get the electrician back as soon as possible and then, with a little portable lamp, went looking around the apartment for a working socket. I found one, again in the kitchen. The iMac was going to travel after all, if not as far as downtown.

The machine was moved, plugged in, option-key-downed, and powered on. I selected the CF card to boot from, waited 15 minutes for the installation to come up, only to have the boot fail. CF cards don't work after all, although the diagnosis of failure is a bit tardy and uninformative.

Thank you Apple.

Next idea. My old laptop has FireWire so we could bring the disk up using target mode and then run the installer on the laptop to install Lion on the iMac.

We did the target mode dance and connected to the newly installed drive, then ran Disk Utility on the laptop to format the drive. Things were starting to look better.

Next, we put the Lion installer stick into the laptop, which was running a recent version of Snow Leopard.

Failure again. This time the problem is that the laptop, all of about four years old, is too old to run Lion. It's got a Core Duo, not a Core 2 Duo, and Lion won't run on that hardware. Even though Lion doesn't need to run, only the Lion installer needs to run, the system refuses to help. My other laptop is new enough to run the installer, but it doesn't have FireWire so it can't do target mode.

Thank you Apple. Your aggressive push to retire old technology hurts sometimes, you know? Actually, more than sometimes, but let's stay on topic.

Twinkletoes has to leave—he's been on the job for several hours now—but graciously lends me a USB boot drive he has, asking me to return it by post when I'm done. I thank him profusely and send him away before he is drawn in any deeper.

Using his boot drive, I was able to bring up the iMac and use the Lion installer stick to get the system to a clean install state. Finally, a computer, although of course all my personal data is over on the backup.

When a new OS X installation comes up, it presents the option of "migrating" data from an existing system, including from a Time Machine backup. So I went for that option and connected the external drive with the Time Machine backup on it.

The Migration Assistant presented a list of disks to migrate from. A list of one: the main drive in the machine. It didn't give me the option of using the Time Machine backup.

Thank you Apple. You told me to save my machine this way but then I can't use this backup to recover.

I called Apple on my cell phone (there's still no power in the room with the land line's wireless base station) and explained the situation. The sympathetic but ultimately unhelpful person on the phone said it should work (of course!) and that I should run Software Update and get everything up to the latest version. He reported that there were problems with the Migration Assistant in early versions of the Lion OS, and my copy of the installer was pretty early.

I started the upgrade process, which would take a couple of hours, and took my laptop back down to the lobby for some free WiFi to kill time. But it's now evening, the café is closed, and there is no WiFi. Naturally.

Back to the apartment, grab a book, return to the lobby to wait for the electrician.

An hour or so later, the electrician arrived and we returned to the apartment to see what was wrong. It was easy to diagnose. He had made a mistake in the fix, in fact a mistake related to what was causing the original problem. The breaker box has a silly design that makes it too easy to break a connection when working in the box, and that's what had happened. So it was easy to fix and easy to verify that it was fixed, but also easy to understand why it had happened. No excuses, but problem solved and power was now restored.

The computer was still upgrading but nearly done, so a few minutes later I got to try migrating again. Same result, naturally, and another call to Apple and this time little more than an apology. The unsatisfactory solution: do a clean installation and manually restore what's important from the Time Machine backup.

Thank you Apple.

It was fairly straightforward, if slow, to restore my personal files from the home directory on the backup, but the situation for installed software was dire. Restoring an installed program, either using the ludicrous Time Machine UI or copying the files by hand, is insufficient in most cases to bring back the program, because you also need manifests and keys and receipts and whatnot. As a result, things such as iWork (Keynote etc.) and Aperture wouldn't run. I could copy every piece of data I could find but the apps refused to let me run them. Despite many attempts digging far too deep into the system, I could not get the right pieces back from the Time Machine backup. Worse, the failure modes were appalling: crashes, strange display states, inexplicable non-workiness. A frustrating mess, but structured perfectly to belong on this day.

For peculiar reasons I didn't have the installation disks for everything handy, so these (expensive!) programs were just gone, even though I had backed up everything as instructed.

Thank you Apple.

I did have some installation disks, so for instance I was able to restore Lightroom and Photoshop, but then of course I needed to wait for huge updates to download even though the data needed was already sitting on the backup drive.

Back on the phone for the other stuff. Because I could prove that I had paid for the software, Apple agreed to send me fresh installation disks for everything of theirs but Aperture, but that would take time. In fact, it took almost a month for the iWork DVD to arrive, which is unacceptably long. I even needed to call twice to remind them before the disks were shipped.

The Aperture story was more complicated. After a marathon debugging session I managed to get it to start but then it needed the install key to let me do anything. I didn't have the disk, so I didn't know the key. Now, Aperture is from part of the company called Pro Tools or something like that, and they have a different way of working. I needed to contact them separately to get Aperture back. It's important to understand I hadn't lost my digital images. They were backed up multiple times, including in the network, on the Time Machine backup, and also on an external drive using the separate "vault" mechanism that is one of the best features of Aperture.

I reached the Aperture people on the phone and after a condensed version of the story convinced them I needed an install key (serial number) to run the version of Aperture I'd copied from the Time Machine backup. I was berated by the person on the phone: Time Machine is not suitable for backing up Aperture databases. (What? Your own company's backup solution doesn't know how to back up? Thank you Apple.) After a couple more rounds of abuse, I convinced the person on the phone that a) I was backing up my database as I should, using an Aperture vault and b) it wasn't the database that was the problem, but the program. I was again told that wasn't a suitable way to back up (again, What?), at which point I surrendered and just begged for an installation key, which was provided, and I could again run Aperture. This was the only time in the story where the people I was interacting with were not at least sympathetic to my situation. I guess Pro is a synonym for unfriendly.

Thank you Apple.

There's much more to the story. It took weeks to get everything working again properly. The complete failure of Time Machine to back up my computer's state properly was shocking to me. After this fiasco, I learned about the Lion Recovery App, which everyone who uses Macs should know about, but was not introduced until well after Lion rolled out with its preposterous not-bootable installation setup. The amount of data I already had on my backup disk but that needed to be copied from the net again was laughable. And there were total mysteries, like GMail hanging forever for the first day or so, a problem that may be unrelated or may just be the way life was this day.

But, well after midnight, worn out, beat up, tired, but with electricity restored and a machine that had a little life in it again, I powered down, took the machine back to my office and started to get ready for bed. Rest was needed and I had had enough of technology for one day.

One more thing

Oh yes, one more thing. There's always one more thing in our technological world.

I walked into the bathroom for my evening ablutions only to have the toilet seat come off completely in my hand.

Just because you started it all, even for this,

Thank you Apple.

by rob (noreply@blogger.com) at September 23, 2012 03:33 AM

September 21, 2012

Adam Langley

CRIME

Last year I happened to worry on the SPDY mailing list about whether sensitive information could be obtained via SPDY's use of zlib for compressing headers. Sadly, I never got the time to follow up and find out whether it was a viable attack. Thankfully there exist security researchers who, independently, wondered the same thing and did the work for me! Today Duong and Rizzo presented that work at ekoparty 2012.

They were also kind enough to let Firefox and ourselves know ahead of time so that we could develop and push security fixes before the public presentation. In order to explain what we did, let's start by looking at how SPDY compressed headers:

(This is inline SVG; if you can't see it, check here.)

[Diagram: a compressed SPDY header block (:host www.google.com, :method GET, :path, :scheme, :version, accept-*, cookie, user-agent, x-chrome-variations, …) annotated to show which bytes zlib emitted literally and which it duplicated from earlier text.]

That's a pretty busy diagram! But I don't think it's too bad with a bit of explanation:

zlib uses a language with basically two statements: “output these literal bytes” and “go back x bytes and duplicate y bytes from there”. In the diagram, red text was included literally and black text came from duplicating previous text.

The duplicated text is underlined. A dark blue underline means that the original text is in the diagram and there will be a gray line pointing to where it came from. (You can hover the mouse over one of those lines to make it darker.)

A light blue underline means that the original text came from a pre-shared dictionary of strings. SPDY defines some common text, for zlib to be able to refer to, that contains strings that we expect to find in the headers. This is most useful at the beginning of compression when there wouldn't otherwise be any text to refer back to.

The problem that CRIME highlights is that sensitive cookie data and an attacker-controlled path are compressed together in the same context. Cookie data makes up most of the red, uncompressed bytes in the diagram. If the path contains some cookie data, then the compressed headers will be shorter, because zlib will be able to refer back to the path rather than having to output all the literal bytes of the cookie. If you arrange things so that you can probe the contents of the cookie incrementally, then (assuming that the cookie is base64) you can extract the cookie byte by byte by inducing the browser to make requests.

For details of how to get zlib to reveal that information in practice, I'll just refer you to Duong and Rizzo's CRIME presentation. It's good work.

In order to carry out this attack, the attacker needs to be able to observe your network traffic and to be able to cause many arbitrary requests to be sent. An active network attacker can do both by injecting Javascript into any HTTP page load that you make in the same session.

When we learned of this work, we were already in the process of designing the compression for SPDY/4, which avoids this problem. But we still needed to do something about SPDY/2 and SPDY/3, which are currently deployed. To that end, Chrome 21 and Firefox 15 have switched off SPDY header compression, because that's a minimal change that backports easily.

Chrome has also switched off TLS compression, through which a very similar attack can be mounted.

But we like SPDY header compression because it saves significant amounts of data on the wire! Since SPDY/4 isn't ready to go yet we have a more complex solution for Chrome 22/23 that compresses data separately while still being backwards compatible.

Most importantly, cookie data will only ever be duplicated exactly, and in its entirety, against other cookie data. Each cookie will also be placed in its own Huffman group (Huffman coding is a zlib detail that I skipped over in the explanation above). Finally, in case other headers contain sensitive data (i.e. when set by an XMLHttpRequest), non-standard headers will be compressed in their own Huffman group without any back references.

That's only a brief overview of the rules. The code to follow them and continue to produce a valid zlib stream wasn't one of the cleaner patches ever landed in Chrome and I'll be happy to revert it when SPDY/4 is ready. But it's effective at getting much of the benefit of compression back.

To the right are a couple of images of the same sort of diagram as above, but zoomed out. At this level of zoom, all you can really see are the blocks of red (literal) and blue (duplicated) bytes. The diagram on the right has the new rules enabled and, as you can see, there is certainly more red in there. However that's mostly the result of limited window size. In order to save on server memory, Chrome only uses 2048-byte compression windows and, under the new rules, a previous cookie value has to fit completely within the window in order to be matched. So things are a little less efficient until SPDY/4, although we might choose to trade a little more memory to make up for that.

September 21, 2012 07:00 AM

September 18, 2012

Stan Steel

Career Resources

Here I am compiling a list of resources used to prepare for technical interviews:

developer auction
CareerCup
TopCoder

by Stan Steel (steel@kryas.com) at September 18, 2012 06:49 PM

go with confidence

applied mux()ing: a LimitBuffer

In a previous post I discussed mux()ing, which is probably one of my favorite things.

Here I will show a fun application of mux()ing that I’m calling a LimitBuffer. It’s similar to a bytes.Buffer, except that it limits the amount of data stored in its buffer at any given time. Calls to .Write() will block until they no longer overflow the buffer. Calls to .Read() will block until there is data to read or the LimitBuffer has been closed.

First, the basic type and constructor.

type LimitBuffer struct {
    limit    int
    buf      bytes.Buffer
    writes   chan writeRequest
    reads    chan readRequest
    isclosed bool
}

func NewLimitBuffer(limit int) (lb *LimitBuffer) {
    lb = &LimitBuffer{
        limit:  limit,
        writes: make(chan writeRequest),
        reads:  make(chan readRequest),
    }
    go lb.mux()
    return
}

Since we’re using a mux(), we need to create the channels that bring data safely into the mux() goroutine. The .writes channel will take care of calls to .Write() and .Close(), and the .reads channel will take care of calls to .Read(). Of these three important methods, .Write() and .Read() both have return values. Since channels are mostly one-way means of communication, we’ll have to do something extra here.

This brings us to the request and response types.

type writeRequest struct {
    buf      []byte
    closeit  bool
    response chan writeResponse
}
type writeResponse struct {
    n   int
    err error
}

type readRequest struct {
    buf      []byte
    response chan readResponse
}
type readResponse struct {
    n   int
    err error
}

The extra bit is the response channel in both request types. Since it takes a channel to get data in, it makes sense that we’d use a channel to get data out as well.

Both reading and writing have their own response types that wrap the return values of a normal .Read() and .Write() method.
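To see the request/response-channel idiom in isolation before applying it to the LimitBuffer, here is a minimal, self-contained sketch (an editor's illustration; the `counter` and `incRequest` names are invented and not part of the LimitBuffer code):

```go
package main

import "fmt"

// incRequest asks the mux goroutine to add delta to its counter and
// report the new total on the response channel.
type incRequest struct {
	delta    int
	response chan int // buffered with capacity 1, so the mux never blocks replying
}

// counter is the mux goroutine: it owns the total and serializes all access.
func counter(reqs chan incRequest) {
	total := 0
	for req := range reqs {
		total += req.delta
		req.response <- total
	}
}

func main() {
	reqs := make(chan incRequest)
	go counter(reqs)

	for i := 1; i <= 3; i++ {
		req := incRequest{delta: i, response: make(chan int, 1)}
		reqs <- req
		fmt.Println(<-req.response)
	}
	// prints 1, 3, 6
}
```

The state (`total`) is never touched by more than one goroutine, so no mutex is needed; the request and response channels carry both the data in and the result out.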

Let’s look at the simpler operation first: .Read().

func (lb *LimitBuffer) Read(buf []byte) (n int, err error) {
    req := readRequest{
        buf:      buf,
        response: make(chan readResponse, 1),
    }
    lb.reads <- req
    response := <-req.response
    n, err = response.n, response.err
    return
}

We created a request, loaded it up with the buffer and made a channel that the response could be sent back on. The channel is buffered so that the mux() wastes no time on unneeded synchronization when sending the data back. There’s no reason for it to wait around until the .Read()ing goroutine wakes back up and gets the response. Since the channel is buffered, it can drop the value in and go on its way with other operations.

I’ve made .Write() a little more complicated so that it will block until it is able to completely write the buffer into the LimitBuffer. Since the LimitBuffer will store only a certain amount of data at a time, this could cause .Write() to wait until more .Read() calls have been executed.

I could have let .Write() do partial writes and return how much data was written, but then nearly every caller would have to wrap it in a for loop much like the one below to make it useful.

If there were a .Write() analog to io.ReadFull(), I could use that here. But there isn’t, so I don’t. It would look a lot like this anyway.

func (lb *LimitBuffer) Write(buf []byte) (n int, err error) {
    for len(buf) > 0 {
        req := writeRequest{
            buf:      buf,
            response: make(chan writeResponse, 1),
        }
        lb.writes <- req
        response := <-req.response
        m, werr := response.n, response.err
        n += m
        if werr != nil {
            err = werr
            return
        }
        buf = buf[m:]
    }
    return
}

Since in go-land closing is usually considered a write operation (at least, it is with channels), I have piggybacked on the write request to allow closing, too. Since I have decided that there is no possibility of error when closing, the close request carries no response channel.

func (lb *LimitBuffer) Close() error {
    req := writeRequest{
        closeit:  true,
    }
    lb.writes <- req
    return nil
}

Now we get to the meat of the code - the mux() method. Usually I like to have a single for{} with a single select{} inside, but in this case there are some special situations.

func (lb *LimitBuffer) mux() {
    for {

If the buffer is closed, all .Write()s return an error, and all .Read()s return an error once the buffer has emptied.

        if lb.isclosed {
            if lb.buf.Len() == 0 {
                select {
                case req := <-lb.reads:
                    req.response <- readResponse{
                        n:   0,
                        err: io.EOF,
                    }
                case req := <-lb.writes:
                    lb.handleWriteClosed(req)
                }
            } else {
                select {
                case req := <-lb.reads:
                    lb.handleRead(req)
                case req := <-lb.writes:
                    lb.handleWriteClosed(req)
                }
            }
            continue
        }

If the buffer is at its limit, we’ll save the .Write()s for later.

        if lb.buf.Len() >= lb.limit {
            lb.handleRead(<-lb.reads)
            continue
        }

If the buffer is currently empty, we can’t deal with a .Read() (or at least, I don’t want to).

        if lb.buf.Len() == 0 {
            lb.handleWrite(<-lb.writes)
            continue
        }

If it’s not closed, empty, or at its limit, then both .Read()s and .Write()s can happen.

        select {
        case req := <-lb.reads:
            lb.handleRead(req)
        case req := <-lb.writes:
            lb.handleWrite(req)
        }
    }
}

Since some code would have been duplicated otherwise, I dropped it into helper methods.

func (lb *LimitBuffer) handleRead(req readRequest) {
    n, err := lb.buf.Read(req.buf)
    req.response <- readResponse{n, err}
}

func (lb *LimitBuffer) handleWrite(req writeRequest) {
    if req.closeit {
        lb.isclosed = true
    } else {
        m := lb.limit - lb.buf.Len()
        if m > len(req.buf) {
            m = len(req.buf)
        }
        n, err := lb.buf.Write(req.buf[:m])
        req.response <- writeResponse{n, err}
    }
}

func (lb *LimitBuffer) handleWriteClosed(req writeRequest) {
    if req.closeit {
        return // closing twice is a no-op; there is no response channel to reply on
    }
    req.response <- writeResponse{
        n:   0,
        err: errors.New("writing to closed stream"),
    }
}

Here is a gist with the full code embedded in an example program.

September 18, 2012 03:28 PM

Stan Steel

Casting a Go uint32 to a byte array

Why I am keeping this information is a mystery.

package main

import (
  "fmt"
  "unsafe"
)

func main() {
  var i uint32 = 0x12345678
  x := (*[4]byte)(unsafe.Pointer(&i))
  for _, xn := range *x {
    fmt.Printf("%x\n", xn)
  }
}

by Stan Steel (steel@kryas.com) at September 18, 2012 02:52 PM

September 17, 2012

go with confidence

no methods on interfaces

original post

At one point, in the #golang IRC channel, I had occasion to explain why you cannot define a method on an interface type.

Here is the code whose behavior you can mull over.

package main

type Concrete int

func (c Concrete) Foo() {
    println("concrete foo")
}

func (c Concrete) Bar() {
    println("concrete bar")
}

type Interface1 interface {
    Foo()
}

func (i Interface1) Bar() {
    println("interface bar")
}

type Interface2 interface {
    Foo()
    Bar()
}

func main() {
    var c Concrete
    var i1 Interface1 = c
    var i2 Interface2 = i1
    i2.Foo() // prints "concrete foo"
    i2.Bar() // prints... what?
}

The core of the matter is that only concrete types are recorded when you put something into any kind of interface. If interface types were recorded as well, there would be a few unfortunate consequences.

First, either you remember only the most recent type the thing had (that is, the last interface it came from), forgetting its original type, or you keep an arbitrarily deep stack of types passed along with the value.

While forgetting the previous types could produce working code, interfaces would no longer be particularly useful.

Keeping the entire stack of types that something has been labeled with also adds a lot of complexity, and it would be very hard indeed to keep track of which methods were available. Can you get something that has the method .Foo() out of this interface{}? Well, maybe, but the number of times you'd have to type-assert it would depend on the control path up to that point.

Second, even if you did remember all the interfaces that have labeled your value until this point, there is a good deal of ambiguity. Whose .Foo() method are we to use? I suppose the last label that had such a method, but this is complicated and would be extremely error prone.

This is all made moot by the fact that anything you would be able to do by defining methods on interfaces can already be done using existing go syntax. Taking something that is an interface and giving it extra behavior is straightforward.

type MyInterface interface {
    Foo()
}

type MyWrapper struct {
    MyInterface
}

func (mw MyWrapper) Bar() { ... }

Here we are in essence recording the type stack ourselves without much effort. If we’ve got a “var x MyInterface”, then “y := MyWrapper{x}” is sufficient to create a new value that will invoke the original concrete type’s .Foo() method and the MyWrapper’s .Bar() method.
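The embedding idiom above can be fleshed out into a complete program (an editor's sketch; the `Impl` concrete type and the string return values are invented for illustration):

```go
package main

import "fmt"

type MyInterface interface {
	Foo() string
}

// Impl is a hypothetical concrete type satisfying MyInterface.
type Impl struct{}

func (Impl) Foo() string { return "concrete foo" }

// MyWrapper embeds the interface: Foo is promoted from whatever concrete
// value is stored inside, while Bar is added by the wrapper itself.
type MyWrapper struct {
	MyInterface
}

func (MyWrapper) Bar() string { return "wrapper bar" }

func main() {
	var x MyInterface = Impl{}
	y := MyWrapper{x}
	fmt.Println(y.Foo()) // delegates to Impl.Foo: "concrete foo"
	fmt.Println(y.Bar()) // "wrapper bar"
}
```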

The moral of the story is, allowing methods to be defined on interfaces gives no extra power and adds a lot of extra confusion.

September 17, 2012 04:50 PM

research!rsc

A Tour of Acme

People I work with recognize my computer easily: it's the one with nothing but yellow windows and blue bars on the screen. That's the text editor acme, written by Rob Pike for Plan 9 in the early 1990s. Acme focuses entirely on the idea of text as user interface. It's difficult to explain acme without seeing it, though, so I've put together a screencast explaining the basics of acme and showing a brief programming session. Remember as you watch the video that the 854x480 screen is quite cramped. Usually you'd run acme on a larger screen: even my MacBook Air has almost four times as much screen real estate.

<iframe allowfullscreen="allowfullscreen" frameborder="0" height="480" src="http://www.youtube.com/embed/dP1xVpMPn8M?rel=0" width="853"></iframe>

The video doesn't show everything acme can do, nor does it show all the ways you can use it. Even small idioms like where you type text to be loaded or executed vary from user to user. To learn more about acme, read Rob Pike's paper “Acme: A User Interface for Programmers” and then try it.

Acme runs on most operating systems. If you use Plan 9 from Bell Labs, you already have it. If you use FreeBSD, Linux, OS X, or most other Unix clones, you can get it as part of Plan 9 from User Space. If you use Windows, I suggest trying acme as packaged in acme stand alone complex, which is based on the Inferno programming environment.

Mini-FAQ:

  • Q. Can I use scalable fonts? A. On the Mac, yes. If you run acme -f /mnt/font/Monaco/16a/font you get 16-point anti-aliased Monaco as your font, served via fontsrv. If you'd like to add X11 support to fontsrv, I'd be happy to apply the patch.
  • Q. Do I need X11 to build on the Mac? A. No. The build will complain that it cannot build ‘snarfer’ but it should complete otherwise. You probably don't need snarfer.

If you're interested in history, the predecessor to acme was called help. Rob Pike's paper “A Minimalist Global User Interface” describes it. See also “The Text Editor sam”.

Correction: the smiley program in the video was written by Ken Thompson. I got it from Dennis Ritchie, the more meticulous archivist of the pair.

September 17, 2012 03:00 PM

September 14, 2012

Sonia Hamilton

Golang: checking open files and memory usage

Notes to myself more than anything, and not really specific to Go (but that’s where I was using it).

To watch the memory usage of a process with pid PID:

while [ 1 ] ; do
  grep VmSize  /proc/PID/status ; sleep 10
done

To watch the number of file descriptors being used:

while [ 1 ] ; do
  sudo lsof -p PID | wc -l ; sleep 10
done

by Sonia Hamilton at September 14, 2012 06:18 AM

September 13, 2012

go with confidence

just muxing about

In much of the programming universe, the preferred method of synchronization between two or more concurrent processes is the mutex. And for good reason: mutexes (mutices?) provide a very simple tool that is easy to understand, and once you acquire that understanding you can use it to build arbitrarily complex concurrent systems.

Eventually.

The problem is that what you want as the programmer is rarely limited to exactly what a mutex gives you. Most of the time you need something a bit more complex, and although the more complex operation can be built out of mutexes, small errors become huge bugs which can be difficult to fix or even just observe (heisenbugs).

In this post I will demonstrate how to build a useful real-world concurrent system, first using mutexes and then using the more advanced tools that go offers.

the problem

The system in question is a notification multiplexor. One process, which I’ll call the Listener, will receive notifications from an outside source. Some number of other processes, which I will call the Subscribers, need to collect these notifications.

a mutex solution

// for simplicity, our notifications will just be simple strings
type Notification string

// and different subscribers tell us about which ones they want via a filter
type NotificationFilter func(Notification) bool

// a PollFunc gets all outstanding notifications for its subscriber
type PollFunc func() []Notification

type Notifier struct {
    // we'll embed a mutex in the Notifier, so it can be locked and unlocked
    sync.Mutex
    // queues is the list of outstanding notifications for each subscriber
    queues [][]Notification
    filters []NotificationFilter
}

func NewNotifier() *Notifier {
    return new(Notifier)
}

func (n *Notifier) Subscribe(filter NotificationFilter) PollFunc {
    n.Lock()
    defer n.Unlock()
    index := len(n.queues)
    n.queues = append(n.queues, []Notification{})
    n.filters = append(n.filters, filter)

    return func() (results []Notification) {
        n.Lock()
        defer n.Unlock()
        results = n.queues[index]
        // replace the queue rather than truncating it to [:0]: truncating
        // would reuse the backing array and let future appends overwrite
        // the slice we just handed back to the caller
        n.queues[index] = nil
        return
    }
}

func (n *Notifier) Notify(not Notification) {
    n.Lock()
    defer n.Unlock()
    for i, filter := range n.filters {
        if filter(not) {
            n.queues[i] = append(n.queues[i], not)
        }
    }
}

Seems pretty straightforward, right? It is. This is a simple system, and each of the operations that acquires the mutex is finite-time and efficient (assuming the filter function is too).

There are a few problems here.

Foremost, and the concurrency sharpshooters out there will have already noticed this, is that polling the notifier is busy-waiting. Even if your program only polls once every 20 seconds, those are still wasted cycles every 20 seconds when no notifications have come in.

And what if, before those 20 seconds have elapsed, a whole stack of notifications has queued up? Perhaps we could use a condition variable that wakes a waiter when some condition is met (say, the number of notifications is not zero). But waiting on a condition variable blocks your entire goroutine.

Go’s channels and the select{} statement provide an elegant response to these questions.

a channel/select{} solution

The method to accomplish this goal that I will show here is something often called a mux(), which is short for multiplexor.

To give a little go background, one of the ways that the current go runtime is so effective is that it multiplexes goroutines (Gs) onto some set of available processes (Ms, short for machines). A given running go program will have some number of Ms, and some set of Gs that are not currently blocking. That is, the Gs aren't performing a channel send/receive, aren't trying to do some kind of I/O (networked or local), aren't blocked on a semaphore acquire, and aren't otherwise unable to make progress until something else happens.

When one of the running Gs hits a blocking operation, it is removed from its M and replaced with another G that is ready to go. This kind of CPU time-sharing is usually expressed in terms of coroutines, and that provides the basis for the word goroutine.

We’re going to do something similar with the notification system, except instead of Ms we can have the notifier’s goroutine, and instead of Gs we have the various things that can happen, which for our example are “new notification” and “tell me about a notification”.

Also, in a previous post I discussed stacked channels. Moving forward, I’m assuming the reader either knows how they work or does not care.

// the stacked channel joins notification sets by appending the slice
type NotificationChan chan []Notification
func (nch NotificationChan) Stack(ns []Notification) { ... }
func JoinNotifications(ns1, ns2 []Notification) (ns3 []Notification) {
    ns3 = append(ns1, ns2...)
    return
}

type NotificationFilter func(Notification) bool

type newSubscriber struct {
    filter NotificationFilter
    // the mux() sends the new subscriber's notification channel back on res
    res chan NotificationChan
}

type Notifier struct {
    // the listener sends new notifications on this channel
    Incoming NotificationChan
    filters []NotificationFilter
    chans []NotificationChan
    newSubscribers chan newSubscriber
}

func NewNotifier() (n *Notifier) {
    n = &Notifier{
        Incoming:       make(NotificationChan, 1),
        newSubscribers: make(chan newSubscriber),
    }
    go n.mux()
    return
}

func (n *Notifier) Subscribe(filter NotificationFilter) NotificationChan {
    ns := newSubscriber{
        filter: filter,
        res:    make(chan NotificationChan),
    }
    n.newSubscribers <- ns
    return <-ns.res
}

func (n *Notifier) mux() {
    for {
        select {
        case ns := <-n.newSubscribers:
            n.filters = append(n.filters, ns.filter)
            ch := make(NotificationChan, 1)
            n.chans = append(n.chans, ch)
            ns.res <- ch
        case nots := <-n.Incoming:
            for _, not := range nots {
                for i, filter := range n.filters {
                    if filter(not) {
                        n.chans[i].Stack([]Notification{not})
                    }
                }
            }
        }
    }
}

With a mux()er like this, instead of a system of polling or condition variables, we get a channel. The mux() goroutine takes care of moving notifications coming from the listener (via n.Incoming) to the subscribers.

Also, since adding subscribers touches a data structure that sending notifications also touches, new subscribers come in on a channel so they can be handled next to the incoming notifications in a single goroutine.

And the greatest benefit is that the subscribers can select on their own notification channels as well as on channels for communicating with other processes. No special condition-variable magic needs to be performed - select{} gives you everything you need.

September 13, 2012 12:31 AM

September 12, 2012

go with confidence

stacked channels

Sometimes a go programmer will wish for an infinitely buffered channel. Go does not offer any such construct, though by creating two channels and a goroutine to move data between them, it is possible to have infinitely buffered channel semantics.

Sometimes what people actually want is a channel that never blocks and never forgets. This isn’t quite the same as a channel with an infinite buffer. I didn’t mention anything about preserving the original message order, for one.

What you can do in the situation I describe is use something I’ve been calling a “stacked” channel. It’s a buffered channel with a special send operation, for a value type that has a meaningful “join” function.

type Thing interface{} // stand-in: any type with a meaningful join function works

func JoinThings(thing1, thing2 Thing) (thing3 Thing) { ... }

type ThingChan chan Thing

func NewThingChan() ThingChan {
    return make(ThingChan, 1)
}

func (tch ThingChan) Stack(thing1 Thing) {
    for {
        select {
        case tch <- thing1:
            return
        case thing2 := <- tch:
            thing1 = JoinThings(thing1, thing2)
        }
    }
}

If it’s not immediately clear how stacking works, take a minute to try to figure it out before reading the explanation below.

The stacked channel allows non-blocking sending. That is, whenever a goroutine wants to send to this channel using the .Stack() method, it will complete quickly (provided that the join function completes quickly).

This non-blocking behavior occurs because when the select{} statement is executed, there are two possible states for the channel: either its buffer is full or its buffer is empty. If the buffer is empty, the new value will be put in immediately. If the buffer is full, the old value will be picked off, combined with the new value, and put back on.

It is possible that another goroutine stacks something in the meantime, in which case it will have to pick off another value to join and put back. With a fair scheduler, every goroutine attempting to stack will make progress quickly, relative to the number of goroutines in contention. This progress comes about because every time a goroutine has to loop back and try again, another goroutine must have succeeded in leaving a value on the channel.
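As a concrete instance of the pattern (an editor's sketch; `IntsChan` specializes Thing to `[]int` with append as the join function):

```go
package main

import "fmt"

type IntsChan chan []int

// Stack sends xs without blocking: if the 1-slot buffer is occupied,
// the old value is picked off, joined with xs, and put back.
func (ch IntsChan) Stack(xs []int) {
	for {
		select {
		case ch <- xs:
			return
		case old := <-ch:
			xs = append(old, xs...) // join, keeping the earlier values first
		}
	}
}

func main() {
	ch := make(IntsChan, 1)
	ch.Stack([]int{1})
	ch.Stack([]int{2, 3}) // does not block: joins with the buffered value
	fmt.Println(<-ch)     // [1 2 3]
}
```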

This stacking technique has definite real-project application. I work for a private corporation on a public project called skynet. With skynet, client programs need to be notified about new services as they become available. Sometimes these notifications come faster than a client can ask about them, and they stack up. We use a stacked channel to collect the notifications into bundles without stalling the notification system.

Here’s a play example: http://play.golang.org/p/MEp87YesU6

September 12, 2012 11:24 PM

September 10, 2012

jra's thoughts

Once upon a time…

My son now needs a bedtime story, whispered in the dark, to go to sleep. My wife and I both love good storytelling. She even took a course on it once, and told a story to an audience as the final project. We go to la Nuit des Contes every year here in Lausanne, and I’ve picked up some tips from watching the (literal) professionals there. A key to oral storytelling, certainly for small children, is to use a structure with repeating sounds and phrases that the child can get wrapped up in.

So that we have them to look back at some day, I’m going to start writing down summaries of the stories I tell. If one is good enough to develop and retell, perhaps one day I’ll tell it at the Nuit des Contes!

Our 1974 VW Type 2 camper van (“Bep” is his name) would be the star of every story if Elio got to choose. Instead, we offer him a choice of three characters and then go from there. The key, I find, is to start slowly, describing the character and throwing in some fun details right away. This gives you time to race ahead in your mind and choose a rough storyline. The easiest approach is to choose the end state first, so you know where you are trying to get to. Then, like a dot-to-dot painting, you need to fill in a few intermediate hops along the way. These are attempts the character makes at achieving the goal, or increasingly dire straits the character finds himself in. These stretch out the story, but more importantly, give it a verse/chorus/verse/chorus/verse structure, which is where the real magic comes from. And the rhythm necessary to put a fidgety 2-year-old to sleep, as well, which is the point after all!

What do I mean by verse/chorus? The chorus is the catch-phrase, the repeated element that signals another loop around. It gives the story rhythm and momentum. The verses carry the story forward, so that you get to where you are trying to go and you get that nice satisfying conclusion.

So here are several stories that I’ve told so far:

  • Bep and the Doubting Family: A story from real life, about a little camper that joined a new family and though it had been crossing the passes of Switzerland all its life, the mom and dad were worried it couldn’t do it (a family version of the Little Engine that Could). For each pass: Daddy said, “I don’t know if he can make it”, and Mommy said, “I’m afraid he can’t make it!”, and Elio said, “Go Bep, Go!”, and Bep said, “beep beep, vroom vroom, I know I can make it… I know I can make it… I knew I could make it!”.
  • Bep and the Big Campers: Big campers at a campground are making fun of Bep (“He’s only got 4 cylinders!”, “No fuel injection!”, “He doesn’t even run on diesel!”, “He’s got no toilet!”, “His exhaust stinks: che puzza di benzina!” (it reeks of gasoline!)). Bep says he’s as good as the other campers, but they demand proof. He names all the passes he’s done, and the bigger campers say, “Wait, they let you do that pass? I’m not allowed on it because I’m too big!”. The Dutch camper says, “My owners drive me all night long on the autoroute here, and go through all the tunnels to save time!”. The big-ass bus starts crying because he’ll never get to see Passo del Lucomagno because he’s too long. Bep makes him feel better by reminding him that he can drive all night from Barcelona and his owners don’t even have to go pee at the gas station. All the campers are friends after that.
  • Bunny and the Giant Carrot: A little bunny tells her mom that she loves carrots. She loves them so much she wants to grow up and be a farmer and grow carrots. She’ll sell them all around the forest. She’ll sell them by the big rock, and past the oak tree, etc, etc, etc. (This was inspired by Elio’s cousin, who told us this summer of a plan to be a farmer and sell his produce all around Switzerland in order to get out and see the country. Good plan, if you ask me!) She tells her mother the same thing every night, each night adding one more place she’ll sell her carrots, and naming all the others. One night, her mother reminds her that tomorrow is a special day, her birthday. She receives a single carrot seed, but it’s magic. The next morning, she has a regular carrot in her plot. But her mom convinces her to wait another night (and another and another, as many as it takes to make your fidgety 2-year-old tired). Then it goes to seed and gives her all the seeds she’ll need to achieve her dream of being a carrot farmer.
  • Bep and the Apple: On a long trip, the daddy stops the car at a fruit stand. (Based on real life, and my fond memories of driving from San Francisco to Arnold as a child). He buys an apple for everyone (mommy, daddy, Elio and Emma). Each person crunches their apple, except Emma who coos because she knows she’ll get her apple cooked for snacktime later. Bep complains that he didn’t get an apple, and daddy explains to Bep that he can’t eat them, Bep eats gas. But daddy promises to make it up to Bep by getting him an apple anyway. Bep goes su, su, su! (up, up, up!) the pass, turning left and right and left and right, and a moto goes by and goes zoom! (Repeat for as many switchbacks and/or passes as necessary until kidlet is cuddling and nodding off.) Bep is running out of gas and worried that daddy won’t fill him in time. But just when he’s sure he can’t go another kilometer, around the corner comes a giant sign with an apple on top. It’s Apple Gas, where Bep gets his gas, just as daddy promised!
  • The Lonely King: The king of a country with one citizen (the king himself) is lonely. On the plus side, all his subjects follow his orders, but having more people to play with would be nice. So he tries everything he can think of to get more citizens. He plants flowers on his castle, but the tourists just take pictures. He makes a decree that all the women of the kingdom (i.e. zero) must marry him, but that doesn’t work because there are none and because women don’t like to be told what to do like that anyway. He tries giving away cookies, but he runs out and no one moves in anyway. One day, a nice lady goes out picking mushrooms. She goes up up up the mountain until she finds herself in the kingdom. She asks the king if he’d like to look for mushrooms with her, and they laugh and have fun until sunset. She comes back for more mushrooms, day after day. After two weeks, they are in love. After two months, they are married. After two years, they have a family. And when the king and his queen had a happy family in the kingdom, other people came to join the happy kingdom and the king was no longer lonely (though he was likely exhausted from staying up late into the night telling stories to his kids).

by jra at September 10, 2012 11:04 PM

September 05, 2012

Shadynasty Business

Auth and Sessions

I think it’s time to bring the guestbook to the next level, and that means users and sessions. This post will show you how to handle user registration and authentication. Let’s get started!

The User Type

The first thing we’re going to do is create a type to store the information of the user. So what do users have? Well, an ID to identify them in the database, a username, and a password. To make this a little more fun, we’re also going to store the number of times they’ve posted on the guestbook. So here’s our type:

<figure class="code"><figcaption>user.go </figcaption>
package main

import (
  "code.google.com/p/go.crypto/bcrypt"
  "labix.org/v2/mgo/bson"
)

type User struct {
  ID       bson.ObjectId `bson:"_id,omitempty"`
  Username string
  Password []byte
  Posts    int
}
</figure>

Now let’s define some functions to hash the password and set it on the user, and to authenticate a user given a username and password.

<figure class="code"><figcaption>user.go </figcaption>
//SetPassword takes a plaintext password and hashes it with bcrypt and sets the
//password field to the hash.
func (u *User) SetPassword(password string) {
  hpass, err := bcrypt.GenerateFromPassword([]byte(password), bcrypt.DefaultCost)
  if err != nil {
      panic(err) //this is a panic because bcrypt errors on invalid costs
  }
  u.Password = hpass
}

//Login validates and returns a user object if they exist in the database.
func Login(ctx *Context, username, password string) (u *User, err error) {
  err = ctx.C("users").Find(bson.M{"username": username}).One(&u)
  if err != nil {
      return
  }

  err = bcrypt.CompareHashAndPassword(u.Password, []byte(password))
  if err != nil {
      u = nil
  }
  return
}
</figure>

Now let’s work on the handler to log them in.

First Sign of Trouble

The login handler should be pretty simple. All we have to do is get the username and password from the form POSTed to the handler, and pass them to our Login function, which will grab the user from the database and authenticate the credentials.

<figure class="code"><figcaption>handlers.go </figcaption>
func login(w http.ResponseWriter, req *http.Request, ctx *Context) (err error) {
  //grab the username and password from the form
  username, password := req.FormValue("username"), req.FormValue("password")

  //log in the user
  user, err := Login(ctx, username, password)

  //what to do now? if there was an error we want to present the form again
  //with some error message.

  //where do we store the user if the login was valid?

  //answer: sessions!
  _ = user
  return
}
</figure>

But we’ve run into some trouble. When should we display the template for the form? Where do we store the fact that authentication succeeded? Fortunately it’s not too hard to fix these problems. Let’s handle displaying the template first.

Two Handlers Are Better Than One

The login handler is really two actions. When a GET request is passed to the handler it should display a nice form, but when a POST request is passed to the handler it should authenticate a user. These different actions based on the verb used on the URL mean we should dispatch to the correct handler in the router rather than in the handler itself. Let’s write the simple form-displaying handler first.

<figure class="code"><figcaption>handlers.go </figcaption>
var login = parseTemplate(
  "templates/_base.html",
  "templates/login.html",
)

func loginForm(w http.ResponseWriter, req *http.Request, ctx *Context) (err error) {
  err = login.Execute(w, nil)
  return
}
</figure>

The code was getting “smelly”: it wasn’t nice to have global variables storing the templates, so I whipped up a simple function that compiles templates and caches them on the fly. Here’s what that looks like, and the new handler using it:

<figure class="code"><figcaption>template.go </figcaption>
package main

import (
  "html/template"
  "path/filepath"
  "sync"
)

var cachedTemplates = map[string]*template.Template{}
var cachedMutex sync.Mutex

var funcs = template.FuncMap{
  "reverse": reverse,
}

func T(name string) *template.Template {
  cachedMutex.Lock()
  defer cachedMutex.Unlock()

  if t, ok := cachedTemplates[name]; ok {
      return t
  }

  t := template.New("_base.html").Funcs(funcs)

  t = template.Must(t.ParseFiles(
      "templates/_base.html",
      filepath.Join("templates", name),
  ))
  cachedTemplates[name] = t

  return t
}
</figure> <figure class="code"><figcaption>handlers.go </figcaption>
func loginForm(w http.ResponseWriter, req *http.Request, ctx *Context) (err error) {
  return T("login.html").Execute(w, nil)
}
</figure> <figure class="code"><figcaption>Login Template </figcaption>
{{ define "title" }}Guestbook - Login{{ end }}

{{ define "content" }}
    <h1>Login</h1>
    <form action="{{ reverse "login" }}" method="POST">
        <p>Username: <input type="text" name="username"></p>
        <p>Password: <input type="password" name="password"></p>
        <p><button>Login</button></p>
    </form>
{{ end }}
</figure>

At this point I had a working login page that would 404 when you clicked Login. There were some smaller cleanup changes as well, which you can see in this commit (I have annotated the commit with comments on the changes). Let’s add the login form handling now.

Sessions

A session is just some data attached to an id that you hand the client in a cookie. When the client requests a page, you look at the cookie value, get the id, and load the data for that request. Tada! Sessions! For our implementation of sessions, we’re once again going to use the excellent gorilla package. It lets you use different backend stores for the data, and in this case we’re just going to use a cookie store, which keeps all the data in the cookie the client sends to you. This does mean that the user can tamper with the cookie, but the data is verified using a secret value and a hash, and can optionally be encrypted with another secret value. For this I’m just going to use a store that doesn’t encrypt the data: after all, the data the store handles is open source anyway.

<figure class="code"><figcaption>main.go </figcaption>
import (
  "code.google.com/p/gorilla/sessions"
  //...
)

var store sessions.Store
//...


func main() {
  //...
  store = sessions.NewCookieStore([]byte(os.Getenv("KEY")))
}
</figure>
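The tamper-proofing idea behind the cookie store is worth seeing in isolation. This is a minimal standard-library sketch of signing a cookie value with HMAC, not gorilla’s actual wire format (which also gob-serializes and base64-encodes the whole payload):

```go
package main

import (
	"crypto/hmac"
	"crypto/sha256"
	"encoding/base64"
	"fmt"
	"strings"
)

// sign appends an HMAC-SHA256 of value, so the server can later
// detect whether the client modified the cookie.
func sign(value string, key []byte) string {
	mac := hmac.New(sha256.New, key)
	mac.Write([]byte(value))
	sig := base64.URLEncoding.EncodeToString(mac.Sum(nil))
	return value + "|" + sig
}

// verify splits the signed value and recomputes the HMAC with the
// same key; a mismatch means the client tampered with the value.
func verify(signed string, key []byte) (string, bool) {
	i := strings.LastIndex(signed, "|")
	if i < 0 {
		return "", false
	}
	value, sig := signed[:i], signed[i+1:]
	mac := hmac.New(sha256.New, key)
	mac.Write([]byte(value))
	expected := base64.URLEncoding.EncodeToString(mac.Sum(nil))
	if !hmac.Equal([]byte(sig), []byte(expected)) {
		return "", false
	}
	return value, true
}

func main() {
	key := []byte("secret")
	cookie := sign("user=12345", key)
	if v, ok := verify(cookie, key); ok {
		fmt.Println("ok:", v)
	}
	if _, ok := verify("user=99999|forged", key); !ok {
		fmt.Println("tampered cookie rejected")
	}
}
```

This is why the KEY environment variable matters: anyone who knows it can forge a valid session.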

So we’ve defined a cookie store; now let’s add grabbing the session to the context.

<figure class="code"><figcaption>context.go </figcaption>
type Context struct {
  Database *mgo.Database
  Session  *sessions.Session
}

func NewContext(req *http.Request) (*Context, error) {
  sess, err := store.Get(req, "gostbook")
  return &Context{
      Database: session.Clone().DB(database),
      Session:  sess,
  }, err
}
</figure>

The last thing we need to do is make sure the handlers save the session when they’re done with it. Unfortunately this causes a problem: saving the session modifies the headers of the response, and if the handler has already started writing output, the headers have already been sent and that ship has sailed. There are two approaches to solving this. The first is to make sure each handler saves the session before writing anything to the ResponseWriter, which is a little verbose and error-prone but provides the best performance. The second exploits the fact that ResponseWriter is an interface: our handler type can substitute a buffered ResponseWriter that stores all the data and header information written to it, so it can all be output at once at the end. I wrote a package to help with the second option, so it’s clearly the one I prefer. Here’s how we can hook that up:

<figure class="code"><figcaption>http.go </figcaption>
package main

import (
  "net/http"
  "thegoods.biz/httpbuf"
)

type handler func(http.ResponseWriter, *http.Request, *Context) error

func (h handler) ServeHTTP(w http.ResponseWriter, req *http.Request) {
  //create the context
  ctx, err := NewContext(req)
  if err != nil {
      http.Error(w, err.Error(), http.StatusInternalServerError)
      return
  }
  defer ctx.Close()

  //run the handler against the buffer, and report any error
  buf := new(httpbuf.Buffer)
  err = h(buf, req, ctx)
  if err != nil {
      http.Error(w, err.Error(), http.StatusInternalServerError)
      return
  }

  //save the session
  if err = ctx.Session.Save(req, buf); err != nil {
      http.Error(w, err.Error(), http.StatusInternalServerError)
      return
  }

  //apply the buffered response to the writer
  buf.Apply(w)
}
</figure>

All we do is create an httpbuf.Buffer, pass it to the handler as the ResponseWriter, and finish with a call to its Apply method. With that, we can set and grab session values in the handlers by just interacting with ctx.Session, and everything will be saved when we’re done.

Back to Authentication

Now that we have sessions, we know where we can store the user. Let’s write the login handler.

<figure class="code"><figcaption>handlers.go </figcaption>
func login(w http.ResponseWriter, req *http.Request, ctx *Context) error {
  username, password := req.FormValue("username"), req.FormValue("password")

  user, e := Login(ctx, username, password)
  if e != nil {
      ctx.Session.AddFlash("Invalid Username/Password")
      return loginForm(w, req, ctx)
  }

  //store the user id in the values and redirect to index
  ctx.Session.Values["user"] = user.ID
  http.Redirect(w, req, reverse("index"), http.StatusSeeOther)
  return nil
}
</figure> <figure class="code"><figcaption>main.go </figcaption>
import (
  "encoding/gob"
  "labix.org/v2/mgo/bson"
  //...
)

func init() {
  gob.Register(bson.ObjectId(""))
}

func main() {
  //...
  router.Add("POST", "/login", handler(login))
  //...
}
</figure>

Note that we have to register the bson.ObjectId type with the gob package because the cookie store uses gob to store the data for the session.

Well, now we log people in and store it in the session, but it’d be nice if the user interface reflected that and the context included information about the logged-in user. Let’s do some work on the context and handlers to fix this. First, we’ll add a *User to the context that gets filled in from the id we stored in the session.

<figure class="code"><figcaption>context.go </figcaption>
type Context struct {
  Database *mgo.Database
  Session  *sessions.Session
  User     *User
}

func NewContext(req *http.Request) (*Context, error) {
  sess, err := store.Get(req, "gostbook")
  ctx := &Context{
      Database: session.Clone().DB(database),
      Session:  sess,
  }
  if err != nil {
      return ctx, err
  }

  //try to fill in the user from the session
  if uid, ok := sess.Values["user"].(bson.ObjectId); ok {
      err = ctx.C("users").Find(bson.M{"_id": uid}).One(&ctx.User)
  }

  return ctx, err
}
</figure>

Now we just have to add the context to the value we pass in to templates to be executed and hook up the templates. This commit shows the details of that, including adding a logout handler, and fixing some minor issues with the code.

Post Count and Creating Users

The last two features we need are letting people register and incrementing a person’s post count when they post. Let’s work on registration first. Registration works just like logging in, so we need to create a template and two handlers.

<figure class="code"><figcaption>handlers.go </figcaption>
func registerForm(w http.ResponseWriter, req *http.Request, ctx *Context) (err error) {
  return T("register.html").Execute(w, map[string]interface{}{
      "ctx": ctx,
  })
}

func register(w http.ResponseWriter, req *http.Request, ctx *Context) error {
  username, password := req.FormValue("username"), req.FormValue("password")

  u := &User{
      Username: username,
      ID:       bson.NewObjectId(),
  }
  u.SetPassword(password)

  if err := ctx.C("users").Insert(u); err != nil {
      ctx.Session.AddFlash("Problem registering user.")
      return registerForm(w, req, ctx)
  }

  //store the user id in the values and redirect to index
  ctx.Session.Values["user"] = u.ID
  http.Redirect(w, req, reverse("index"), http.StatusSeeOther)
  return nil
}
</figure>

The register.html template is very similar to the login template; if you really want to see it, you can find it at this commit. Incrementing the post count is super simple. In the sign handler, we just add

ctx.C("users").Update(bson.M{"_id": ctx.User.ID}, bson.M{
    "$inc": bson.M{"posts": 1},
})

Phew!

So after all that we have a login/user registration system, and a session tied to a context for storing whatever data we want. Hopefully with this guide you can extend it to meet the needs of whatever webapp you’re writing. Thanks for reading so far, and be sure to register and leave a comment on the gostbook or right below if you liked it.

September 05, 2012 08:10 PM

August 29, 2012

Hacking with the Go Programming Language (GoLang)

Golang project update: edigo

After some reflection, what I want to have is in fact to be able to auto-generate a Go package or packages, which can be imported into any Go code later, given an XML EDIFACT grammar file that is generated by some tool, such as (my current employer) Amadeus' Visual Services tool.

The format of this XML EDIFACT grammar file is

<root>
  <source>Visual Services 5 - XML Generator
    <version>1</version>
    <interface description="Yada yada blah." name="SKD" release="9" version="02">
      <transactions>...list of transactions...</transactions>
      <messages>...list of messages...</messages>
      <groups>...list of groups...</groups>
      <segments>...list of segments...</segments>
      <composites>...list of composites...</composites>
    </interface>
  </source>
</root>

The resulting package code could then be stored in a folder named after the interface name (in the above example, "skd") as composites.go, segments.go, groups.go, etc. Every time there's a new version of the grammar XML file, edigo must be run to regenerate these files.

That's all for now. I will write more on this later. Oh, and by the way, I've created a new repository on GitHub for this.

by Allister (noreply@blogger.com) at August 29, 2012 03:01 PM

August 28, 2012

Hacking with the Go Programming Language (GoLang)

Golang Project Idea: Go EDIFACT

It's been some time since I've written anything on this blog. A few weeks back I realized that my syntax highlighting brush for golang wasn't available online anymore, and so I decided to put it on Github.

Anyway, I've been thinking about what I could do to improve my understanding of the Go programming language.  Earlier today it dawned on me that since I've been working a lot with EDIFACT messages, why not a Go package for handling EDIFACT messages?

So, what do I want to make? I haven't really seen an EDIFACT grammar parser that takes an XML file defining the grammar and converts a valid EDIFACT message into a Go data structure. Conversely, it should allow the user to fill the Go data structure and generate the corresponding EDIFACT message from it.

Okay, it's time to hack.  I'll post updates on this project later.  Cheers!

by Allister (noreply@blogger.com) at August 28, 2012 01:19 PM

August 23, 2012

Go's official blog

Go updates in App Engine 1.7.1

This week we released version 1.7.1 of the App Engine SDK. It includes some significant updates specific to the App Engine runtime for Go.

The memcache package has had some additions to its Codec convenience type. The SetMulti, AddMulti, CompareAndSwap, and CompareAndSwapMulti methods make it easier to store and update encoded data in the Memcache Service.

The bulkloader tool can now be used with Go apps, allowing users to upload and download datastore records in bulk. This is useful for backups and offline processing, and a great help when migrating Python or Java apps to the Go runtime.

The Images Service is now available to Go users. The new appengine/image package supports serving images directly from Blobstore and resizing or cropping those images on the fly. Note that this is not the full image service as provided by the Python and Java SDKs, as much of the equivalent functionality is available in the standard Go image package and external packages such as graphics-go.

The new runtime.RunInBackground function allows backend requests to spawn a new request independent of the initial request. These can run in the background as long as the backend stays alive.

Finally, we have filled in some missing functionality: the xmpp package now supports sending presence updates and chat invitations and retrieving the presence state of another user, and the user package supports authenticating clients with OAuth.

You can grab the new SDK from the App Engine downloads page and browse the updated documentation.

by Andrew Gerrand (noreply@blogger.com) at August 23, 2012 10:44 PM

Go's official blog

Organizing Go code

Go code is organized differently to that of other languages. This post discusses how to name and package the elements of your Go program to best serve its users.

Choose good names

The names you choose affect how you think about your code, so take care when naming your package and its exported identifiers.

A package's name provides context for its contents. For instance, the bytes package from the standard library exports the Buffer type. On its own, the name Buffer isn't very descriptive, but when combined with its package name its meaning becomes clear: bytes.Buffer. If the package had a less descriptive name, like util, the buffer would likely acquire the longer and clumsier name util.BytesBuffer.

Don't be shy about renaming things as you work. As you spend time with your program you will better understand how its pieces fit together and, therefore, what their names should be. There's no need to lock yourself into early decisions. (The gofmt command has a -r flag that provides a syntax-aware search and replace, making large-scale refactoring easier.)

A good name is the most important part of a software interface: the name is the first thing every client of the code will see. A well-chosen name is therefore the starting point for good documentation. Many of the following practices result organically from good naming.

Choose a good import path (make your package "go get"-able)

An import path is the string with which users import a package. It specifies the directory (relative to $GOROOT/src/pkg or $GOPATH/src) in which the package's source code resides.

Import paths should be globally unique, so use the path of your source repository as its base. For instance, the websocket package from the go.net sub-repository has an import path of "code.google.com/p/go.net/websocket". The Go project owns the path "code.google.com/p/go", so that path cannot be used by another author for a different package. Because the repository URL and import path are one and the same, the go get command can fetch and install the package automatically.

If you don't use a hosted source repository, choose some unique prefix such as a domain, company, or project name. As an example, the import path of all Google's internal Go code starts with the string "google".

The last element of the import path is typically the same as the package name. For instance, the import path "net/http" contains package http. This is not a requirement - you can make them different if you like - but you should follow the convention for predictability's sake: a user might be surprised that import "foo/bar" introduces the identifier quux into the package name space.

Sometimes people set GOPATH to the root of their source repository and put their packages in directories relative to the repository root, such as "src/my/package". On one hand, this keeps the import paths short ("my/package" instead of "github.com/me/project/my/package"), but on the other it breaks go get and forces users to re-set their GOPATH to use the package. Don't do this.

Minimize the exported interface

Your code is likely composed of many small pieces of useful code, and so it is tempting to expose much of that functionality in your package's exported interface. Resist that urge!

The larger the interface you provide, the more you must support. Users will quickly come to depend on every type, function, variable, and constant you export, creating an implicit contract that you must honor in perpetuity or risk breaking your users' programs. In preparing Go 1 we carefully reviewed the standard library's exported interfaces and removed the parts we weren't ready to commit to. You should take similar care when distributing your own libraries.

If in doubt, leave it out!

What to put into a package

It is easy to just throw everything into a "grab bag" package, but this dilutes the meaning of the package name (as it must encompass a lot of functionality) and forces the users of small parts of the package to compile and link a lot of unrelated code.

On the other hand, it is also easy to go overboard in splitting your code into small packages, in which case you will likely become bogged down in interface design, rather than just getting the job done.

Look to the Go standard libraries as a guide. Some of its packages are large and some are small. For instance, the http package comprises 17 Go source files (excluding tests) and exports 109 identifiers, and the hash package consists of one file that exports just three declarations. There is no hard and fast rule; both approaches are appropriate given their context.

With that said, package main is often larger than other packages. Complex commands contain a lot of code that is of little use outside the context of the executable, and often it's simpler to just keep it all in the one place. Godoc is nearly 6000 lines over 16 files, and the go tool is more than 7000 lines spread across 23 files.

Document your code

Good documentation is an essential quality of usable and maintainable code. Read the Godoc: documenting Go code article to learn how to write good doc comments.

by Andrew Gerrand (noreply@blogger.com) at August 23, 2012 01:42 AM

August 18, 2012

Shadynasty Business

Template Usage and Internals

I’ve seen many tutorials on the internet about how to use templates in Go. They typically concentrate on the syntax of the template and don’t go into the details of how templates are constructed and used in Go code. That’s why this article is about what a template really is, and how to use one in your code.

A little history

Back before Go 1, the text/template package was different. The major difference was that the package defined two types: a template and a template set. At some point the library authors decided to merge them into one, the Template type.

So what is a template?

I like to think of a template as a collection of templates, with one promoted as the “default” template. The default template is the one that gets used when the execute method is called. Internally, a template contains a map of all the other templates it is linked with, and each one of those templates contains the same map. This makes the namespace of a template flat. It sounds really confusing, so hopefully we can make it easier with some examples and implications.

Implications

So given that a template is really a set of templates with a mapping of all the template names in the set to the templates (phew), what can we expect in how we work with it?

  • Every template in the set can be called by any other. Because every template shares the same map of names to templates, any template can call any other template. This is pretty simple.

  • You can’t have two templates with the same name. This is also pretty obvious but bears stating. Because the namespace is flat, two templates with the same name would cause a collision. The package doesn’t allow you to add a template to a set under a name that already exists.

  • There’s no such thing as a subtemplate. Because the namespace is flat, you can’t really have one template be a subtemplate of another. Because every template is accessible from every other template, the concept of one template owning another isn’t really defined. This isn’t to say you can’t use templates like subtemplates; it’s just that there’s no mechanism enforcing this in the package.

  • Lookup is idempotent. This means that calling t.Lookup("name").Lookup("name") is the same as t.Lookup("name"), as long as "name" exists in the template. This is because every template shares the same map of template name to template. When we lookup a specific one, it still has the same map to lookup the next one. This is a handy property because it makes it always safe to call Lookup regardless of the current state of the template.

Gotchas

Here’s some common problems with the template package that people run into:

  • ParseFiles and its friends add the template under the name of the file. This means if you do template.New("base").ParseFiles("foo.html") and try to execute it, you will have an empty template. The template was read and parsed from “foo.html” and added to the template under that name. This means you have to either do a .Lookup("foo.html") or change the name in New to “foo.html”.

  • Not just the name of the file, the basename of the file. This means if you try to do .ParseFiles("a/main.html", "b/main.html") you’ll run into problems because the two files share the same basename. It returns an error saying that you can’t redefine the template named “main.html”.

  • Functions must be added before parsing. During parse time, the template package needs to know all the identifiers that could be used to parse a template correctly. This means that you need to set the function map before you do any parsing. This can make the code a little longer, especially when you start with a simple template.ParseFiles and need to add functions to it.

Quiz time

Now that we know more about how templates work and some common gotchas, let’s look at some code and have a quiz. Here’s how the quiz works: I’ll show you a piece of sample code consisting of some files, and you tell me the output. The answers are at the bottom of this post.

Question 1

<figure class="code"><figcaption>main.go </figcaption>
package main

import (
  "text/template"
  "os"
)

func main() {
  t := template.Must(template.New("foo").ParseFiles("main.html"))
  if err := t.Execute(os.Stdout, nil); err != nil {
      panic(err)
  }
}
</figure> <figure class="code"><figcaption>main.html </figcaption>
Hello World!
</figure>

Question 2

<figure class="code"><figcaption>main.go </figcaption>
package main

import (
  "text/template"
  "os"
)

func main() {
  t := template.Must(template.ParseFiles("main.html", "sub.html"))
  if err := t.Execute(os.Stdout, nil); err != nil {
      panic(err)
  }
}
</figure> <figure class="code"><figcaption>main.html </figcaption>
{{ template "sub.html" }}
</figure> <figure class="code"><figcaption>sub.html </figcaption>
Hello World!
</figure>

Question 3

<figure class="code"><figcaption>main.go </figcaption>
package main

import (
  "text/template"
  "os"
)

func greet() string {
  return "Hello World!"
}

func main() {
  t := template.Must(template.New("foo").Parse(`{{ greet }}`))
  t.Funcs(template.FuncMap{
      "greet": greet,
  })
  if err := t.Execute(os.Stdout, nil); err != nil {
      panic(err)
  }
}
</figure>

Answers

  • Question 1: Nothing. The “foo” template is the “active” template and it’s empty. Did you read the Gotchas?

  • Question 2: “Hello World!”. Nothing tricky about this one. Interesting fact: if we reversed the order of the parsed files it would be the same output, even though it does something totally different. (Why?)

  • Question 3: A panic. We don’t have the functions in the template at the time of the parsing, so an error is returned and Must causes a panic. This is Gotcha number 3.

Conclusion

So how did you do? I admit, some of the questions were tricky and subtle, so it’s OK if you got some (or all) of them wrong, but all of them were inspired by common problems I’ve seen people have. Templates are actually pretty simple, but they have some non-obvious properties that require some care in how they are created and used in your code.

August 18, 2012 03:51 PM

August 07, 2012

codegrunt.co.uk

Quick Update

Metablah

Ok, so I’ve not managed to keep this updated… yeah, I hate meta-blog posts as much as anybody, so I’m going to keep this short - timetabled blogging clearly doesn’t work for me, so I’m officially making no promises about blogging ever again, I promise, even if that is a paradox ;-)

Weak

I’ve been making good progress on weak, though I have been totally focused on optimising move generation so far to the point that the engine doesn’t actually play yet. This is after I switched from go to C since performance in the go version was so poor that I simply couldn’t continue. I have now got to the point where I’m vaguely happy with performance, so I’m going to work on actually making the engine play. I also plan to back-port to go at some point. It’ll be interesting to see how it looks after all the optimisations I’ve applied since going to C.

The switch isn’t necessarily a poor reflection on go, rather a combination of it being a very young language, and the fact that go is pretty much an unsuitable choice for this project. A chess engine is quite an unusual piece of software in any case, as it really does rely on pedantic optimisation more than a lot of other software might, and certain handy assembly instructions (bit scan forward/backward, pop count.)

Since I have used several ideas from stockfish for optimising weak, I have moved to a GPLv3 license and officially acknowledged weak as a stockfish derivative. The degree of derivation right now is small, by no means have I copied code wholesale (the hours and hours spent debugging many very evil bugs is a testament to that), but there are certain portions of code which have been heavily influenced by my having scoured stockfish’s code, and a few portions of code which are essentially direct C ports of portions of stockfish (written in C++), hence the necessity of this move.

Moving to this arrangement also affords me the opportunity to mine stockfish for ideas as much as I like. Given the incredible quality of stockfish, that is quite useful to say the least. Regardless, I am not interested in simply porting the code to C, so it’ll be a matter of mining the code for ideas and applying them to weak.

Epiblog

Hopefully I’ll put up a decent post about chess engine development at some point. In other news, I am enjoying my time in startup-land, having just spent a week in Greece working with some of my Grecian colleagues in a slightly nicer climate than dear old Blighty.

August 07, 2012 11:00 PM

Miek Gieben

User management in fksd

If you do DNS for too long everything looks like 53.

In this "trace" I'm showing the logging of fksd when I add a zone, try to list it as the non-existent user miekg (which fails), add the user miekg, and list it again. Users are identified by the key in the TSIG record; their password is the shared secret.

The "config files" from nsupdate can be found in the github repo of fksd. The nsupdate commands are preceded with a %, extra comments are preceded with #:

./fksd -log
# add a zone as the superuser (defaults to root)
% nsupdate -vd addzone
2012/08/07 21:48:31 fksd: config command
2012/08/07 21:48:31 fksd: config command ok
2012/08/07 21:48:31 fksd: config: READ miek.nl.  /home/miekg/g/src/dns/ex/fksd/z/miek.nl.db
2012/08/07 21:48:31 fksd: config: added: READ miek.nl.  /home/miekg/g/src/dns/ex/fksd/z/miek.nl.db

# list the zones in the server as the user miekg (this fails)
% nsupdate -vd listzone-miekg
2012/08/07 21:48:35 fksd: config command
2012/08/07 21:48:35 fksd: non config command (tsig fail): dns: bad signature

# add the user miekg (only the superuser may do this)
% nsupdate -vd adduser-miekg
2012/08/07 21:48:39 fksd: config command
2012/08/07 21:48:39 fksd: config command ok
2012/08/07 21:48:39 fksd: config: ADD miekg. with bWlla2c=

# list the current users 
% nsupdate -vd listuser
2012/08/07 21:48:43 fksd: config command
2012/08/07 21:48:43 fksd: config command ok
2012/08/07 21:48:43 fksd: config: USER root.: c3R1cGlk
2012/08/07 21:48:43 fksd: config: USER miekg.: bWlla2c=

# Again, list the zones as the user miekg, now it works
% nsupdate -vd listzone-miekg
2012/08/07 21:48:51 fksd: config command
2012/08/07 21:48:51 fksd: config command ok
2012/08/07 21:48:51 fksd: config: LIST

That last command now works, before we got a "dns: bad signature" error.

The user management will be kept simple. The superuser can do everything; other users can use write, list, or drop, but this is currently a (minor) to-do.

by Miek Gieben at August 07, 2012 07:58 PM

Shadynasty Business

Painless Web Handlers in Go

Last time we made a little guestbook application, but there were a couple of pain points. We had to have some boilerplate at the top of all of the handlers, and errors were handled by copying the same line of code everywhere. We also had fixed URL paths hard-coded in handlers and templates. Let’s see how we can fix that.

Adding context

A lot of the boilerplate in the handlers last time had to do with setting up the database for each request, so let’s start by cleaning that up. We do this by creating a type that holds the context for the request.

<figure class="code"><figcaption>context.go </figcaption>
package main

import (
  "labix.org/v2/mgo"
  "net/http"
)

type Context struct {
  Database *mgo.Database
}

func (c *Context) Close() {
  c.Database.Session.Close()
}

func NewContext(req *http.Request) (*Context, error) {
  return &Context{
      Database: session.Clone().DB(database),
  }, nil
}
</figure>

A context is the general context the request will use to make decisions, bundled up with the handles to the resources it needs to perform actions. Right now we only have the database. Let’s change our handlers to use the new context.

<figure class="code"><figcaption>main.go </figcaption>
func hello(w http.ResponseWriter, req *http.Request) {
  ctx, err := NewContext(req)
  if err != nil {
      http.Error(w, err.Error(), http.StatusInternalServerError)
      return
  }
  defer ctx.Close()

  //set up the collection and query
  coll := ctx.Database.C("entries")
  query := coll.Find(nil).Sort("-timestamp")

  //execute the query
  //TODO: add pagination :)
  var entries []Entry
  if err := query.All(&entries); err != nil {
      http.Error(w, err.Error(), http.StatusInternalServerError)
      return
  }

  //execute the template
  if err := index.Execute(w, entries); err != nil {
      http.Error(w, err.Error(), http.StatusInternalServerError)
      return
  }
}

func sign(w http.ResponseWriter, req *http.Request) {
  //make sure we got post
  if req.Method != "POST" {
      http.NotFound(w, req)
      return
  }

  entry := NewEntry()
  entry.Name = req.FormValue("name")
  entry.Message = req.FormValue("message")

  if entry.Name == "" {
      entry.Name = "Some dummy who forgot a name"
  }
  if entry.Message == "" {
      entry.Message = "Some dummy who forgot a message."
  }

  ctx, err := NewContext(req)
  if err != nil {
      http.Error(w, err.Error(), http.StatusInternalServerError)
      return
  }
  defer ctx.Close()

  coll := ctx.Database.C("entries")
  if err := coll.Insert(entry); err != nil {
      http.Error(w, err.Error(), http.StatusInternalServerError)
      return
  }

  http.Redirect(w, req, "/", http.StatusTemporaryRedirect)
}
</figure>

Now that’s wonderful, but it looks like we just made it worse.

The magic of interfaces

To fix this, we’re going to create a new handler type, and give it a ServeHTTP method. This new handler type will handle creating/closing the context, and handling any errors that arise. Here’s the definition:

<figure class="code"><figcaption>http.go </figcaption>
package main

import "net/http"

type handler func(http.ResponseWriter, *http.Request, *Context) error

func (h handler) ServeHTTP(w http.ResponseWriter, req *http.Request) {
  //create the context
  ctx, err := NewContext(req)
  if err != nil {
      http.Error(w, err.Error(), http.StatusInternalServerError)
      return
  }
  defer ctx.Close()

  //run the handler and grab the error, and report it
  err = h(w, req, ctx)
  if err != nil {
      http.Error(w, err.Error(), http.StatusInternalServerError)
  }
}
</figure>

The handler type is a function type, meaning any function with that signature can be converted to it. We define a method on the function type (I know!) so that the net/http package can use it as though it were any other handler. We've already been doing something very similar: when we called http.HandleFunc in our main.go, our functions were being used as the type http.HandlerFunc, which defines a ServeHTTP method just like ours. See, it's not so bad. Here's what the new handlers look like:

<figure class="code"><figcaption>handlers.go </figcaption>
package main

import "net/http"

func hello(w http.ResponseWriter, req *http.Request, ctx *Context) (err error) {
  //set up the collection and query
  coll := ctx.Database.C("entries")
  query := coll.Find(nil).Sort("-timestamp")

  //execute the query
  //TODO: add pagination :)
  var entries []Entry
  if err = query.All(&entries); err != nil {
      return
  }

  //execute the template
  err = index.Execute(w, entries)
  return
}

func sign(w http.ResponseWriter, req *http.Request, ctx *Context) (err error) {
  //make sure we got post
  if req.Method != "POST" {
      http.NotFound(w, req)
      return
  }

  entry := NewEntry()
  entry.Name = req.FormValue("name")
  entry.Message = req.FormValue("message")

  if entry.Name == "" {
      entry.Name = "Some dummy who forgot a name"
  }
  if entry.Message == "" {
      entry.Message = "Some dummy who forgot a message."
  }

  coll := ctx.Database.C("entries")
  if err = coll.Insert(entry); err != nil {
      return
  }

  http.Redirect(w, req, "/", http.StatusTemporaryRedirect)
  return
}
</figure>

Much better! Let’s commit that.

Routing

The other pain points, hard-coded URLs and checking the request method, are going to be handled by more advanced routing. For this, we're going to use the excellent Gorilla web toolkit, specifically the gorilla/pat package. I really like the simple API it provides, with easy parameter capturing from the URL. It's very easy to use with the net/http package:

<figure class="code"><figcaption>main.go </figcaption>
func main() {
  var err error
  u := os.Getenv("DATABASE_URL")
  parsed, err := url.Parse(u)
  if err != nil {
      panic(err)
  }
  database = parsed.Path[1:]
  session, err = mgo.Dial(u)
  if err != nil {
      panic(err)
  }

  r := pat.New()
  r.Add("GET", "/", handler(hello)).Name("index")
  r.Add("POST", "/sign", handler(sign)).Name("sign")

  if err = http.ListenAndServe(":"+os.Getenv("PORT"), r); err != nil {
      panic(err)
  }
}
</figure>

One important and easy-to-miss detail: we now pass the router in as the second argument to the http.ListenAndServe call. We can also remove the check that the method is POST in the sign handler, as the router takes care of that for us. Let's move on to fixing the hard-coded URLs.

Reversing URLs

Notice that we gave each route a .Name call. The gorilla/pat package returns a *mux.Router for us to work with. Using that, we can have the router rebuild URLs from the names. For example, if we wanted to grab the URL for the index page, we could use

r.GetRoute("index").URL()

but since r is inaccessible outside the main function, we have to move it into a higher scope. Let's do that.

<figure class="code"><figcaption>main.go </figcaption>
var router *pat.Router

func main() {
  //...

  router = pat.New()
  router.Add("GET", "/", handler(hello)).Name("index")
  router.Add("POST", "/sign", handler(sign)).Name("sign")

  if err = http.ListenAndServe(":"+os.Getenv("PORT"), router); err != nil {
      panic(err)
  }
}
</figure>

And now we can update the sign handler

<figure class="code"><figcaption>handlers.go </figcaption>
func sign(w http.ResponseWriter, req *http.Request, ctx *Context) (err error) {
  //...

  url, err := router.GetRoute("index").URL()
  if err != nil {
      return
  }

  http.Redirect(w, req, url.Path, http.StatusTemporaryRedirect)
  return
}
</figure>

Reversing in Templates

To reverse inside the template, we could either remember to pass the router in as part of the template context on every invocation, or we could add a function to the template. Since keeping track of the router through nested templates and scope changes is a daunting task, adding a function to do the reversing is the better option. Here's that function:

<figure class="code"><figcaption>main.go </figcaption>
func reverse(name string, things ...interface{}) string {
  //convert the things to strings
  strs := make([]string, len(things))
  for i, th := range things {
      strs[i] = fmt.Sprint(th)
  }
  //grab the route
  u, err := router.GetRoute(name).URL(strs...)
  if err != nil {
      panic(err)
  }
  return u.Path
}
</figure>

We choose to have the function panic on errors because any incorrect reversal is a programmer error. We also accept a variadic number of interface values because sometimes a reversal parameter is an integer, like the year in a blog post URL, while the URL function takes strings. Rather than force the template, or the function executing the template, to do the conversion, we just convert everything to a string by calling fmt.Sprint on it. Then we have to add this function to the template.

<figure class="code"><figcaption>main.go </figcaption>
var funcs = template.FuncMap{
  "reverse": reverse,
}

func parseTemplate(files ...string) *template.Template {
  //create a new template named after the first file in the list and add
  //the function map to it
  name := filepath.Base(files[0])
  t := template.New(name).Funcs(funcs)

  //parse the files into the template and panic on errors
  t = template.Must(t.ParseFiles(files...))
  return t
}

var index = parseTemplate(
  "templates/_base.html",
  "templates/index.html",
)
</figure>

There's a tricky point here: the template package errors when parsing a template that invokes an undefined function. That means we have to add our function map to the template before we add the files to parse. We write a little helper function to do this correctly. Now we can update the template to use it.

<form action="{{ reverse "sign" }}" method="POST">

Let’s update the sign handler to use the reverse function too.

http.Redirect(w, req, reverse("index"), http.StatusSeeOther)

Pain: consider yourself eliminated.

Next up, we’re going to do more with the context type we created, and make the guestbook a little more web 2.0. As always, the source to the gostbook is up on github.

August 07, 2012 06:13 PM

August 06, 2012

Miek Gieben

Dynamic nameserver provisioning with dns pkts

I'm writing a nameserver called fksd (Funkensturm daemon), which is currently in a prototype stage (but the code is available at github).

In this server I'm pursuing some interesting directions in nameserver development, such as the dynamic configuration as provided by BIND10.

BIND10 uses http(s), but I think using DNS packets is more in line with a nameserver, so I opted for that route.

With fksd you can use packets (which will be TSIG-signed in the future) to configure the server. The only configuration possible at the moment is adding a zone. Such a packet needs to have a TXT record like the following in its AUTHORITY SECTION:

ZONE.   IN  TXT "READ miek.nl. /path/to/zone"

Using the AUTH. section means we can re-use nsupdate (#win).

The current dev version of fksd listens on port 1053 for real DNS queries and on 8053 for configuration queries. Let's start the daemon and query for the miek.nl MX:

$ ./fksd -log
<in other terminal>
$ dig @127.0.0.1 -p 1053 mx miek.nl
...
;; ->>HEADER<<- opcode: QUERY, status: SERVFAIL, id: 1945
...

Indeed, SERVFAIL, because miek.nl. isn't loaded. Let's fix that (-vD is crucial; otherwise it won't work for some reason):

$ nsupdate -vD
> server 127.0.0.1 8053
> zone ZONE.
> update add ZONE. 60 IN TXT "READ miek.nl /home/miekg/g/src/dns/ex/fksd/z/miek.nl.db"
> send
; Communication with server failed: timed out

That last error is because I'm lame and do not send a reply message (will be done in the future). Meanwhile fksd logs:

2012/08/06 23:13:27 fksd: config commmand
2012/08/06 23:13:27 fksd: config: READ miek.nl. /home/miekg/g/src/dns/ex/fksd/z/miek.nl.db

When I now query for miek.nl MX, I get:

$ dig @127.0.0.1 -p 1053 mx miek.nl
...
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 31060
...
;; ANSWER SECTION:
miek.nl.                345600  IN      MX      20 mail.atoom.net.
miek.nl.                345600  IN      MX      40 mx-ext.tjeb.nl.

;; AUTHORITY SECTION:
miek.nl.                345600  IN      NS      ext.ns.whyscream.net.
miek.nl.                345600  IN      NS      open.nlnetlabs.nl.
miek.nl.                345600  IN      NS      omval.tednet.nl.
miek.nl.                345600  IN      NS      elektron.atoom.net.
...

The config will be put in some kind of journal in JSON format (just like BIND10...), which is also "a future todo"(TM). But for now: this seems to work very nicely - now the only thing left is to implement the rest of this authoritative nameserver.

by Miek Gieben at August 06, 2012 09:18 PM

August 05, 2012

Hacking with the Go Programming Language (GoLang)

Syntax highlighting for golang

I've been using Alex Gorbatchev's SyntaxHighlighter to ease the task of presenting code on this blog. Seeing that there's currently no "brush" for golang, I decided to create one. I posted the code here for anyone interested. To use it, simply put the following line where you'd normally do it for SyntaxHighlighter brushes (usually before the <head> tag):
<script src='https://raw.github.com/axx/GolangHighlighter/master/shBrushGo.js' type='text/javascript'/>
Then use either "go" or "golang" as alias when you insert golang code on your blog:
<pre class="brush: go">package main
import "fmt"
func main() {
fmt.Printf("yoohoo!")
}</pre>
This will be rendered as:
package main
import "fmt"
func main() {
fmt.Printf("yoohoo!")
}
I hope someone else finds it useful. Please drop a comment if you have some suggestions on how to improve it.
EDIT (05AUG2012): I've lost my old server so I had to reconstruct the script and placed it on GitHub. I therefore updated the example and links above.

by Allister (noreply@blogger.com) at August 05, 2012 05:32 PM

July 30, 2012

Shadynasty Business

Quick and Clean in Go

After reading a neat article whose title I stole about making a guestbook app in Flask, I decided to see how it would compare to my favorite language of the year, Go. So here’s my take.

First Steps

Let’s create a new directory to hold the project. I’m gonna host the code on github so let’s make the local directory match the import path.

$ cd ~/Code/go/src
$ mkdir -p github.com/zeebo/gostbook
$ cd github.com/zeebo/gostbook/
$ git init
Initialized empty Git repository in /Users/zeebo/Code/go/src/github.com/zeebo/gostbook/.git/

Note that ~/Code/go is a directory in my GOPATH environment variable, the only piece of configuration needed for the build tool to know how to fetch and build any code that follows these conventions. Let's put in a little hello world code.

<figure class="code"><figcaption>Hello World - main.go </figcaption>
package main

import (
  "fmt"
  "net/http"
)

func hello(w http.ResponseWriter, req *http.Request) {
  fmt.Fprintln(w, "Hello World!")
}

func main() {
  http.HandleFunc("/", hello)
  if err := http.ListenAndServe(":8080", nil); err != nil {
      panic(err)
  }
}
</figure>

This registers a handler that will match any path and write Hello World! in the response. Building and running this code starts a server listening on port 8080, so let's visit it.

$ go build
$ ./gostbook &
[1] 39629
$ curl localhost:8080
Hello World!
$ kill 39629

Neat!

Commit

Let’s do our source control duty, and make a commit with our super simple app.

$ cat .gitignore 
*
!.gitignore
!*.go
!*.html
$ git status
# On branch master
#
# Initial commit
#
# Untracked files:
#   (use "git add <file>..." to include in what will be committed)
#
#   .gitignore
#   main.go
nothing added to commit but untracked files present (use "git add" to track)
$ git add .
$ git commit -m 'initial commit'
[master (root-commit) de0b184] initial commit
 2 files changed, 21 insertions(+)
 create mode 100644 .gitignore
 create mode 100644 main.go

Templates

The next step is to put templates in. Let's make a templates directory and some basic templates in there. I'll steal the templates from Eevee's post and change them to use the built-in html/template package from the standard library. Here's the source:

<figure class="code"><figcaption>templates/_base.html </figcaption>
<!DOCTYPE html>
<html lang="en">
    <head>
        <title>{{ template "title" . }}</title>
    </head>
    <body>
        <section id="contents">
            {{ template "content" . }}
        </section>
        <footer id="footer">
            My Cool Guestbook 2000 © me forever
        </footer>
    </body>
</html>
</figure> <figure class="code"><figcaption>templates/index.html </figcaption>
{{ define "title" }}Guestbook{{ end }}

{{ define "content" }}
    <h1>Guestbook</h1>

    <p>Hello, and welcome to my guestbook, because it's 1997!</p>

    <ul class="guests">
        <li>...</li>
    </ul>
{{ end }}
</figure>

Updating the Go code is a little more work, but not much.

<figure class="code"><figcaption>Template World - main.go </figcaption>
import (
  "html/template"
  "net/http"
)

var index = template.Must(template.ParseFiles(
  "templates/_base.html",
  "templates/index.html",
))

func hello(w http.ResponseWriter, req *http.Request) {
  if err := index.Execute(w, nil); err != nil {
      http.Error(w, err.Error(), http.StatusInternalServerError)
  }
}
</figure>

Building and running again, we see it’s working:

$ go build
$ ./gostbook &
[1] 39918
$ curl localhost:8080
<!DOCTYPE html>
<html lang="en">
    <head>
        <title>Guestbook</title>
    </head>
    <body>
        <section id="content">

    <h1>Guestbook</h1>

    <p>Hello, and welcome to my guestbook, because it's 1997!</p>

    <ul class="guests">
        <li>...</li>
    </ul>

        </section>
        <footer id="footer">
            My Cool Guestbook 2000 © me forever
        </footer>
    </body>
</html>
$ kill 39918

Let’s be diligent and make another commit. On to data!

Databases

Go has many database bindings, but the one I find easiest to work with is MongoDB with the excellent mgo driver. Let's create our data model.

<figure class="code"><figcaption>Database entry - entry.go </figcaption>
package main

import (
  "labix.org/v2/mgo/bson"
  "time"
)

type Entry struct {
  ID        bson.ObjectId `bson:"_id,omitempty"`
  Timestamp time.Time
  Name      string
  Message   string
}

func NewEntry() *Entry {
  return &Entry{
      Timestamp: time.Now(),
  }
}
</figure>

We just create a struct with some fields. The mgo driver uses runtime reflection to look up information about the struct when setting and reading values. We add tags to the ID field to instruct bson to omit it when the value is empty (so MongoDB picks the id for us on insertion) and to name it _id when serializing (the name MongoDB expects). We also provide a NewEntry function for creating an Entry stamped with the current time.

Now let's add database support to the handler.

<figure class="code"><figcaption>Databased up - main.go </figcaption>
func hello(w http.ResponseWriter, req *http.Request) {
  //grab a clone of the session and close it when the
  //function returns
  s := session.Clone()
  defer s.Close()

  //set up the collection and query
  coll := s.DB("gostbook").C("entries")
  query := coll.Find(nil).Sort("-timestamp")

  //execute the query
  //TODO: add pagination :)
  var entries []Entry
  if err := query.All(&entries); err != nil {
      http.Error(w, err.Error(), http.StatusInternalServerError)
      return
  }

  //execute the template
  if err := index.Execute(w, entries); err != nil {
      http.Error(w, err.Error(), http.StatusInternalServerError)
      return
  }
}

var session *mgo.Session

func main() {
  var err error
  session, err = mgo.Dial("localhost")
  if err != nil {
      panic(err)
  }

  http.HandleFunc("/", hello)
  if err = http.ListenAndServe(":8080", nil); err != nil {
      panic(err)
  }
}
</figure>

Interacting with the database requires a little boilerplate in the handler, but this can easily be removed by clever use of Go's interfaces. The net/http package will serve anything with a ServeHTTP(ResponseWriter, *Request) method, so you can decorate handlers by wrapping them in simple types that implement that interface. Doing that is left as an exercise to the reader :)

Here’s how we change the template:

<figure class="code"><figcaption>templates/index.html </figcaption>
    <ul class="guests">
        {{ range . }}
        <li>
            <blockquote>{{ .Message }}</blockquote>
            <p>- <cite>{{ .Name }}</cite>, <time>{{ .Timestamp }}</time></p>
        </li>
        {{ end }}
    </ul>
</figure>

Notice we don't worry about any kind of injection. The html/template package is super awesome and handles that by knowing what it's outputting and the context in which the data is being used. If you're in an HTML context, it will escape the HTML properly. If you're in a script or URL context, it knows and will apply the appropriate escaping. No modifying the data in the database. No "sanitizing". Just doing the right thing, every time.

Signing it

Time to add the handler to sign the guest book. Let’s start with the html for the form.

<figure class="code"><figcaption>templates/index.html </figcaption>
    <hr>

    <form action="/sign" method="POST">
        <p>Name: <input type="text" name="name"></p>
        <p>Message: <textarea name="message" rows="10" cols="40"></textarea></p>
        <p><button>Sign</button></p>
    </form>
</figure>

And now the handler:

<figure class="code"><figcaption>sign.go </figcaption>
package main

import "net/http"

func sign(w http.ResponseWriter, req *http.Request) {
  //make sure we got post
  if req.Method != "POST" {
      http.NotFound(w, req)
      return
  }

  entry := NewEntry()
  entry.Name = req.FormValue("name")
  entry.Message = req.FormValue("message")

  if entry.Name == "" {
      entry.Name = "Some dummy who forgot a name"
  }
  if entry.Message == "" {
      entry.Message = "Some dummy who forgot a message."
  }

  s := session.Clone()
  defer s.Close()

  coll := s.DB("gostbook").C("entries")
  if err := coll.Insert(entry); err != nil {
      http.Error(w, err.Error(), http.StatusInternalServerError)
      return
  }

  http.Redirect(w, req, "/", http.StatusTemporaryRedirect)
}
</figure>

All we need to do is add a single line to main.go to register the new handler:

http.HandleFunc("/sign", sign)

And now we can sign and view our guestbook. Let's commit again.

Some issues

Now the astute reader will notice a couple little pain points.

  • We had to check in the sign handler if the method was POST. This can be fixed by using a more sophisticated muxer than the built-in one in net/http. Like all good packages in Go, all of these things are just interfaces, so you can swap them out with many community-driven packages. An excellent one is the gorilla muxer at code.google.com/p/gorilla/mux.

  • We had to hard-code the URLs. Once again, this is solved by using a more sophisticated muxer. code.google.com/p/gorilla/mux supports building URLs from names you give to the routes.

  • Boilerplate in the handlers to specify a database/collection every time. I typically solve this, as mentioned earlier, by making a type that implements the ServeHTTP method and passes in a request context containing everything I need for that request, including sessions and database connections. It's only a couple lines of code, but outside the scope of this post.

Other than that, I found it to be pretty painless and about as easy to do as the Flask version. Considering this is a statically typed compiled language, that’s quite the feat.

Deployment

It wouldn't be useful if it wasn't deployed. Fortunately, Go compiles down into a static binary. This can be shipped to any system it was compiled for and simply run. Go also lets you easily cross-compile for any system, so that's a non-issue as well. The built-in web server is comparable in performance to things like Apache and nginx in my tests. So for most cases, it's as simple as running a binary and either proxying it through your front-end server or just letting the world hit it directly.

But, since that’s not cool enough, we’re also going to deploy on Heroku.

Buildpacks and a Note About Getting Code

Unfortunately, Go isn’t a supported platform on Heroku. Fortunately, it’s just a buildpack away. The Cedar stack is excellent and allows you to run any binary you want to host your web site, so we just have to tell Heroku how to build our code. I’m a little biased so I’m going to use the buildpack I modified to do this, although there are alternatives.

The cool part about hosting our code on github is that anyone with Go installed can just grab it with a single command:

go get github.com/zeebo/gostbook

That will download, compile, and install a binary named “gostbook” in our bin directory in our GOPATH. The buildpack I created uses this to build the code we’ll be deploying. First we make a little file that describes how to do it, and a Procfile to describe what to run:

<figure class="code"><figcaption>.heroku </figcaption>
BASE=github.com/zeebo/gostbook
+ github.com/zeebo/gostbook
</figure> <figure class="code"><figcaption>Procfile </figcaption>
web: bin/gostbook
</figure>

Then we have to be nice and listen on the port Heroku tells us to. This is a one line change:

if err = http.ListenAndServe(":"+os.Getenv("PORT"), nil); err != nil {

Lastly, we have to dial out to the Mongo database they provide, too:

session, err = mgo.Dial(os.Getenv("DATABASE_URL"))

I use DATABASE_URL as the key. We’ll have to set it later in the deployment. Let’s commit that.

Deployment (again)

Let's create the Heroku app.

$ heroku create --stack cedar --buildpack http://github.com/zeebo/buildpack.git
Creating tranquil-refuge-9104... done, stack is cedar
http://tranquil-refuge-9104.herokuapp.com/ | git@heroku.com:tranquil-refuge-9104.git
Git remote heroku added

Add in a free mongo database and configure the DATABASE_URL:

$ heroku addons:add mongolab:starter
-----> Adding mongolab:starter to tranquil-refuge-9104... done, v3 (free)
       Welcome to MongoLab.
$ heroku config
BUILDPACK_URL => http://github.com/zeebo/buildpack.git
MONGOLAB_URI  => ...snip...
$ heroku config:add DATABASE_URL=...snip...
Adding config vars and restarting app... done, v4
  DATABASE_URL => ...snip...

If I were smarter, I would have just used MONGOLAB_URI in the code, but I'm not, so here we are. Finally, we can just push it up and watch the magic:

$ git push heroku master
Counting objects: 24, done.
Delta compression using up to 4 threads.
Compressing objects: 100% (21/21), done.
Writing objects: 100% (24/24), 3.41 KiB, done.
Total 24 (delta 4), reused 0 (delta 0)

-----> Heroku receiving push
-----> Fetching custom buildpack... done
-----> Go app detected
-----> Configuration
       GO_VERSION=go1.0.2
       BASE=github.com/zeebo/gostbook
       + github.com/zeebo/gostbook
-----> Using Go go1.0.2.linux-amd64
-----> Fetching Go go1.0.2.linux-amd64
-----> Checking for Mercurial and Bazaar
       Fetching hg and bzr
       ..snip...
       Successfully installed mercurial
       ...snip...
       Successfully installed bzr
       Cleaning up...
-----> Running go get -u -v all
-----> Copying sources into GOPATH/src/github.com/zeebo/gostbook
-----> Running go get -v github.com/zeebo/gostbook
       Fetching https://labix.org/v2/mgo?go-get=1
       Parsing meta tags from https://labix.org/v2/mgo?go-get=1 (status code 200)
       get "labix.org/v2/mgo": found meta tag main.metaImport{Prefix:"labix.org/v2/mgo", VCS:"bzr", RepoRoot:"https://launchpad.net/mgo/v2"} at https://labix.org/v2/mgo?go-get=1
       labix.org/v2/mgo (download)
       Fetching https://labix.org/v2/mgo/bson?go-get=1
       Parsing meta tags from https://labix.org/v2/mgo/bson?go-get=1 (status code 200)
       get "labix.org/v2/mgo/bson": found meta tag main.metaImport{Prefix:"labix.org/v2/mgo", VCS:"bzr", RepoRoot:"https://launchpad.net/mgo/v2"} at https://labix.org/v2/mgo/bson?go-get=1
       get "labix.org/v2/mgo/bson": verifying non-authoritative meta tag
       Fetching https://labix.org/v2/mgo?go-get=1
       Parsing meta tags from https://labix.org/v2/mgo?go-get=1 (status code 200)
       labix.org/v2/mgo/bson
       labix.org/v2/mgo
       github.com/zeebo/gostbook
-----> Discovering process types
       Procfile declares types -> web
-----> Compiled slug size is 1.4MB
-----> Launching... done, v6
       http://tranquil-refuge-9104.herokuapp.com deployed to Heroku

To git@heroku.com:tranquil-refuge-9104.git
 * [new branch]      master -> master

And we have a nice guestbook at http://tranquil-refuge-9104.herokuapp.com

A snag

It seems like the database name is specified by the host in this case. We can’t just go and create whatever database we want. So we have to update the code to grab this information and use it when we’re making queries. The patch to fix it was pretty easy. Just add a global variable and parse the URL to put the database into it.

<figure class="code"><figcaption>commit.diff </figcaption>
diff --git a/main.go b/main.go
index 6094df8..eea1565 100644
--- a/main.go
+++ b/main.go
@@ -4,6 +4,7 @@ import (
  "html/template"
  "labix.org/v2/mgo"
  "net/http"
+    "net/url"
  "os"
 )

@@ -19,7 +20,7 @@ func hello(w http.ResponseWriter, req *http.Request) {
  defer s.Close()

  //set up the collection and query
-    coll := s.DB("gostbook").C("entries")
+    coll := s.DB(database).C("entries")
  query := coll.Find(nil).Sort("-timestamp")

  //execute the query
@@ -38,10 +39,17 @@ func hello(w http.ResponseWriter, req *http.Request) {
 }

 var session *mgo.Session
+var database string

 func main() {
  var err error
-    session, err = mgo.Dial(os.Getenv("DATABASE_URL"))
+    u := os.Getenv("DATABASE_URL")
+    parsed, err := url.Parse(u)
+    if err != nil {
+        panic(err)
+    }
+    database = parsed.Path[1:]
+    session, err = mgo.Dial(u)
  if err != nil {
      panic(err)
  }
diff --git a/sign.go b/sign.go
index a5b6cd0..c3ddbda 100644
--- a/sign.go
+++ b/sign.go
@@ -23,7 +23,7 @@ func sign(w http.ResponseWriter, req *http.Request) {
  s := session.Clone()
  defer s.Close()

-    coll := s.DB("gostbook").C("entries")
+    coll := s.DB(database).C("entries")
  if err := coll.Insert(entry); err != nil {
      http.Error(w, err.Error(), http.StatusInternalServerError)
      return
</figure>

We just rely on the net/url package to parse the URL and grab the database name out of the path. Since the path contains the leading forward slash, we just slice that off. All that's left is a redeploy:

$ git add .
$ git commit -m 'fixes for database'
[master 2b4bf78] fixes for database
 2 files changed, 11 insertions(+), 3 deletions(-)
$ git push heroku master
Counting objects: 7, done.
Delta compression using up to 4 threads.
Compressing objects: 100% (4/4), done.
Writing objects: 100% (4/4), 493 bytes, done.
Total 4 (delta 3), reused 0 (delta 0)

-----> Heroku receiving push
-----> Fetching custom buildpack... done
-----> Go app detected
-----> Configuration
       GO_VERSION=go1.0.2
       BASE=github.com/zeebo/gostbook
       + github.com/zeebo/gostbook
-----> Using Go go1.0.2.linux-amd64
-----> Checking for Mercurial and Bazaar
       /app/tmp/repo.git/.cache/venv/bin/hg
       /app/tmp/repo.git/.cache/venv/bin/bzr
-----> Running go get -u -v all
-----> Copying sources into GOPATH/src/github.com/zeebo/gostbook
-----> Running go get -v github.com/zeebo/gostbook
       github.com/zeebo/gostbook
-----> Discovering process types
       Procfile declares types -> web
-----> Compiled slug size is 1.4MB
-----> Launching... done, v7
       http://tranquil-refuge-9104.herokuapp.com deployed to Heroku

To git@heroku.com:tranquil-refuge-9104.git
   52a2171..2b4bf78  master -> master

And to my surprise, it worked on the second try!

Closing remarks

I hope this post showed some of what can be done with Go. In little time and code I was able to construct that awesome 1997 guestbook. This just scratched the surface of the cool stuff going on in the Go ecosystem. There's code completion, Sublime Text integration, hosted automatically generated documentation, and continuous integration. The go tool is awesome and able to build the vast majority of Go code that lives anywhere with one command. I highly recommend looking into Go for your next project.

July 30, 2012 03:43 PM

July 20, 2012

Adam Langley

SSL interstitial bypass rates

In yesterday's post I threw in the following that I've been asked to clarify:

“we know that those bypass buttons are clicked 60% of the time by Chrome users”

Chrome collects anonymous statistics from users who opt in to such collection (and thank you to those who do!). One of those statistics covers how frequently people bypass SSL interstitials. (As always, the Chrome privacy policy has the details.)

We define bypassing the interstitial as clicking the ‘Proceed anyway’ button and not bypassing as either closing the tab, navigating elsewhere, or clicking the ‘Back’ button.

I picked five days at random over the past six weeks and averaged the percentages of the time that users bypassed rather than not. That came to 61.6%.

There may be some biases here: we may have a biased population because we only include users who have opted in to statistics collection. We are also counting all interstitials: there may be a small number of users who bypass a lot of SSL errors. But that's the data that we have.

July 20, 2012 07:00 AM

July 19, 2012

Adam Langley

Living with HTTPS

(These are my notes from the first half of my talk at HOPE9 last weekend. I write notes like these not as a script, but so that I have at least some words ready in my head when I'm speaking. They are more conversational and less organised than a usual blog post, so please forgive me the rough edges.)

HTTPS tends to cause people to give talks mocking certificate security and the ecosystem around it. Perhaps that's well deserved, but that's not what this talk is about. If you want to have fun at the expense of CAs, dig up one of Moxie's talks. This talk deals with the fact that your HTTPS site, and the sites that you use, probably don't even reach the level where you get to start worrying about certificates.

I'm a transport security person so the model for this talk is that we have two computers talking over a malicious network. We assume that the computers themselves are honest and uncompromised. That might be a stretch in these malware-ridden times, but that's the area of host security and I'm not talking about that today. The network can drop, alter or fabricate packets at will. As a lemma, we also assume that the network can cause the browser to load any URL it wishes. The network can do this by inserting HTML into any HTTP request and we assume that every user makes some unencrypted requests while browsing.

Stripping

If the average user typed mail.google.com into a browser and saw the following, what fraction of them do you think would log in, none the wiser?

Can you even see what's terribly wrong here?

The problem is that the page isn't served over HTTPS. It should have been, but when a user types a hostname into a browser, the default scheme is HTTP. The server may attempt to redirect users to HTTPS, but that redirect is insecure: a MITM attacker can rewrite it and keep the user on HTTP, spoofing the real site the whole time. The attacker can now intercept all the traffic to this perfectly well configured and secure website.

This is called SSL stripping and it's terribly simple and devastatingly effective. We probably don't see it very often because it's not something that corporate proxies need to do, so it's not in off-the-shelf devices. But that respite is unlikely to last very long and maybe it's already over: how would we even know if it was being used?

In order to stop SSL stripping, we need to make HTTPS the only protocol. We can't do that for the whole Internet, but we can do it site-by-site with HTTP Strict Transport Security (HSTS).

HSTS tells browsers to always make requests over HTTPS to HSTS sites. Sites become HSTS either by being built into the browser, or by advertising a header:

Strict-Transport-Security: max-age=8640000; includeSubDomains

The header is in force for the given number of seconds and may also apply to all subdomains. The header must be received over a clean HTTPS connection.

Once the browser knows that a site is HTTPS only, the user typing mail.google.com is safe: the initial request uses HTTPS and there's no hole for an attacker to exploit.

(mail.google.com and a number of other sites are already built into Chrome as HSTS sites so it's not actually possible to access accounts.google.com over HTTP with Chrome - I had to doctor that image! If you want to be included in Chrome's built-in HSTS list, email me.)

HSTS can also protect you, the webmaster, from making silly mistakes. Let's assume that you've told your mother that she should always type https:// before going to her banking site or maybe you setup a bookmark for her. That's honestly more than we can, or should, expect of our users. But let's say that our supererogatory user enters https://www.citibank.com in order to securely connect to her bank. What happens? Well, https://www.citibank.com redirects her to http://www.citibank.com. They've downgraded the user! From there, the HTTP site should redirect back to HTTPS, but the damage has been done. An attacker can get in through the hole.

I'm honestly not picking on Citibank here. They were simply the second site that I tried and I was somewhat surprised that the first site didn't have the problem. It's a very easy mistake to make, and everything just works! It's a completely silent disaster! But HSTS would have either prevented it, or would have failed closed.

HSTS also does something else. It turns this:

Into this:

The “bypass this certificate error” button has gone. That button is a UI disaster. Asking regular people to evaluate the validity of X.509 certificates is insane. It's a security cop-out that we're saddled with, and which is causing real damage.

We've seen widespread MITM attacks using invalid certificates in Syria and, in recent weeks, Jordan. These attacks are squarely within our threat model for transport security and shouldn't have been a risk for anybody. But we know that those bypass buttons are clicked 60% of the time by Chrome users. People are certainly habituated to clicking them and I bet that a large number of people were victims of attacks that we should have been able to prevent.

If you take only one thing away from this talk, HSTS should be it.

Mixed scripting

Once we've sorted out HSTS, we have another problem. These snippets of HTML are gaping wide holes in your security:

<script src="http://...

<link href="http://...

<embed src="http://...

It's called mixed scripting and it happens when a secure site loads critical sub-resources over HTTP. It's a subset of mixed content: mixed content covers loading any sub-resource insecurely. Mixed content is bad, but when the resource is JavaScript, CSS, or a plugin we give it another name to make it clear that it's a lot more damaging.

When you load sub-resources over HTTP, an attacker can replace them with content of their choosing. The attacker also gets to choose any page on your HTTPS site with the problem. That includes pages that you don't expect to be served over HTTPS, but happen to be mapped. If you have this problem anywhere, on any HTTPS page, the attacker wins.

With complex sites, it's very difficult to ensure that this doesn't happen. One good way to limit it is to serve everything over HTTPS so that there aren't any pages that you expect to serve over HTTP. HSTS might also save you if you're loading mixed script from the same domain.

Another mitigation is to use scheme-relative URLs everywhere possible. These URLs look like //example.com/bar.js and are valid in all browsers. They inherit the scheme of the parent page, which will be HTTP or HTTPS as needed. (Although it does mean that if you load the page from disk then things will break. The scheme will be file:// in that case.)

Fundamentally this is such an easy mistake to make, and such a problem, that the only long term solution is for browsers to stop loading insecure Javascript, CSS and plugins for HTTPS sites. To their credit, IE9 does this and did it before Chrome. But I'm glad to say that Chrome has caught up and mixed scripting is now blocked by default, although with a user-override:

Yes, there's another of those damn bypass buttons. But we swear that it's just a transitional strategy and it already stops silent exploitation in a hidden iframe.

Cookies

HTTP and HTTPS cookie jars are the same. No really: cookies aren't scoped to a protocol! That means that if you set a cookie on https://example.com and then make a request to http://example.com, the cookies will be sent in the clear! In order to prevent this, you should be setting the secure attribute on your HTTPS cookies. Sadly this is a very easy thing to miss because everything will still work if you omit it, but without the secure attribute, attackers can easily steal your cookies.

It's worth noting that HSTS can protect you from this: by preventing the HTTP request from being sent, the cookies can't be leaked that way, but you'll need to include all subdomains in the HSTS coverage.

There's a second corollary to this: attackers can set your HTTPS cookies too. By causing a request to be sent to http://example.com, they can spoof a reply with cookies that then override any existing cookies. In this fashion, an attacker can log a user in as themselves during their interaction with an HTTPS site. Then, say, emails that they send will be saved in the attacker's outbox.

There's no very good protection against this except HSTS again. By preventing any HTTP requests you can stop the attacker from spoofing a reply to set the cookies. Again, HSTS needs to cover all subdomains in order to be effective against this.

Get yourself checked out

You should go to https://www.ssllabs.com and run their scan against your site. It's very good, but ignore it if it complains about the BEAST attack. It'll do so if you make a CBC ciphersuite your top preference. Browsers have worked around BEAST and non-browser clients are very unlikely to provide the attacker enough access to be able to pull it off. You have a limited amount of resources to address HTTPS issues and I don't think BEAST should make the list.

Get a real certificate

You should get a real certificate. You probably already have one but, if you don't, then you're just training more people to ignore certificate errors and you can't have HSTS without a real certificate. StartSSL give them away for free. Get one.

If you've reached this far and have done all of the above, congratulations: you're in the top 0.1% of HTTPS sites. If you're getting bored, this is a reasonable point to stop reading: everything else is just bonus points from now on.

Forward secrecy

You should consider forward secrecy. Forward secrecy means that the keys for a connection aren't stored on disk. You might have limited the amount of information that you log in order to protect the privacy of your users, but if you don't have forward secrecy then your private key is capable of decrypting all past connections. Someone else might be doing the logging for you.

In order to enable forward secrecy you need to have DHE or ECDHE ciphersuites as your top preference. DHE ciphersuites are somewhat expensive if you're terminating lots of SSL connections and you should be aware that your server will probably only allow 1024-bit DHE. I think 1024-bit DHE-RSA is preferable to 2048-bit RSA, but opinions vary. If you're using ECDHE, use P-256.

You also need to be aware of Session Tickets in order to implement forward secrecy correctly. There are two ways to resume a TLS connection: either the server chooses a random number and both sides store the session information, or the server can encrypt the session information with a secret, local key and send that to the client. The former is called Session IDs and the latter is called Session Tickets.

But Session Tickets are transmitted over the wire and so the server's Session Ticket encryption key is capable of decrypting past connections. Most servers will generate a random Session Ticket key at startup unless otherwise configured, but you should check.

I'm not going to take the time to detail how to configure this here. There are lots of webservers and it would take a while. This is more of a pointer so that you can go away and research it if you wish.

(The rest of the talk touched on OpenSSL speed, public key pinning, TLS 1.1 and 1.2 and a few other matters, but I did those bits mostly on the fly and don't have notes for them.)

July 19, 2012 07:00 AM

July 11, 2012

Go's official blog

Gccgo in GCC 4.7.1

The Go language has always been defined by a spec, not an implementation. The Go team has written two different compilers that implement that spec: gc and gccgo. Having two different implementations helps ensure that the spec is complete and correct: when the compilers disagree, we fix the spec, and change one or both compilers accordingly. Gc is the original compiler, and the go tool uses it by default. Gccgo is a different implementation with a different focus, and in this post we’ll take a closer look at it.

Gccgo is distributed as part of GCC, the GNU Compiler Collection. GCC supports several different frontends for different languages; gccgo is a Go frontend connected to the GCC backend. The Go frontend is separate from the GCC project and is designed to be able to connect to other compiler backends, but currently only supports GCC.

Compared to gc, gccgo is slower to compile code but supports more powerful optimizations, so a CPU-bound program built by gccgo will usually run faster. All the optimizations implemented in GCC over the years are available, including inlining, loop optimizations, vectorization, instruction scheduling, and more. While it does not always produce better code, in some cases programs compiled with gccgo can run 30% faster.

The gc compiler supports only the most popular processors: x86 (32-bit and 64-bit) and ARM. Gccgo, however, supports all the processors that GCC supports. Not all those processors have been thoroughly tested for gccgo, but many have, including x86 (32-bit and 64-bit), SPARC, MIPS, PowerPC and even Alpha. Gccgo has also been tested on operating systems that the gc compiler does not support, notably Solaris.

Gccgo provides the standard, complete Go library. Many of the core features of the Go runtime are the same in both gccgo and gc, including the goroutine scheduler, channels, the memory allocator, and the garbage collector. Gccgo supports splitting goroutine stacks as the gc compiler does, but currently only on x86 (32-bit or 64-bit) and only when using the gold linker (on other processors, each goroutine will have a large stack, and a deep series of function calls may run past the end of the stack and crash the program).

Gccgo distributions do not yet include a version of the go command. However, if you install the go command from a standard Go release, it already supports gccgo via the -compiler option: go build -compiler gccgo myprog. The tools used for calls between Go and C/C++, cgo and SWIG, also support gccgo.

We have put the Go frontend under the same BSD license as the rest of the Go tools. You can download the source code for the frontend at the gofrontend Google Code project. Note that when the Go frontend is linked with the GCC backend to make gccgo, GCC’s GPL license takes precedence.

The latest release of GCC, 4.7.1, includes gccgo with support for Go 1. If you need better performance for CPU-bound Go programs, or you need to support processors or operating systems that the gc compiler does not support, gccgo might be the answer.

by Ian Lance Taylor

by Andrew Gerrand (noreply@blogger.com) at July 11, 2012 05:04 PM

July 08, 2012

embrace change

Set up an import path alias

Today I set up an alias for the import path of my Tideland packages. How to do so is described in the documentation of the go command. So I registered cgl.tideland.biz and added the needed index and other files per package. They don't contain very much, only the meta tag

<meta name="go-import" content="cgl.tideland.biz hg https://code.google.com/p/tcgl"/>

This way go get knows that the real repository is hosted at Google Code. Besides the nicer-looking import path, I can also change the repository host later if needed.

Now I only have to change the import paths inside the code and push it.

by Frank Müller (noreply@blogger.com) at July 08, 2012 04:32 PM

July 04, 2012

Command Center

Less is exponentially more


Here is the text of the talk I gave at the Go SF meeting in June, 2012.

This is a personal talk. I do not speak for anyone else on the Go team here, although I want to acknowledge right up front that the team is what made and continues to make Go happen. I'd also like to thank the Go SF organizers for giving me the opportunity to talk to you.

I was asked a few weeks ago, "What was the biggest surprise you encountered rolling out Go?" I knew the answer instantly: Although we expected C++ programmers to see Go as an alternative, instead most Go programmers come from languages like Python and Ruby. Very few come from C++.

We—Ken, Robert and myself—were C++ programmers when we designed a new language to solve the problems that we thought needed to be solved for the kind of software we wrote. It seems almost paradoxical that other C++ programmers don't seem to care.

I'd like to talk today about what prompted us to create Go, and why the result should not have surprised us like this. I promise this will be more about Go than about C++, and that if you don't know C++ you'll be able to follow along.

The answer can be summarized like this: Do you think less is more, or less is less?

Here is a metaphor, in the form of a true story.  Bell Labs centers were originally assigned three-letter numbers: 111 for Physics Research, 127 for Computing Sciences Research, and so on. In the early 1980s a memo came around announcing that as our understanding of research had grown, it had become necessary to add another digit so we could better characterize our work. So our center became 1127. Ron Hardin joked, half-seriously, that if we really understood our world better, we could drop a digit and go down from 127 to just 27. Of course management didn't get the joke, nor were they expected to, but I think there's wisdom in it. Less can be more. The better you understand, the pithier you can be.

Keep that idea in mind.

Back around September 2007, I was doing some minor but central work on an enormous Google C++ program, one you've all interacted with, and my compilations were taking about 45 minutes on our huge distributed compile cluster. An announcement came around that there was going to be a talk presented by a couple of Google employees serving on the C++ standards committee. They were going to tell us what was coming in C++0x, as it was called at the time. (It's now known as C++11).

In the span of an hour at that talk we heard about something like 35 new features that were being planned. In fact there were many more, but only 35 were described in the talk. Some of the features were minor, of course, but the ones in the talk were at least significant enough to call out. Some were very subtle and hard to understand, like rvalue references, while others were especially C++-like, such as variadic templates, and some others were just crazy, like user-defined literals.

At this point I asked myself a question: Did the C++ committee really believe that what was wrong with C++ was that it didn't have enough features? Surely, in a variant of Ron Hardin's joke, it would be a greater achievement to simplify the language rather than to add to it. Of course, that's ridiculous, but keep the idea in mind.

Just a few months before that C++ talk I had given a talk myself, which you can see on YouTube, about a toy concurrent language I had built way back in the 1980s. That language was called Newsqueak and of course it is a precursor to Go.

I gave that talk because there were ideas in Newsqueak that I missed in my work at Google and I had been thinking about them again.  I was convinced they would make it easier to write server code and Google could really benefit from that.

I actually tried and failed to find a way to bring the ideas to C++. It was too difficult to couple the concurrent operations with C++'s control structures, and in turn that made it too hard to see the real advantages. Plus C++ just made it all seem too cumbersome, although I admit I was never truly facile in the language. So I abandoned the idea.

But the C++0x talk got me thinking again.  One thing that really bothered me—and I think Ken and Robert as well—was the new C++ memory model with atomic types. It just felt wrong to put such a microscopically-defined set of details into an already over-burdened type system. It also seemed short-sighted, since it's likely that hardware will change significantly in the next decade and it would be unwise to couple the language too tightly to today's hardware.

We returned to our offices after the talk. I started another compilation, turned my chair around to face Robert, and started asking pointed questions. Before the compilation was done, we'd roped Ken in and had decided to do something. We did not want to be writing in C++ forever, and we—me especially—wanted to have concurrency at my fingertips when writing Google code. We also wanted to address the problem of "programming in the large" head on, about which more later.

We wrote on the white board a bunch of stuff that we wanted, desiderata if you will. We thought big, ignoring detailed syntax and semantics and focusing on the big picture.

I still have a fascinating mail thread from that week. Here are a couple of excerpts:

Robert: Starting point: C, fix some obvious flaws, remove crud, add a few missing features.

Rob: name: 'go'.  you can invent reasons for this name but it has nice properties. it's short, easy to type. tools: goc, gol, goa.  if there's an interactive debugger/interpreter it could just be called 'go'.  the suffix is .go.

Robert: Empty interfaces: interface {}. These are implemented by all types, and thus this could take the place of void*.

We didn't figure it all out right away. For instance, it took us over a year to figure out arrays and slices. But a significant amount of the flavor of the language emerged in that first couple of days.

Notice that Robert said C was the starting point, not C++. I'm not certain but I believe he meant C proper, especially because Ken was there. But it's also true that, in the end, we didn't really start from C. We built from scratch, borrowing only minor things like operators and brace brackets and a few common keywords. (And of course we also borrowed ideas from other languages we knew.) In any case, I see now that we reacted to C++ by going back down to basics, breaking it all down and starting over. We weren't trying to design a better C++, or even a better C. It was to be a better language overall for the kind of software we cared about.

In the end of course it came out quite different from either C or C++. More different even than many realize. I made a list of significant simplifications in Go over C and C++:

  • regular syntax (don't need a symbol table to parse)
  • garbage collection (only)
  • no header files
  • explicit dependencies
  • no circular dependencies
  • constants are just numbers
  • int and int32 are distinct types
  • letter case sets visibility
  • methods for any type (no classes)
  • no subtype inheritance (no subclasses)
  • package-level initialization and well-defined order of initialization
  • files compiled together in a package
  • package-level globals presented in any order
  • no arithmetic conversions (constants help)
  • interfaces are implicit (no "implements" declaration)
  • embedding (no promotion to superclass)
  • methods are declared as functions (no special location)
  • methods are just functions
  • interfaces are just methods (no data)
  • methods match by name only (not by type)
  • no constructors or destructors
  • postincrement and postdecrement are statements, not expressions
  • no preincrement or predecrement
  • assignment is not an expression
  • evaluation order defined in assignment, function call (no "sequence point")
  • no pointer arithmetic
  • memory is always zeroed
  • legal to take address of local variable
  • no "this" in methods
  • segmented stacks
  • no const or other type annotations
  • no templates
  • no exceptions
  • builtin string, slice, map
  • array bounds checking

And yet, with that long list of simplifications and missing pieces, Go is, I believe, more expressive than C or C++. Less can be more.

But you can't take out everything. You need building blocks such as an idea about how types behave, and syntax that works well in practice, and some ineffable thing that makes libraries interoperate well.

We also added some things that were not in C or C++, like slices and maps, composite literals, expressions at the top level of the file (which is a huge thing that mostly goes unremarked), reflection, garbage collection, and so on. Concurrency, too, naturally.

One thing that is conspicuously absent is of course a type hierarchy. Allow me to be rude about that for a minute.

Early in the rollout of Go I was told by someone that he could not imagine working in a language without generic types. As I have reported elsewhere, I found that an odd remark.

To be fair he was probably saying in his own way that he really liked what the STL does for him in C++. For the purpose of argument, though, let's take his claim at face value.

What it says is that he finds writing containers like lists of ints and maps of strings an unbearable burden. I find that an odd claim. I spend very little of my programming time struggling with those issues, even in languages without generic types.

But more important, what it says is that types are the way to lift that burden. Types. Not polymorphic functions or language primitives or helpers of other kinds, but types.

That's the detail that sticks with me.

Programmers who come to Go from C++ and Java miss the idea of programming with types, particularly inheritance and subclassing and all that. Perhaps I'm a philistine about types but I've never found that model particularly expressive.

My late friend Alain Fournier once told me that he considered the lowest form of academic work to be taxonomy. And you know what? Type hierarchies are just taxonomy. You need to decide what piece goes in what box, every type's parent, whether A inherits from B or B from A.  Is a sortable array an array that sorts or a sorter represented by an array? If you believe that types address all design issues you must make that decision.

I believe that's a preposterous way to think about programming. What matters isn't the ancestor relations between things but what they can do for you.

That, of course, is where interfaces come into Go. But they're part of a bigger picture, the true Go philosophy.

If C++ and Java are about type hierarchies and the taxonomy of types, Go is about composition.

Doug McIlroy, the eventual inventor of Unix pipes, wrote in 1964 (!):
We should have some ways of coupling programs like garden hose--screw in another segment when it becomes necessary to massage data in another way. This is the way of IO also.
That is the way of Go also. Go takes that idea and pushes it very far. It is a language of composition and coupling.

The obvious example is the way interfaces give us the composition of components. It doesn't matter what that thing is, if it implements method M I can just drop it in here.
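A tiny sketch of that idea in Go (the interface and types here are invented for illustration, not taken from any real library):

```go
package main

import "fmt"

// Speaker is satisfied implicitly by anything with a Speak method.
// Callers never name concrete types, so new implementations drop in
// without any declaration connecting them to the interface.
type Speaker interface {
	Speak() string
}

type Dog struct{}
type Robot struct{}

func (Dog) Speak() string   { return "woof" }
func (Robot) Speak() string { return "beep" }

// Announce composes with any Speaker: it doesn't matter what the thing
// is, only that it has the method.
func Announce(s Speaker) string {
	return "says: " + s.Speak()
}

func main() {
	for _, s := range []Speaker{Dog{}, Robot{}} {
		fmt.Println(Announce(s)) // prints "says: woof" then "says: beep"
	}
}
```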

Another important example is how concurrency gives us the composition of independently executing computations.

And there's even an unusual (and very simple) form of type composition: embedding.

These compositional techniques are what give Go its flavor, which is profoundly different from the flavor of C++ or Java programs.

===========

There's an unrelated aspect of Go's design I'd like to touch upon: Go was designed to help write big programs, written and maintained by big teams.

There's this idea about "programming in the large" and somehow C++ and Java own that domain. I believe that's just a historical accident, or perhaps an industrial accident. But the widely held belief is that it has something to do with object-oriented design.

I don't buy that at all. Big software needs methodology to be sure, but not nearly as much as it needs strong dependency management and clean interface abstraction and superb documentation tools, none of which is served well by C++ (although Java does noticeably better).

We don't know yet, because not enough software has been written in Go, but I'm confident Go will turn out to be a superb language for programming in the large. Time will tell.

===========

Now, to come back to the surprising question that opened my talk:

Why does Go, a language designed from the ground up for what C++ is used for, not attract more C++ programmers?

Jokes aside, I think it's because Go and C++ are profoundly different philosophically.

C++ is about having it all there at your fingertips. I found this quote on a C++11 FAQ:
The range of abstractions that C++ can express elegantly, flexibly, and at zero costs compared to hand-crafted specialized code has greatly increased.
That way of thinking just isn't the way Go operates. Zero cost isn't a goal, at least not zero CPU cost. Go's claim is that minimizing programmer effort is a more important consideration.

Go isn't all-encompassing. You don't get everything built in. You don't have precise control of every nuance of execution. For instance, you don't have RAII. Instead you get a garbage collector. You don't even get a memory-freeing function.

What you're given is a set of powerful but easy to understand, easy to use building blocks from which you can assemble—compose—a solution to your problem. It might not end up quite as fast or as sophisticated or as ideologically motivated as the solution you'd write in some of those other languages, but it'll almost certainly be easier to write, easier to read, easier to understand, easier to maintain, and maybe safer.

To put it another way, oversimplifying of course:

Python and Ruby programmers come to Go because they don't have to surrender much expressiveness, but gain performance and get to play with concurrency.

C++ programmers don't come to Go because they have fought hard to gain exquisite control of their programming domain, and don't want to surrender any of it. To them, software isn't just about getting the job done, it's about doing it a certain way.

The issue, then, is that Go's success would contradict their world view.

And we should have realized that from the beginning. People who are excited about C++11's new features are not going to care about a language that has so much less.  Even if, in the end, it offers so much more.

Thank you.

by rob (noreply@blogger.com) at July 04, 2012 03:20 PM

July 02, 2012

Go's official blog

Go videos from Google I/O 2012

Phew! Google I/O is over for another year, and what an event it was. Thanks to our guest speakers and everyone who attended the four Go sessions. It was a lot of fun.

Here are the session videos:

Go concurrency patterns by Rob Pike

Concurrency is the key to designing high performance network services. Go's concurrency primitives (goroutines and channels) provide a simple and efficient means of expressing concurrent execution. In this talk we see how tricky concurrency problems can be solved gracefully with simple Go code.

<iframe allowfullscreen="allowfullscreen" frameborder="0" height="304" src="http://www.youtube.com/embed/f6kdp27TYZs" width="540"></iframe>

Go in production

Since Go's release in 2009 many companies (besides Google, of course) have used the language to build cool stuff. In this session Gustavo Niemeyer (Canonical), Keith Rarick (Heroku), Evan Shaw (Iron.io), and Patrick Crosby (StatHat) share their first-hand experience using Go in production environments.

<iframe allowfullscreen="allowfullscreen" frameborder="0" height="304" src="http://www.youtube.com/embed/kKQLhGZVN4A" width="540"></iframe>

Meet the Go team

A panel discussion with David Symonds, Robert Griesemer, Rob Pike, Ken Thompson, Andrew Gerrand, and Brad Fitzpatrick.

<iframe allowfullscreen="allowfullscreen" frameborder="0" height="304" src="http://www.youtube.com/embed/sln-gJaURzk" width="540"></iframe>

Computing Map Tiles with Go on App Engine by Chris Broadfoot and Andrew Gerrand

In this talk we use the Maps API and Go on App Engine to build an app that generates custom tile sets for Google Maps. The app demonstrates Go's suitability for computation in the cloud and App Engine's key scalability features, such as Task Queues and Backends.

<iframe allowfullscreen="allowfullscreen" frameborder="0" height="304" src="http://www.youtube.com/embed/sPagpg0b7no" width="540"></iframe>

by Andrew Gerrand (noreply@blogger.com) at July 02, 2012 09:43 PM

July 01, 2012

Miek Gieben

Libunbound wrapper in Go

I've created a small wrapper for libunbound for use in Go.

The code can be found at github. It depends on my Go DNS library which can be found here.

Official announcement on the Unbound-users@ list.

To give you a little taste of how it looks, I've (re)created tutorials 2 to 6 in Go. Tutorial 2 looks like this, for instance:

package main

// https://www.unbound.net/documentation/libunbound-tutorial-2.html

import (
        "dns"
        "fmt"
        "os"
        "unbound"
)

func main() {
        u := unbound.New()
        defer u.Destroy()

        if err := u.ResolvConf("/etc/resolv.conf"); err != nil {
                fmt.Printf("error %s\n", err.Error())
                os.Exit(1)
        }

        if err := u.Hosts("/etc/hosts"); err != nil {
                fmt.Printf("error %s\n", err.Error())
                os.Exit(1)
        }

        r, err := u.Resolve("www.nlnetlabs.nl.", dns.TypeA, dns.ClassINET)
        if err != nil {
                fmt.Printf("error %s\n", err.Error())
                os.Exit(1)
        }
        fmt.Printf("%+v\n", r)
}

by Miek Gieben at July 01, 2012 07:37 PM

June 29, 2012

RSC

_rsc: RT @maxtaco: Golang makes working in Unicode a pleasure. In a string-heavy 1kLOC project, upgraded from naive ascii to full unicode supp ...

June 29, 2012 04:32 PM

June 26, 2012

Stan Steel

Reading XML via Go

My 6-year-old daughter and I play Minecraft together on occasion to pass the time. We are slowly building our own little world, just the two of us. If you are not familiar with Minecraft, the game's central premise (the way we play) is surviving a world filled with monsters while starting with nothing. Success usually entails finding shelter and a fire source before the sun sets on the first day. After that, the excitement is pretty sparse. You spend a lot of time mining for resources and erecting humble structures. To survive is to win.

A couple of weeks ago, we were playing together and I think we both came to the conclusion we were bored. Usually, when this happens we start a new world together to get through the excitement of the first night. This time, however, we decided it'd be cool to design a game together. So, she spent the next hour drawing pictures of how the monsters, foliage, sky, and people should look. I asked questions to keep her creativity directed toward the goal of game design. After that, like most of our projects, days came and went and I thought all was forgotten.

Fast forward to today. I found myself trying to understand how to use the Go XML package to unmarshal an XML file into a set of types that I can use. This was spurred by my daughter's recent question, "Are you still working on our game, Daddy?" So, today I was trying to read a COLLADA (3D model) file and display it in an OpenGL window. My conclusion is that it isn't difficult at all. It was actually easy; so easy, in fact, that I felt compelled to write this post afterward. This is a summary of my experience after a couple of hours of effort. Keep in mind, a lot of the time went toward remembering OpenGL, looking for examples to follow, and figuring out the COLLADA specification. Here we focus on reading the COLLADA file.

Understanding the COLLADA File
The COLLADA file specification is a pretty extensive XML schema; the full specification can be found here, and the summary that actually helped me understand it is found here. In this post, I focus on the data I need to get to: the vertices, normals, and triangles. Here is an example file with only the portions I cared about remaining and the tags of particular interest highlighted in yellow.

<?xml version="1.0" encoding="utf-8"?>
<COLLADA xmlns="http://www.collada.org/2005/11/COLLADASchema" version="1.4.1">
  <asset>
    <contributor>
      <author>Blender User</author>
      <authoring_tool>Blender 2.58.0 r37702</authoring_tool>
    </contributor>
    <created>2011-07-26T02:34:24</created>
    <modified>2011-07-26T02:34:24</modified>
    <unit name="meter" meter="1"/>
    <up_axis>Z_UP</up_axis>
  </asset>
  <library_geometries>
    <geometry id="Plane-mesh">
      <mesh>
        <source id="Plane-mesh-positions">
          <float_array id="Plane-mesh-positions-array" count="12">1 0.3503274 0 1 -0.3503274 0 -1 -0.9999998 0 -0.9999997 1 0</float_array>
        </source>
        <source id="Plane-mesh-normals">
          <float_array id="Plane-mesh-normals-array" count="6">0 0 1 0 0 1</float_array>
        </source>
        <vertices id="Plane-mesh-vertices">
          <input semantic="POSITION" source="#Plane-mesh-positions"/>
        </vertices>
        <polylist count="2">
          <input semantic="VERTEX" source="#Plane-mesh-vertices" offset="0"/>
          <input semantic="NORMAL" source="#Plane-mesh-normals" offset="1"/>
          <vcount>3 3 </vcount>
          <p>0 0 3 0 1 0 3 1 2 1 1 1</p>
        </polylist>
      </mesh>
    </geometry>
  </library_geometries>
</COLLADA>

When writing the code to read this file, I made some assumptions about the COLLADA file's data because I was the originator of the file and had control over it. First, I assumed there were no texture coordinates in the file; this changes how one would parse the <p>0 0 3 0 1 0 3 1 2 1 1 1</p> section from the file above (this summary describes the alternate form here). Second, I assumed the <polylist> always describes triangles and not quads. These assumptions would have to change for production code, but the data is easy enough to get at that handling more than this single case wouldn't be a problem.

Organizing and Annotating Data Types in Preparation of Unmarshalling the COLLADA File
Using Go's XML package to read the file data couldn't be any easier. You create struct types that represent the tags you care about and 'annotate' fields with metadata that tells the parser how to handle data and where to put it. If you follow the rules properly (found in the XML package documentation), you should have no problem getting the XML package to load the data into your types for you. In this case, there was some post-processing required to get the <float_array> and <p> data into arrays of appropriate numeric types. Here are the structures I used:

type Collada struct {
    Id                    string              `xml:"id,attr"`
    Version               string              `xml:"version,attr"`
    Library_Geometries    LibraryGeometries   `xml:"library_geometries"`
    Library_Visual_Scenes LibraryVisualScenes `xml:"library_visual_scenes"`
}

type LibraryGeometries struct {
    XMLName  xml.Name   `xml:"library_geometries"`
    Geometry []Geometry `xml:"geometry"`
}

type Geometry struct {
    XMLName xml.Name `xml:"geometry"`
    Id      string   `xml:"id,attr"`
    Mesh    Mesh     `xml:"mesh"`
}

type Mesh struct {
    XMLName  xml.Name `xml:"mesh"`
    Source   []Source `xml:"source"`
    Polylist Polylist `xml:"polylist"`
}

type Source struct {
    XMLName     xml.Name   `xml:"source"`
    Id          string     `xml:"id,attr"`
    Float_array FloatArray `xml:"float_array"`
}

type FloatArray struct {
    XMLName xml.Name `xml:"float_array"`
    Id      string   `xml:"id,attr"`
    CDATA   string   `xml:",chardata"`
    Count   string   `xml:"count,attr"`
}

type Polylist struct {
    XMLName xml.Name `xml:"polylist"`
    Id      string   `xml:"id,attr"`
    Count   string   `xml:"count,attr"`

    // List of integers, each specifying the number of vertices for one polygon
    VCount string `xml:"vcount"`

    // List of integers that specify the vertex attributes
    P string `xml:"p"`
}

type LibraryVisualScenes struct {
    XMLName     xml.Name    `xml:"library_visual_scenes"`
    VisualScene VisualScene `xml:"visual_scene"`
}

type VisualScene struct {
    XMLName xml.Name `xml:"visual_scene"`
}

I highlighted sections above that were of interest. The Id field's attr tag shows how to inform the XML parser to load an attribute value into a field. The field name must be exported (first letter capitalized) and tagged with attr metadata. The XMLName xml.Name field shows how you identify that a structure is associated with a particular XML element. The Source []Source field represents a slice that the XML parser will initialize and fill for you. Finally, the CDATA field's chardata tag tells the parser where to put an element's text. In this case, it is loaded with the string of float numbers found in this element:
    
        <float_array id="Plane-mesh-positions-array" count="12">1 0.3503274 0 1 -0.3503274 0 -1 -0.9999998 0 -0.9999997 1 0</float_array>  
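The post-processing mentioned above (turning that character data into numbers) can be sketched like this; parseFloats is a hypothetical helper, not code from the post:

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseFloats converts the whitespace-separated numbers in a
// <float_array> element's character data into a []float64.
func parseFloats(cdata string) ([]float64, error) {
	fields := strings.Fields(cdata)
	out := make([]float64, len(fields))
	for i, f := range fields {
		v, err := strconv.ParseFloat(f, 64)
		if err != nil {
			return nil, err
		}
		out[i] = v
	}
	return out, nil
}

func main() {
	// The positions array from the example file: 12 floats, 4 vertices.
	verts, err := parseFloats("1 0.3503274 0 1 -0.3503274 0 -1 -0.9999998 0 -0.9999997 1 0")
	if err != nil {
		panic(err)
	}
	fmt.Println(len(verts), "floats; first vertex:", verts[0:3])
}
```

An analogous strconv.Atoi loop would handle the integer data in <vcount> and <p>.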

Unmarshalling the COLLADA File
Oh, I almost forgot.  Once you've created your data types, the call to fill your data simply looks like the following:

//
// Create meshes from COLLADA file
//
// There are some major limitations for this currently
// 1) Must only contain Triangles
// 2) No support for animation or materials at the moment
// 3) Will not read translations or rotations
//
func BuildModel(filename string) *data.Model {
    b, err := ioutil.ReadFile(filename)
    if err != nil {
        fmt.Println(err)
        return nil
    }

    c := new(Collada)
    if err := xml.Unmarshal(b, c); err != nil {
        fmt.Println(err)
        return nil
    }
    ...
}

In the above code, I've used a filename to open a file and I've constructed the root document element (the Collada type) on this line:

         c := new(Collada)

and called xml.Unmarshal, passing the file's contents and a pointer to the Collada instance. Everything we've properly modeled will be accessible from this root element. Again, I had to do some post-processing to convert some textual numbers to float and int arrays, but in the end, an ugly model I built in Blender like this:


 was being shown in a Go COLLADA Model Viewer I built like this:



Again, this was super easy and only took about 2-3 hours (I've been writing this post for about the same amount of time). The next step will be to create and apply textures to the model and see how to get it displayed in the model viewer.

by Stan Steel (steel@kryas.com) at June 26, 2012 05:08 AM

June 25, 2012

embrace change

Using Sublime Text 2 with GoSublime


In Convinced by Sublime Text I already wrote about Sublime Text 2, my preferred editor I'm using under OS X and Linux. It's already very powerful out of the box. But together with the Package Control plugin and the large number of available packages it gets even better. Searching, installing, updating, removing, everything works like a charm.

Today I want to show you the GoSublime package by DisposaBoy. It provides

  • the well-known code completion of gocode by nsf,
  • its own snippets for code completion, which intelligently (by type comparison) try to use variables of the surrounding context,
  • simple adding and removing of imports,
  • direct jumps to imports, e.g. for defining an alias,
  • live error detection and highlighting using gotype; a list can be shown with Cmd-. Cmd-E,
  • a build system based directly on the Go tools, so that not only go build or go test are possible, but also variants like go test -i or go test -test.cpu 8 -test.v,
  • error cycling using F4/Shift-F4 (besides clicking on the error in the output window),
  • simple jumps between the errors in different files and back,
  • automatic formatting of the source with gofmt when saving or by command, and
  • directly running/playing the current file.

Most of those commands can be accessed via the Cmd-. Cmd-. shortcut. Some are directly part of the plugin and so developed in Python, but a lot is done in MarGo, also developed by DisposaBoy, but in Go. Neither tool is finished yet, but both are growing fast. Every few days the automatic update of Package Control shows new changes, and the change log is very interesting. Soon there will be

  • a system-wide jump to the definition of a function or variable,
  • display of Go docs in the editor, and
  • sharing of code from inside the editor on play.golang.org (it's already working in the developer releases of Sublime Text 2).

So if you're a Go developer looking for a powerful environment, try this duo.

by Frank Müller (noreply@blogger.com) at June 25, 2012 10:14 AM

Adam Langley

Decrypting SSL packet dumps

We all love transport security but it can get in the way of a good tcpdump. Unencrypted protocols like HTTP, DNS etc can be picked apart for debugging but anything running over SSL can be impenetrable. Of course, that's an advantage too: the end-to-end principle is dead for any common, unencrypted protocol. But we want to have our cake and eat it.

Wireshark (a common tool for dissecting packet dumps) has long had the ability to decrypt some SSL connections given the private key of the server, but the private key isn't always something that you can get hold of, or want to spread around. MITM proxies (like Fiddler) can sit in the middle of a connection and produce plaintext, but they also alter the connection: SPDY, client-certificates etc won't work through them (at least not without special support).

So here's another option: if you get a dev channel release of Chrome and a trunk build of Wireshark you can run Chrome with the environment variable SSLKEYLOGFILE set to, say, /home/foo/keylog. Then, in Wireshark's preferences for SSL, you can tell it about that key log file. As Chrome makes SSL connections, it'll dump an identifier and the connection key to that file and Wireshark can read those and decrypt SSL connections.

The format of the key log file is described here. There's an older format just for RSA ciphersuites that I added when Wireshark decrypted purely based on RSA pre-master secrets. However, that doesn't work with ECDHE ciphersuites (amongst others) so the newer format can be used to decrypt any connection. (You need the trunk build of Wireshark to support the newer format.) Chrome currently writes records of both formats.

This can also be coupled with spdy-shark to dissect SPDY connections.

Since key log support is part of NSS, support will hopefully end up in Firefox in the future.

June 25, 2012 07:00 AM

June 21, 2012

Miek Gieben

Printing MX records with Go DNS

Now that the API seems to have stabilized, it is time to update these articles.

We want to create a little program that prints out the MX records of domains, like so:

% mx miek.nl
miek.nl.        86400   IN      MX      10 elektron.atoom.net.

Or

% mx microsoft.com 
microsoft.com.  3600    IN      MX      10 mail.messaging.microsoft.com.

We are using my Go DNS package. First the normal header of a Go program, with a bunch of imports. We need the dns package:

package main

import (
    "dns"
    "fmt"
    "os"
)

Next we need to get the local nameserver to use:

config, _ := dns.ClientConfigFromFile("/etc/resolv.conf")

Then we create a dns.Client to perform the queries for us. In Go:

c := new(dns.Client)

We skip some error handling and assume a zone name is given. So we prepare our question. For that we need to:

  1. create a new packet (dns.Msg);
  2. set some header bits (dns.Msg.MsgHdr);
  3. define a question section;
  4. fill out the question section: os.Args[1] contains the zone name.

Which translates into:

m := new(dns.Msg)
m.SetQuestion(dns.Fqdn(os.Args[1]), dns.TypeMX)
m.MsgHdr.RecursionDesired = true

Finally we need to 'ask' the question, which we do by calling the Exchange() function.

r, err := c.Exchange(m, config.Servers[0]+":"+config.Port)

Check if we got something sane. The following code snippet prints the answer section of the received packet:

if r != nil {
        if r.Rcode != dns.RcodeSuccess {
                fmt.Printf(" *** invalid answer name %s after MX query for %s\n", os.Args[1], os.Args[1])
                os.Exit(1)
        }
        // Stuff must be in the answer section
        for _, a := range r.Answer {
                fmt.Printf("%v\n", a)
        }
} else {
        fmt.Printf("*** error: %s\n", err.Error())
}

And we are done.

Full source

The full source of mx.go can be found over at github. Compiling works with go build.

by Miek Gieben at June 21, 2012 08:30 AM

research!rsc

A Tour of Go

Last week, I gave a talk about Go at the Boston Google Developers Group meeting. There were some problems with the recording, so I have rerecorded the talk as a screencast and posted it on YouTube.

<iframe allowfullscreen="allowfullscreen" frameborder="0" height="360" src="http://www.youtube.com/embed/ytEkHepK08c?rel=0" width="640"></iframe>

Here are the answers to questions asked at the end of the talk.

Q. How does Go work with debuggers?

To start, both Go toolchains include debugging information that gdb can read in the final binaries, so basic gdb functionality works on Go programs just as it does on C programs.

We’ve talked for a while about a custom Go debugger, but there isn’t one yet.

Many of the programs we want to debug are live, running programs. The net/http/pprof package provides debugging information like goroutine stacks, memory profiling, and cpu profiling in response to special HTTP requests.

Q. If a goroutine is stuck reading from a channel with no other references, does the goroutine get garbage collected?

No. From the garbage collection point of view, both sides of the channel are represented by the same pointer, so it can’t distinguish the receive and send sides. Even if we could detect this situation, we’ve found that it’s very useful to keep these goroutines around, because the program is probably heading for a deadlock. When a Go program deadlocks, it prints all its goroutine stacks and then exits. If we garbage collected the goroutines as they got stuck, the deadlock handler wouldn’t have anything useful to print except "your entire program has been garbage collected".

Q. Can a C++ program call into Go?

We wrote a tool called cgo so that Go programs can call into C, and we’ve implemented support for Go in SWIG, so that Go programs can call into C++. In those programs, the C or C++ can in turn call back into Go. But we don’t have support for a C or C++ program—one that starts execution in the C or C++ world instead of the Go world—to call into Go.

The hardest part of the cross-language calls is converting between the C calling convention and the Go calling convention, specifically with regard to the implementation of segmented stacks. But that’s been done and works.

Making the assumption that these mixed-language binaries start in Go has simplified a number of parts of the implementation. I don’t anticipate any technical surprises involved in removing these assumptions. It’s just work.

Q. What are the areas that you specifically are trying to improve the language?

For the most part, I’m not trying to improve the language itself. Part of the effort in preparing Go 1 was to identify what we wanted to improve and do it. Many of the big changes were based on two or three years of experience writing Go programs, and they were changes we’d been putting off because we knew that they’d be disruptive. But now that Go 1 is out, we want to stop changing things and spend another few years using the language as it exists today. At this point we don’t have enough experience with Go 1 to know what really needs improvement.

My Go work is a small amount of fixing bugs in the libraries or in the compiler and a little bit more work trying to improve the performance of what’s already there.

Q. What about talking to databases and web services?

For databases, one of the packages we added in Go 1 is a standard database/sql package. That package defines a standard API for interacting with SQL databases, and then people can implement drivers that connect the API to specific database implementations like SQLite or MySQL or Postgres.

For web services, you’ve seen the support for JSON and XML encodings. Those are typically good enough for ad hoc REST services. I recently wrote a package for connecting to the SmugMug photo hosting API, and there’s one generic call that unmarshals the response into a struct of the appropriate type, using json.Unmarshal. I expect that XML-based web services like SOAP could be framed this way too, but I’m not aware of anyone who’s done that.

Inside Google, of course, we have plenty of services, but they’re based on protocol buffers, so of course there’s a good protocol buffer library for Go.

Q. What about generics? How far off are they?

People have asked us about generics from day 1. The answer has always been, and still is, that it’s something we’ve put a lot of thought into, but we haven’t yet found an approach that we think is a good fit for Go. We’ve talked to people who have been involved in the design of generics in other languages, and they’ve almost universally cautioned us not to rush into something unless we understand it very well and are comfortable with the implications. We don’t want to do something that we’ll be stuck with forever and regret.

Also, speaking for myself, I don’t miss generics when I write Go programs. What’s there, having built-in support for arrays, slices, and maps, seems to work very well.

Finally, we just made this promise about backwards compatibility with the release of Go 1. If we did add some form of generics, my guess is that some of the existing APIs would need to change, which can’t happen until Go 2, which I think is probably years away.

Q. What types of projects does Google use Go for?

Most of the things we use Go for I can’t talk about. One notable exception is that Go is an App Engine language, which we announced at I/O last year. Another is vtocc, a MySQL load balancer used to manage database lookups in YouTube’s core infrastructure.

Q. How does the Plan 9 toolchain differ from other compilers?

It’s a completely incompatible toolchain in every way. The main difference is that object files don’t contain machine code in the sense of having the actual instruction bytes that will be used in the final binary. Instead they contain a custom encoding of the assembly listings, and the linker is in charge of turning those into actual machine instructions. This means that the assembler, C compiler, and Go compiler don’t all duplicate this logic. The main change for Go is the support for segmented stacks.

I should add that we love the fact that we have two completely different compilers, because it keeps us honest about really implementing the spec.

Q. What are segmented stacks?

One of the problems in threaded C programs is deciding how big a stack each thread should have. If the stack is too small, then the thread might run out of stack and cause a crash or silent memory corruption, and if the stack is too big, then you’re wasting memory. In Go, each goroutine starts with a small stack, typically 4 kB, and then each function checks if it is about to run out of stack and if so allocates a new stack segment that gets recycled once it’s not needed anymore.

Gccgo supports segmented stacks, but it requires support added recently to the new GNU linker, gold, and that support is only implemented for x86-32 and x86-64.

Segmented stacks are something that lots of people have done before in experimental or research systems, but they have never made it into the C toolchains.

Q. What is the overhead of segmented stacks?

It’s a few instructions per function call. It’s been a long time since I tried to measure the precise overhead, but in most programs I expect it to be not more than 1-2%. There are definitely things we could do to try to reduce that, but it hasn’t been a concern.

Q. Do goroutine stacks adapt in size?

The initial stack allocated for a goroutine does not adapt. It’s always 4k right now. It has been other values in the past but always a constant. One of the things I’d like to do is to look at what the goroutine will be running and adjust the stack accordingly, but I haven’t.

Q. Are there any short-term plans for dynamic loading of modules?

No. I don’t think there are any technical surprises, but assuming that everything is statically linked simplified some of the implementation. Like with calling Go from C++ programs, I believe it’s just work.

Gccgo might be closer to support for this, but I don’t believe that it supports dynamic loading right now either.

Q. How much does the language spec say about reflection?

The spec is intentionally vague about reflection, but package reflect’s API is definitely part of the Go 1 definition. Any conforming implementation would need to implement that API. In fact, gc and gccgo do have different implementations of that package reflect API, but then the packages that use reflect like fmt and json can be shared.

Q. Do you have a release schedule?

We don’t have any fixed release schedule. We’re not keeping things secret, but we’re also not making commitments to specific timelines.

Go 1 was in progress publicly for months, and if you watched you could see the bug count go down and the release candidates announced, and so on.

Right now we’re trying to slow down. We want people to write things using Go, which means we need to make it a stable foundation to build on. Go 1.0.1, the first bug release, was released four weeks after Go 1, and Go 1.0.2 was seven weeks after Go 1.0.1.

Q. Where do you see Go in five years? What languages will it replace?

I hope that it will still be at golang.org, that the Go project will still be thriving and relevant. We built it to write the kinds of programs we’ve been writing in C++, Java, and Python, but we’re not trying to go head-to-head with those languages. Each of those has definite strengths that make them the right choice for certain situations. We think that there are plenty of situations, though, where Go is a better choice.

If Go doesn’t work out, and for some reason in five years we’re programming in something else, I hope the something else would have the features I talked about, specifically the Go way of doing interfaces and the Go way of handling concurrency.

If Go fails but some other language with those two features has taken over the programming landscape, if we can move the computing world to a language with those two features, then I’d be sad about Go but happy to have gotten to that situation.

Q. What are the limits to scalability with building a system with many goroutines?

The primary limit is the memory for the goroutines. Each goroutine starts with a 4kB stack and a little more per-goroutine data, so the overhead is between 4kB and 5kB. That means on this laptop I can easily run 100,000 goroutines, in 500 MB of memory, but a million goroutines is probably too much.

For a lot of simple goroutines, the 4 kB stack is probably more than necessary. If we worked on getting that down we might be able to handle even more goroutines. But remember that this is in contrast to C threads, where 64 kB is a tiny stack and 1-4MB is more common.

Q. How would you build a traditional barrier using channels?

It’s important to note that channels don’t attempt to be a concurrency Swiss army knife. Sometimes you do need other concepts, and the standard sync package has some helpers. I’d probably use a sync.WaitGroup.

If I had to use channels, I would do it like in the web crawler example, with a channel that all the goroutines write to, and a coordinator that knows how many responses it expects.

Q. What is an example of the kind of application you’re working on performance for? How will you beat C++?

I haven’t been focusing on specific applications. Go is still young enough that if you run some microbenchmarks you can usually find something to optimize. For example, I just sped up floating point computation by about 25% a few weeks ago. I’m also working on more sophisticated analyses for things like escape analysis and bounds check elimination, which address problems that are unique to Go, or at least not problems that C++ faces.

Our goal is definitely not to beat C++ on performance. The goal for Go is to be near C++ in terms of performance but at the same time be a much more productive environment and language, so that you’d rather program in Go.

Q. What are the security features of Go?

Go is a type-safe and memory-safe language. There are no dangling pointers, no pointer arithmetic, no use-after-free errors, and so on.

You can break the rules by importing package unsafe, which gives you a special type unsafe.Pointer. You can convert any pointer or integer to an unsafe.Pointer and back. That’s the escape hatch, which you need sometimes, like for extracting the bits of a float64 as a uint64. But putting it in its own package means that unsafe code is explicitly marked as unsafe. If your program breaks in a strange way, you know where to look.

Isolating this power also means that you can restrict it. On App Engine you can’t import package unsafe in the code you upload for your app.

I should point out that the current Go implementation does have data races, but they are not fundamental to the language. It would be possible to eliminate the races at some cost in efficiency, and for now we’ve decided not to do that. There are also tools such as Thread Sanitizer that help find these kinds of data races in Go programs.

Q. What language do you think Go is trying to displace?

I don’t think of Go that way. We were writing C++ code before we did Go, so we definitely wanted not to write C++ code anymore. But we’re not trying to displace all C++ code, or all Python code, or all Java code, except maybe in our own day-to-day work.

One of the surprises for me has been the variety of languages that new Go programmers used to use. When we launched, we were trying to explain Go to C++ programmers, but many of the programmers Go has attracted have come from more dynamic languages like Python or Ruby.

Q. How does Go make it possible to use multiple cores?

Go lets you tell the runtime how many operating system threads to use for executing goroutines, and then it muxes the goroutines onto those threads. So if you’ve written a program that has four or more goroutines executing simultaneously, you can tell the runtime to use four OS threads and then you’re running on four cores.

We’ve been pleasantly surprised by how easy people find it to write these kinds of programs. People who have not written parallel or concurrent programs before write concurrent Go programs using channels that can take advantage of multiple cores, and they enjoy the experience. That’s more than you can usually say for C threads. Joe Armstrong, one of the creators of Erlang, makes the point that thinking about concurrency in terms of communication might be more natural for people, since communication is something we’ve done for a long time. I agree.

Q. How does the muxing of goroutines work?

It’s not very smart. It’s the simplest thing that isn’t completely stupid: all the scheduling operations are O(1), and so on, but there’s a shared run queue that the various threads pull from. There’s no affinity between goroutines and threads, there’s no attempt to make sophisticated scheduling decisions, and there’s not even preemption.

The goroutine scheduler was the first thing I wrote when I started working on Go, even before I was working full time on it, so it’s just about four years old. It has served us surprisingly well, but we’ll probably want to replace it in the next year or so. We’ve been having some discussions recently about what we’d want to try in a new scheduler.

Q. Is there any plan to bootstrap Go in Go, to write the Go compiler in Go?

There’s no immediate plan. Go does ship with a Go program parser written in Go, so the first piece is already done, and there’s an experimental type checker in the works, but those are mainly for writing program analysis tools. I think that Go would be a great language to write a compiler in, but there’s no immediate plan. The current compiler, written in C, works well.

I’ve worked on bootstrapped languages in the past, and I found that bootstrapping is not necessarily a good fit for languages that are changing frequently. It reminded me of climbing a cliff and screwing hooks into the cliff once in a while to catch you if you fall. Once or twice I got into situations where I had identified a bug in the compiler, but then trying to write the code to fix the bug tickled the bug, so it couldn’t be compiled. And then you have to think hard about how to write the fix in a way that avoids the bug, or else go back through your version control history to find a way to replay history without introducing the bug. It’s not fun.

The fact that Go wasn’t written in itself also made it much easier to make significant language changes. Before the initial release we went through a handful of wholesale syntax upheavals, and I’m glad we didn’t have to worry about how we were going to rebootstrap the compiler or ensure some kind of backwards compatibility during those changes.

Finally, I hope you’ve read Ken Thompson’s Turing Award lecture, Reflections on Trusting Trust. When we were planning the initial open source release, we liked to joke that no one in their right mind would accept a bootstrapped compiler binary written by Ken.

Q. What does Go do to compile efficiently at scale?

This is something that we talked about a lot in early talks about Go. The main thing is that it cuts off transitive dependencies when compiling a single module. In most languages, if package A imports B, and package B imports C, then the compilation of A reads not just the compiled form of B but also the compiled form of C. In large systems, this gets out of hand quickly. For example, in C++ on my Mac, including <iostream> reads 25,326 lines from 131 files. (C and C++ headers aren't “compiled form,” but the problem is the same.) Go promises that each import reads a single compiled package file. If you need to know something about other packages to understand that package’s API, then the compiled file includes the extra information you need, but only that.

Of course, if you are building from scratch and package A imports B which imports C, then C has to be compiled first, and then B, and then A. The important point is that when you go to compile A, you don’t reload C’s object file. In a real program, the dependencies are usually not a chain like this. We might have A1, A2, A3, and so on all importing B. It’s a significant win if none of them need to reread C.

Q. How do you identify a good project for Go?

I think a good project for Go is one that you’re excited about writing in Go. Go really is a general purpose programming language, and except for the compiler work, it’s the only language I’ve written significant programs in for the past four years.

Most of the people I know who are using Go are using it for networked servers, where the concurrency features have something to contribute, but it’s great for other contexts too. I’ve used it to write a simple mail reader, file system implementations to read old disks, and a variety of other unnetworked programs.

Q. What is the current and future IDE support for Go?

I’m not an IDE user in the modern sense, so really I don’t know. We think that it would be possible to write a really nice IDE specifically for Go, but it’s not something we’ve had time to explore. The Go distribution has a misc directory that contains basic Go support for common editors, and there is a Goclipse project to write an Eclipse-based IDE, but I don’t know much about those.

The development environment I use, acme, is great for writing Go code, but not because of any custom Go support.

If you have more questions, please consult these resources.

June 21, 2012 12:00 AM

June 17, 2012

embrace change

Release 2.0.0 of the Common Go Library


I'm happy to announce Release 2.0.0 of the Tideland Common Go Library. The most important part is the rework of the cells package for event-driven applications. Thanks to the early adopters of this package for their very useful hints. I hope they all benefit from the modifications and don't have too many problems with the required API changes. To give everyone a better introduction to how cells works and how it is used, I added a scenario simulating a shop with ordering, stock, manufacturing and delivery.

Here are the changes:

Asserts

  • Better messages for a direct usage inside of editors
  • Renamed 'Substring' and 'Implementor' asserts

Cells

  • Larger rework with improvements of reliability, performance and convenience
  • Cell adding now lazy using a factory
  • Additional multiple cell adding and subscribing
  • Emits inside a cell now via an own type instead of a channel
  • Integrated configuration

Config

  • New package
  • Configuration is key/value based
  • Backends are configuration providers
  • Simple map based provider in this release; more to come

Monitoring

  • Simplified output of execution times
  • No more total time and theoretical operations per second

You'll find the library at http://code.google.com/p/tcgl/.

by Frank Müller (noreply@blogger.com) at June 17, 2012 09:24 PM

June 08, 2012

Adam Langley

New TLS versions

TLS is the protocol behind most secure connections on the Internet, and most TLS is TLS 1.0, despite the fact that the RFC for 1.0 was published in January 1999, over 13 years ago.

Since then there have been two newer versions of TLS: 1.1 (2006) and 1.2 (2008). TLS 1.1 added an explicit IV for CBC mode ciphers as a response to CBC weaknesses that eventually turned into the BEAST attack. TLS 1.2 changes the previous MD5/SHA1 combination hash to SHA256 and introduces AEAD ciphers like AES-GCM.

However, neither of these versions saw any significant adoption for a long time because TLS's extension mechanism allowed 1.0 to adapt to new needs.

But things are starting to change:

  • Google servers now support up to TLS 1.2.
  • iOS 5 clients support up to TLS 1.2.
  • Chrome dev channel supports up to TLS 1.1.
  • Twitter, Facebook and Cloudflare appear to be deploying TLS 1.2 support, although the nature of large deployments means that this may vary during a gradual deployment.
  • Opera supports up to TLS 1.2, although I believe that 1.1 and 1.2 are disabled by default.

In the long run, getting to 1.2 is worthwhile. The MD5/SHA1 hash combination used in previous versions was hoped to be more secure than either hash function alone, but [1] suggests that it's probably only as secure as SHA1. Also, the GCM cipher modes allow AES to be used without the problems (and space overhead) of CBC mode. GCM is hardware accelerated in recent Intel and AMD chips, along with AES itself.

But there are always realities to contend with I'm afraid:

Firstly, there's the usual problem of buggy servers. TLS has a version negotiation mechanism, but some servers will fail if a client indicates that it supports the newer TLS versions. (Last year, Yngve Pettersen suggested that 2% of HTTPS servers failed if the client indicated TLS 1.1 and 3% for TLS 1.2.)

Because of this Chrome implements a fallback from TLS 1.1 to TLS 1.0 if the server sends a TLS error. (And we have a fallback from TLS 1.0 to SSL 3.0 if we get another TLS error on the second try.) This, sadly, means that supporting TLS 1.1 cannot bring any security benefits because an attacker can cause us to fallback. Thankfully, the major security benefit of TLS 1.1, the explicit CBC IVs, was retrofitted to previous versions in the form of 1/n-1 record splitting after the BEAST demonstration.

Since these fallbacks can be a security concern (especially the fallback to SSLv3, which eliminates ECDHE forward secrecy) I fear that it's necessary to add a second, redundant version negotiation mechanism to the protocol. It's an idea which has been floated before and I raised it again recently.

But buggy servers are something that we've known about for many years. Deploying new TLS versions has introduced a new problem: buggy networks.

Appallingly, it appears that there are several types of network device that manage to break when confronted with new TLS versions. There are some that break any attempt to offer TLS 1.1 or 1.2, and some that break any connections that negotiate these versions. These failures, so far, manifest in the form of TCP resets, which isn't currently a trigger for Chrome to fall back, although we may be forced to add it.

Chrome dev or iOS users suffering from the first type of device see all of their HTTPS connections fail. Users suffering from the second type only see failures when connecting to sites that support TLS 1.1 or 1.2 (which includes Google). iOS leaves it up to the application to implement fallback if it wishes, and adding TLS 1.2 support to Google's servers has caused some problems because of these bad networks.

We're working to track down the vendors with issues at the moment and to make sure that updates are available, and that they inform their customers of this. I'm very interested in any cases where Chrome 21 suddenly caused all or some HTTPS connections to fail with ERR_CONNECTION_RESET. If you hit this, please let me know (agl at chromium dot org).

([1] Antoine Joux, Multicollisions in Iterated Hash Functions: Application to Cascaded Constructions, CRYPTO (Matthew K. Franklin, ed.), Lecture Notes in Computer Science, vol. 3152, Springer, 2004, pp. 306–316.)

June 08, 2012 07:00 AM

June 07, 2012

RSC

_rsc: @JetBlue Printing a boarding pass on a Mac never prints the bar code. Have seen this over and over for a year or two. Any fixes planned?

June 07, 2012 02:35 PM

June 06, 2012

codegrunt.co.uk

Weak Hackathon Done

After a weekend of hacking on my chess engine weak, I’ve taken it from handling only king and pawn moves to being able to handle moves for all pieces, and over the past couple of days I have also added castling. Major props to the absolutely amazing chess programming wiki which has been an invaluable source throughout.

Next, I need to add pawn promotion and catch up on technical debt - the code is a little messy, and there is a backlog of unit tests to improve.

After that come the hairier chess rules, and then, finally, the interesting stuff: getting the damn thing to play well (it’s already beaten me, but that doesn’t say much :-)

June 06, 2012 11:00 PM

May 31, 2012

codegrunt.co.uk

Weak Hackathon

This long bank holiday weekend I am planning to hack on my chess engine, weak, and take it from its current rather incomplete state where it can play some king and pawn moves to a place where it can actually play a game of chess (albeit without rules like en passant, threefold repetition, and the like).

Once I am done I plan to write a blog post discussing its implementation and the realm of computer chess engines in general.

I am also currently working on a rather large Go blog post which is ballooning into a lot more work than expected, hence the succinctness of this post :-)

May 31, 2012 11:00 PM

Stan Steel

Latencies to Remember

L1 cache reference 0.5 ns
Branch mispredict 5 ns
L2 cache reference 7 ns
Mutex lock/unlock 25 ns
Main memory reference 100 ns
Compress 1K bytes with Zippy 3,000 ns
Send 2K bytes over 1 Gbps network 20,000 ns
Read 1 MB sequentially from memory 250,000 ns
Round trip within same datacenter 500,000 ns
Disk seek 10,000,000 ns
Read 1 MB sequentially from disk 20,000,000 ns
Send packet CA->Netherlands->CA 150,000,000 ns

By Jeff Dean (http://research.google.com/people/jeff/) &
posted here: https://gist.github.com/2841832

by Stan Steel (steel@kryas.com) at May 31, 2012 03:08 PM

May 29, 2012

jra's thoughts

Zero Downtime upgrades of TCP servers in Go

A recent post on the golang-nuts mailing list mentioned that Nginx can upgrade on the fly without ever stopping listening on its listen socket. The trick is to unset close-on-exec on the listen socket, then fork/exec a new copy of the server (on the upgraded binary) with an argument telling it to use the inherited file descriptor instead of calling socket() and listen().

I wanted to see if I could achieve the same thing with Go, and what changes would be necessary to the standard libraries to make this possible. I got it working, without changing the standard library, so I wanted to explain what I did here.

The code for this post is here.

There are several interesting things I’d like to point out in this program. Twice, I used the pattern of “implement an interface in order to intercept method calls”. This is an important pattern in Go that I don’t think is widely understood or documented.

When I started thinking about this job I knew one of my problems would be to dig down inside of http.(*Server).Serve in order to get it to stop calling Accept() when the old server should shut down gracefully. The problem is that there are no hooks in there; the only way out of the loop (“accept, start a goroutine to process, do it again”) is for Accept to return an error. But if you think of Accept as a system call, you might think, “I can’t get in there and inject an error”. But Accept() is not a system call: it’s a method in the interface net.Listener. Which means that if you make your own object which satisfies net.Listener, you can pass that in to http.(*Server).Serve and do what you want in Accept().

The first time I read about embedding types in structures I got lost and confused. And when I tried it, I had pointers in the mix and I had lots of unexplained nil pointer errors. This time, I read it again and it made sense. Type embedding is essential when you want to interpose one of the methods of an interface. It lets you inherit all the implementations of the underlying object except for the one that you redefine. Take a look at stoppableListener in upgradable.go. The net.Listener interface requires three methods: Accept, Close, and Addr. But I only defined one of those, Accept(). How is it that stoppableListener still implements net.Listener? Because the other two methods “show through” from where they were embedded in it. Only Accept() has a more precise definition. When I wrote Accept(), I needed to figure out how to talk to the underlying object, in order to pass on the Accept() call. The trick here is to understand that embedding a type creates a new field in your structure with the unqualified name of the type. So I can refer to the net.Listener inside of stoppableListener sl as sl.Listener, and I can call the underlying Accept() as sl.Listener.Accept().

Next I started wondering how to handle the “stopped” error code from Serve(). Immediately exiting with os.Exit(0) isn’t right, because there can still be goroutines servicing HTTP clients. We need some way to know when each client is done. Interposition to the rescue again, since we can wrap up the net.Conn returned by Accept() and use it to decrement a count of the number of currently running connections. This technique of interposing the net.Conn object could have other interesting uses. For example, by trapping the Read() or Write() calls, you could enforce rate limiting on the connection, without needing the protocol implementation to know anything about it. You could even do other zany things like implement opportunistic encryption, again without the protocol knowing it was happening.

Once I knew that I would be able to stop my server gracefully, I needed to figure out how to start the new one on the correct file descriptor. Rog Peppe pointed me to the net.FileListener object, which one can construct from an *os.File (which you can make from a raw file descriptor in an int using os.NewFile).

The final problem is that net always sets the close-on-exec flag on file descriptors for sockets it opens. So I needed to turn that off on the listen socket, so that the file descriptor would still be valid in the new process.  Unfortunately syscall.CloseOnExec does not take a boolean to tell it what you want (do close or don’t close). So instead, I pulled out the stuff from syscall that I needed and put it directly in upgradable.go. Not pretty, but nicer than hacking the standard library. (This is a nice thing about Go: there’s almost always a way to do what you want, even if you have to hack around just a bit.)

I tested that it works manually (using command-line GETs from another window). I also tested it under load using http_load. It is really cool to be able to set up a load test for 20 seconds and get 3937 fetches/sec, then do the test again, adding in a few “GET http://localhost:8000/upgrade” from another window and still getting 3880 fetches/sec, even as I replaced the running binary a few times during the load test!

by jra at May 29, 2012 12:09 PM

May 25, 2012

embrace change

Fun with interfaces and higher-order functions


Imagine you're developing a runtime environment which can be used by other developers as a container for components. I've done it in my cells package for event-driven applications. Here the user has to implement an interface called Behavior. It defines the four methods

  • Init(env *Environment, id string) error
  • ProcessEvent(e Event, emitter Eventemitter)
  • Recover(r interface{}, e Event)
  • Stop() error

Those behavior implementations can now be deployed to an Environment with a user-defined id for later subscription, unsubscription, or removal. But an id must not be used more than once, and this has to be checked before the behavior instance is created; otherwise the type might already have allocated memory or even opened files or network connections.

How can this be done? I created another type called BehaviorFactory, which is simply a func() Behavior. So if the behavior implementation has a constructor function like

func myBehaviorFactory() Behavior {
    return &myBehavior{}
}

the deployment can be done with

myEnv.AddCell("myId", myBehaviorFactory)

Now AddCell() first checks whether the id is free. Only then is the instance created by calling the factory function, followed by a call to Init() on the newly created behavior.

But sometimes you may need to pass some configuration data to a behavior. How can we achieve that? Higher-order functions help here. Instead of writing a simple factory function, write a function that returns a factory.

func newMyBehaviorFactory(fileName string) BehaviorFactory {
    return func() Behavior { return &myBehavior{fileName} }
}

Now the deployment can be done with

myEnv.AddCell("myId", newMyBehaviorFactory(fileName))

The call to newMyBehaviorFactory() creates the factory function, which is then used inside AddCell().

But there is more fun with interfaces. Behaviors should be able to control some runtime aspects: the length of the event queue (i.e. the channel) and, if the instances are to be pooled, how large the pool should be and whether its instances are stateful. Many behaviors don't care; they simply implement the Behavior interface. But those that are interested implement the AsynchronousBehavior interface with the method

  • QueueLength() int

 and/or the PoolableBehavior interface with the method

  • PoolConfig() (poolSize int, stateful bool).

 Inside AddCell(), a type assertion is now used to check whether the behavior implements those interfaces:

queueLength := 1
if ab, ok := behavior.(AsynchronousBehavior); ok {
    queueLength = ab.QueueLength()
}

This way the behaviors only have to implement those methods if needed, which is pretty simple. Go really is a wonderfully pragmatic and flexible language.

by Frank Müller (noreply@blogger.com) at May 25, 2012 09:11 PM

May 17, 2012

codegrunt.co.uk

What's the Appeal of Programming?

A Quick Aside

Before I get stuck into this post, I want to quickly mention my last. It got to the front page of Hacker News and stayed there for around a day. I submitted it on a whim and did not expect it to get even the slightest bit of attention. Thank you everybody who upvoted it, I am amazed and humbled that people found it (at least somewhat) interesting.

I fully intend to investigate the issue further and to write a follow up. There were some great comments on the Hacker News thread and on the article itself - leads for further research.

I also want to make it clear that the items listed on my upcoming post are coming, however the posts on that list require a fair bit of work and I want to make sure they’re as good as I want them to be. In order to stick to my promised post/week I need to ‘buffer’ a number of meatier posts, meanwhile I am going to throw out ideas I’ve wanted to express for some time.

The subject of today’s post is one of these things I’ve wanted to write about for a while and by coincidence there was a recent HN furore regarding Jeff Atwood’s latest blog post - Please Don’t Learn to Code - which addresses related issues, so I was motivated to write this now. I will try to pull the threads together.

Why Do I Code?

Growing up I spent a lot of time playing around with BASIC on my ZX Spectrum, and more than anything got a massive kick out of seeing this stuff actually do something, actually working on my computer. It was so satisfying.

Ever since, it has been that kick of seeing something actually come together, actually doing something which has driven my interest in computing. It might sound petty, but this thrill has never quite left me, and I really do think it lies at the heart of the pleasure of the thing. To quote edw519:-

The best part is getting something working for the first time where nothing was there before. For me, this is so exciting that I still do a “happy dance” every time.

One utterly delightful aspect of programming is how low the barriers are to building things. If you wanted to design and build a car, say, you’d need lots of money and lots of people and a funnelling process from idea to physical reality to achieve your dreams, with plenty of difficult compromises in between. You’d certainly find it incredibly challenging not to mention expensive to try this on your own, to the extent it would be rather mad to try it at all.

Compare this to programming - all you need is a computer - even an ultra-cheap laptop is sufficient, and that combined with the web and all the free material out there is enough to get you from any level to really rather proficient, limited only by your attitude and hard work. I can think of nothing else quite like it.

Fred Brooks put it far more eloquently than I ever could:-

The programmer, like the poet, works only slightly removed from pure thought-stuff. He builds castles in the air, from air, creating by exertion of the imagination. Yet the program construct, unlike the poet’s words, is real in the sense that it moves and works, producing visible outputs separate from the construct itself. The magic of myth and legend has come true in our time. One types the correct incantation on a keyboard, and a display screen comes to life, showing things that never were nor could be.

Solving abstract problems with few constraints while still actually having an impact on the real world is a large part of the appeal of programming. It’s important not to forget what a miracle this stuff is - we build things which ‘exist’ in what amounts to an imaginary world of electrons travelling around lumps of transmuted sand which somehow do things which people find useful, so useful in fact, that these ghosts in the machine have utterly revolutionised the world and continue to do so at an ever increasing rate.

Programming and Talent

It is my strong conviction that talent and intelligence are the smallest components of the make up of a good programmer. I do believe they make a difference, but are important to a lesser degree than many think.

You need to be able to think logically and be capable of keeping a certain degree of state in your mind, that’s the talent of it, and you need to be bright enough to explore a problem sufficiently to find a solution, which is where the intelligence lies. But these are only pre-requisites. If you have enough of both, then you are going to be able to program (it turns out that a surprising number of people cannot, though to what degree that is talent vs. attitude I don’t know).

Attitude is the really crucial component. The vast size of the subject matter and the vast complexity of the machine in front of you means hubris is the biggest error you can possibly make. You will make mistakes, do stupid things, and misunderstand what the computer is actually doing much of the time. The vital attitude to have in facing this reality is to remain humble. As Dijkstra put it:-

The competent programmer is fully aware of the strictly limited size of his own skull; therefore he approaches the programming task in full humility, and among other things he avoids clever tricks like the plague.

A programmer who lacks this humility can be absolutely nightmarish to deal with. Mistakes get brushed under the carpet, code remains an awful unmaintainable mess because ‘it works’ and that’s sufficient (that whole subject is likely to be an entire other ranty post), and things never improve because the first step in improving something is to admit that there’s something which needs improving. You can’t suck less every year without admitting you suck in the first place.

The other, slightly less important, but still critical component of being a good programmer is hard work. Things are always harder than they seem. Any non-trivial problem you’re dealing with will screw you over repeatedly before you solve it, and not until you’ve felt like giving up 3 times will it finally decide to behave. Oh, and the final 10% of the project will take 90% of the time. You just have to keep on going.

This latter point is in fact a lesson I’ve had to learn the hard way with personal projects. I have started several highly ambitious projects, only to hit problems early and give up (‘hey this is meant to be a hobby, this is no fun!’, etc.) - my chess engine project, weak, may be progressing slowly, but it is progressing and I refuse to give up on it.

Another skill which matters rather a lot is the ability to communicate. If you can’t communicate your great idea or the fatal technical flaw in a proposed plan, then no matter how able you are as a developer your voice just won’t be heard. Also, if you can’t communicate well it’s a lot easier to end up in realms of unpleasantness with others, which gets in the way of everything.

Storm in a Teacup

Back to the furore. I strongly believe that programming should be open for all, and new programmers welcomed with open arms. This thing is amazing and nobody has the right to tell anybody that they can’t choose to learn this craft and get that wonderful kick out of making computers actually do stuff. The fact that there has been a big movement to spread coding knowledge is great, the more the merrier I say.

Having said that, I think Jeff has hit on an inconvenient and slightly unpalatable truth which strongly links back to the two crucial characteristics of a good programmer. What we emphatically don’t need are new programmers who believe programming is incredibly easy yet wonder why their spaghetti monster runs so slowly, or managers who learn to write a very simple rails app and therefore get to treat you with less respect because it’s ‘all so easy’. Most of all, we don’t need people thinking developers are fungible commodities, because that simply ain’t so.

What’s needed is humility, commitment, and an understanding that it takes many years to become truly proficient at this craft.

Another uncomfortable truth is that we professional programmers often feel threatened by the idea that all these young whippersnappers might come in and make us look incredibly stupid and redundant. The nightmare scenario of an inexperienced guy coming in and doing in 1 week what took you 4 months is not a pleasant thought.

The solution is to not attach your identity1 too closely to your code. You are not your job. Personally I’ve found this a very hard lesson and it’s something I still struggle with2, but forming this separation is so clearly the answer that it’s worth the struggle.

Try to be the best programmer you can be. Focus on creating solutions to people’s problems because that’s ultimately what matters, but don’t forget that maintainability matters too and code quality informs a lot of the externally observable quality of your solution.

Have fun. Build stuff you’re proud of. And above all else, be nice to other people who are also trying to improve themselves. I know I fail at that often enough myself, and it can sometimes be incredibly difficult, but ultimately we are all idiots, so the best we can do is to try and help each other be a little less stupid.

Notes

1

I reference this post in order to give credit where it’s due. I want to make it clear that I actually strongly dislike the tone of Zed Shaw’s blog post, not to mention many other occasions where he’s, in my opinion, acted in rather a bullying fashion (e.g. this and of course this). He is a great programmer, and his books are equally brilliant, but his behaviour towards others is not.

As I commented on hacker news, my perception of that kind of bullying nastiness put me off working on anything public for a long while. I take responsibility for my lack of productivity, but that kind of attitude certainly didn’t help anything. I don’t think it helps anybody.

2

I don’t want to be too much of a pop psychologist, but I do wonder whether the stereotypical experience of feeling somewhat inadequate at school while finding a means of gaining a sense of value via programming/working with computers binds self worth to your work early on. Of course, this is by no means the experience of all programmers out there, but I do think it is the experience of a fair few, including myself.

May 17, 2012 11:00 PM

May 12, 2012

Going Along

Walking Through This City


Walking in the silky drizzle, with no thought of hurrying. A sense of ease left me intoxicated: drunk on contentment.

On the way back from the art museum, I stopped again outside that coffee shop. It was about to close, and the owner was still busy. The seat facing the street was where I had carried away my satisfaction.

When paying, I asked softly: “What fruit is actually in the Forty Cafe?” She answered: “It's just the distinctive aroma of this premium espresso,” and laughed brightly, a laugh that chased the weariness off her face. Yes, doing the same thing with care for a long time wears anyone down, but if a customer can be even slightly moved, the whole-hearted effort is repaid.

What kind of feeling was it? The taste went straight to the base of the tongue, so familiar yet impossible to place. Again and again I lifted the cup, letting the aroma help me remember. Swallowing slowly, sip by sip, the thought lingered at the root of my tongue. A faint sweetness, yet rich. Yes, that was it: the taste of the roasted sweet potatoes that made my mouth water as a child.

I had asked for directions to the museum precisely in order to find this coffee shop. I arrived just before the dinner rush, and with no reservation could only take a seat in the row along the window. I chose a green salad to freshen the palate, then a bowl of corn porridge with toasted bread, agreeable, and then the main course, sirloin with foie gras. By then it was completely dark and the shop was full, yet there was no clamor at all. Everyone talked quietly, mixed with the pleasant chime of touching glasses. Suddenly my legs felt numb: I had held one position too long, too absorbed in the tenderness of the sirloin and the smoothness of the foie gras.

“Taichung is a little slow, and there is still some human warmth here,” the taxi driver said on the way back yesterday. He thought I had come from Taipei and chatted with me unhurriedly the whole way: how as a child he could still catch a bus, but now nobody knows when the MRT will be finished and the buses grow ever fewer; how every cab waits in the lot for phone calls from nearby, while the streets are full of little scooters. So flagging down an empty cab at a traffic light coming out of Fengjia Night Market the other night was a stroke of luck.

Luckier still that I could fasten my seat belt. Today's taxi braked hard without warning. My laptop bag was flung under the seat, and I jerked my head up from the book I was buried in and shouted, what happened. The driver apologized over and over, got out, picked a metal cage up off the road, and helped the stalled driver ahead load it back on, without a hint of blame. This is a forgiving city: in the hurried traffic on narrow streets, everyone makes allowances for everyone else. So I could hardly stay angry either.

But not always. The close call on the road the day before yesterday had me going Oh My God. A colleague was driving me when a little red van shot out from a corner, swerved past a scooter about to cross the road, grazed by us, and sped away. Oh My God. It very nearly hit that lucky rider, and I very nearly witnessed a tragedy with my own eyes.

On today's TV news, a tour bus in Hualien, climbing a mountain road, botched a gear change, lost power, and slid back over the edge with 13 South Korean tourists aboard, tumbling toward a ravine 28 meters deep. By great luck, trees caught it at 8 meters, and a warden happened to pass by and called in the rescue in time, so every life was saved. The warden's dash camera recorded the whole thing, letting us witness others' misfortune, their great luck, and count our own blessings.

And then another misfortune: a fierce fire in the Hsuehshan Tunnel reduced a family to ash and smoke, the children identifiable only by DNA. Why have accidents come one after another in Taiwan these two days? In truth the world repeats its stories of luck and misfortune every moment; it is just that nobody forces us to be the onlookers. Only after coming to Taiwan did I find the media saturated with this negativity, or else with noisy froth. No. That is surely not the whole of it. At the very least, what I have seen and felt first-hand is a kind and easy-going society.

On my second day in Taiwan I went to Sun Moon Lake. The lake is not large or stunning, just quiet green water that looks neither like a round sun nor a crescent moon. Like the local visitors I rented a bicycle, and in the gentle breeze and warm sun soon left the big tour groups behind. Riding along the shore was easy and free. Now and then I stopped to look around and take a picture; now and then I pedaled hard to feel the push of the wind. Before I knew it the 3 km bike path had ended, and rather than turn back I pushed the bike up the hill without a second thought, set on the full 33 km loop around the lake. At the visitor center earlier they had told me it takes 4 hours, and I had not noticed that I was still in my work jeans and T-shirt. Only at the top, drenched in sweat and gasping, did I regret not changing into shorts; all I could do was roll up my trouser legs and keep going. Happily, the joy of flying downhill made me forget the effort. I coasted like that to the 12 km mark, where another long uphill stretch began. Even the 37-speed Giant mountain bike could not manage it, and I could only push it slowly. Cheerful riders flashed past in the other direction, and an old lady shouted encouragement at me. The two kilometers around the 10 km mark felt the hardest. Then, mercifully, it was downhill all the way to the Xuanzang Temple ferry pier. Of course I never even considered taking the boat across; with 10 km left, I remembered the old lady's encouragement and set off again. That climb was hard work, dodging cars big and small while watching for rocks falling from the cliffs. Fortunately I had brought plenty of water and drank steadily, and the mixed dried fruit left over from the flight got me through the lunchtime hunger, up and down all the way to the Ita Thao pier. I ate skewers of mountain pork and pork skin, bought an owl souvenir, and looked at my watch: three hours of riding already. The last 6 km of mountain road I simply gritted out to the finish. The woman at the bike shop said I was impressive; I could only smile wryly, say it had been fun, and wait for the evening when, completely spent, I would be praised and teased by my wife over Skype until I went nuts.

-- fango

by Fango (noreply@blogger.com) at May 12, 2012 11:53 AM

Wasting Half a Day in Taipei

Got up at 6:40 and ate the hotel breakfast I had been sick of for seven days. Thinking I would eat better at lunch, I checked out almost on an empty stomach, took a taxi (NT$240) to the high-speed rail station, and bought an 8 o'clock ticket to Taipei (NT$700) from a machine. I spent the ride translating “The Hitchhiker's Guide to the Galaxy,” and in a blink we arrived, around 9 I think. Out of the station I went straight to the map to work out which MRT exit to take, then discovered that the 669 bus also goes to Taipei 101 and lets you watch the streets on the way, so I spent a long time finding Exit Y8 and an even longer time waiting, only to learn the bus runs every 40 minutes. I stood until my legs went numb, found a stone bench and sat down, and that was when the bus came. It was 10 by then, but the fare was wonderfully cheap, NT$15, and I swayed along watching the streets until 101 came into view and hurried off. I arrived just as the new iPad went on sale, with a dozen models striking poses, so I joined the press in a frenzy of photos. Strolling into 101, I headed straight for the observation-deck ticket counter on the 5th floor (NT$450), but one look back at the long, noisy line of sightseers killed my enthusiasm on the spot. I noticed lockers nearby, set-your-own-code, free, and unused, so I dumped my heavy laptop bag in one and went window shopping unburdened: LV, Chanel and the rest, the tai-tais' favorites. Nothing much to see, so I went downstairs and ate Din Tai Fung xiaolongbao, plus the most expensive truffle buns, NT$450 for five. Well fed, I took the free shuttle bus to the MRT. Meaning to grab a book at the Eslite by the high-speed rail station, I bought a NT$35 ticket from the machine to Banqiao. As the train passed Taipei Main Station I felt a flicker of unease, but I was busy typing these very words, so I rode on, came out of the MRT, and headed straight for the high-speed rail ticket machines. My watch said 12:30; I bought a 1:44 ticket (NT$130) and only then suddenly realized I had not gone to Taipei station but to the next stop, in New Taipei. So be it; take things as they come. I began wandering around in quiet despair, hoping to find a bookstore here too. There was nothing at all. My shoulders were growing heavy, so I walked to the long bench by the ticket gate, sat down, and finished writing this stretch on my phone. Writing until 1:15 to kill time, let me mention Han Han's blog post printed in today's United Daily News: he says he likes the Taiwanese and thanks Hong Kong and Taiwan for preserving Chinese culture, stopping just short of saying more. Perhaps I could put it better, but I will not imitate Lu Xun, hurling daggers, being lionized, and suffering until he smoked himself to death. At 1:30 I got up to wait for the train, and ten minutes later I was in Taoyuan. While I was at it I checked in at the airline counter; if you have luggage, checking it there beats doing it at the airport. I took the NT$30 bus to the airport, still sitting and setting down these words. Arrived at 2:25 with boarding not until 3:35, so I settled in at a restaurant entrance to enjoy their Wi-Fi. Next time I come to Taipei I must stay a night: this is a city that grows lovelier the later it gets, where everyone seems to be out strolling, buying snacks, wandering, and passing the time. This time I was only passing through, scribbling roughly on my phone, also to pass the time. Rambling nonsense, and with that, EOF.

by Fango (noreply@blogger.com) at May 12, 2012 09:13 AM

May 07, 2012

codegrunt.co.uk

The Brittleness of Type Hierarchies

In this post I explore what I perceive to be a fundamental issue with object orientation via a type hierarchy.

Apologetic Preamble

I want to make it clear that this post is emphatically not an attack on C# (a language I admire1), nor is C# the only language to implement type hierarchies (Java is a better-known language which takes this approach). Rather, it is a criticism of a general trait of the kind of object orientation implemented2 by languages like C# - and C#, being the language I am most familiar with, is the one I use in this post to explore my point.

The Problem

When you write a program using a C#-like language you typically start by modelling your problem as a hierarchical set of classes which represent your domain. You then proceed to ‘flesh out’ and expand upon these types with code which actually implements your program.

After a while you inevitably find an exception to the rules of the system you have created - your types turn out not to match the problem in some way. Typically this occurs late, after a lot of work has already been done which is now strongly tied to the structure of your type hierarchy, so you are left with a problem - do you hack around the incongruity, usually the quickest solution, or restructure your types to adapt to the new requirement?

If you choose the former, your hierarchy not only fails to represent the problem anymore, it actively misleads you about it. If you choose the latter, you end up spending a long time yak shaving - working on something which has absolutely nothing to do with the problem itself and is purely a product of the system in which you are writing your program - the very definition of accidental complexity.

A Concrete Example

To explore this problem more concretely, let’s take a look at an actual example.

Note that, whatever example I choose, it’s inevitable that it will not fully capture this problem the way architecting a big project under deadline pressure will. However, exploring actual code is of such value that I think it worth doing regardless.

That said, let’s consider a type hierarchy which represents trades made on financial exchanges:-

class Exchange {
    public string Bic { get; set; }
    public string Name { get; set; }
}

abstract class Security {
    public string Description { get; set; }
    public Exchange Exchange { get; set; }
    public string Isin { get; set; }
}

class Stock : Security {
}

class Bond : Security {
    public DateTime Expiry { get; set; }
}

class Trade {
    public decimal Price { get; set; }
    public decimal Quantity { get; set; }
    public Security Security { get; set; }
}

So far, so good - this seems like a sensible approach and so we go ahead and write a lot of code around this hierarchy.

The code gradually becomes deeply coupled to our type design - we expect all financial securities to possess an ISIN (an international identification number), a description and an exchange which itself possesses a name and a BIC (bank identification code); we expect stocks to possess no extra properties, bonds to possess expiry dates, and trades to possess a security, price and quantity. The client is happy, the system works well and our code is nice and clean.

Suddenly the client demands a change - they are about to trade options and need to bring them into the system. Fine, you think, and you come up with a class:-

class Option : Security {
    public bool Call { get; set; }
    public decimal LotSize { get; set; }
    public DateTime Maturity { get; set; }
    public decimal StrikePrice { get; set; }
}

But a problem arises - options are typically not assigned ISINs, and now we have a property which is meaningless for options. We are faced with the dilemma - a lot of the code is now reliant on Isin, and NullReferenceExceptions are getting thrown all over the place because the field isn’t getting populated. Do we remove the Isin field from Security and create a separate classification between physical and derivative securities (or perhaps ISIN-possessing instruments and non-ISIN-possessing instruments) - one containing Isin and one not? Or do we find some other, less salubrious way around the problem?

The client is getting angry now - there’s a hold-up, developers are arguing about what to do, and the trading system needs to be in place by the next day or some P45s are going to be handed out. So it’s decided - quick-fix now, real fix later:-

// Lots of:-
if(security.Isin == null) {
    ...
}

// Not to mention:-
if(security is Option) {
    ...
} else {
    ...
}

// And maybe even some:-
switch(security.GetType().Name) {
    case "Option":
        ...
    case "Stock":
        ...
    ...
}

Our code is suddenly not looking so great. Our type hierarchy now falsely implies that all securities have ISINs - a particular trap for developers new to the project. We have hacks all over the place, and possibly not enough of them (we’re fairly sure we got everything, but there are always those nooks and crannies which slip through the test suite). We are forced to add new hacks as we go, and confidence in our code is definitely eroded.

Suddenly a new requirement arises - it turns out securities need to refer to the exchange on which they were traded, which might differ for two otherwise identical securities, but our architecture has lumped the exchange in with the security. We are back to our dilemma again…

An Imperfect Example

Again, I emphasise that the above example is necessarily imperfect, as to practically fit an example of this problem into a blog post necessitates quite some trivialising. That being said, hopefully it makes the point a little more concrete.

An objection here might be that these change requests merely exposed design mistakes made early on in the project, whose impact inevitably got magnified down the line as code was written on the assumption that these mistakes were in fact valid. To me, the problem with that argument is the same as the problem with the waterfall method for software development - it assumes you know a great deal up front, which is very rarely the case.

The reality of software development is almost the opposite of that assumed by approaches which require large amounts of upfront structural foundations to be laid. Requirements and facts around your code tend to change constantly and often in utterly unexpected ways, so having a structure up front which lays down brittle laws about your system is setting yourself up to fail.

The brittleness of the laws is the real issue - if their failing didn’t result in hacks or yaks then there wouldn’t be as much of a problem.

Solutions

I hate to state a problem without suggesting solutions - I am, after all, a programmer, and our job is to solve problems, not just create them.

I am certainly not suggesting that object orientation is somehow fundamentally flawed, nor am I advocating writing code procedurally when it could benefit from the encapsulation, abstraction and de-duplication of an alternative approach, purely out of FUD about potential future issues. Rather, I think that if you are going to write in a language which supports this approach, it is important to consider this issue before committing to representing your problem this way.

Some ideas:-

  • There are methods and patterns for separation of concerns and decoupling of related code, including techniques like dependency injection, which may help mitigate this problem to some degree while still using a C#-like language.

  • Functional programming is enjoying a great upswing in interest and popularity these days. I wonder whether the stronger type systems of these languages and the greater guarantees about what a function does or does not do (i.e. less likelihood of side-effects, immutable state) might provide a means to adapt to a change in the model more quickly, at compile time rather than run time.

  • Bottom-up programming in a language like Lisp offers an enormous amount of flexibility in meta-programming and might make downstream changes easier to introduce later in the development process.

  • Dynamic programming languages like Python, Ruby and Lisp provide a far less rigid type structure - types don’t necessarily need to be known at compile time, meaning you have a great deal of flexibility to write functions which are less tied to a previously-determined structure than in a statically-typed language.

  • Go provides an intriguingly simplified version of static typing in which no type hierarchy can be defined at all; instead, you express relations between types through implicit interfaces, which define types by what they do rather than what they are. They are implicit in that, as soon as you’ve specified what you want - i.e. which methods a type must possess - any type possessing those methods implements the interface.

  • Smalltalk implements object orientation in quite a different fashion from C#-like languages, with emphasis on passing messages between objects rather than the structure of the classes themselves. This could help reduce the dependency on initial structural decisions.

  • (ECMA|Java)script allows you to define relations between objects based on object prototypes rather than static classes; you simply can’t declare static user-defined types at all.

I don’t want to advocate a particular solution above any other, as I am still undecided and exploring these ideas myself, not to mention the fact that a lot of this stuff sits decidedly in the grey area of rather subjective choices, where it’s hard to definitively prove one approach is better than another.

I do however think that the brittleness of the upfront type hierarchy is a real issue which needs to be carefully considered before writing a large program and weighed up against other engineering approaches.

Notes

1

I have spent several years working with C#. Overall it’s a fantastic language which has grown ever more fantastic over time. Starting as, frankly, Microsoft Java, it has grown to incorporate generics, better support for functional style programming through a neat lambda syntax, libraries offering standard functional-style methods such as map, filter and fold (select, where and aggregate respectively in C# parlance), a declarative syntax for querying generic collections via LINQ as well as numerous other little features which make it a very nice language to work with. It’s well-engineered and surprisingly ambitious.

2

I say this, rather than just saying ‘I have a problem with object orientation’, as there are different interpretations as to how object orientation should be implemented, and a C#-ish implementation is just one approach - cf. Smalltalk.

May 07, 2012 11:00 PM

May 02, 2012

golang.jp

Translation Update

The translation of the Go Programming Language Specification has been updated to match the latest version.
There are many small changes, but the main changes since the previous translation are:

  • A delete built-in function for removing map elements has been added.
  • The rune type (similar to the char type in other languages) has been added.
  • Goroutines can now be started inside init functions.
  • Errors returned by standard functions now use the newly added error built-in type.

※2012/5/9: Fixed some typos. Thank you for the reports.

by noboru at May 02, 2012 05:44 PM

April 25, 2012

codegrunt.co.uk

Upcoming

Okay, so I haven’t updated this blog in quite some time.

I don’t want to become one of those people whose blog consists of very few blog posts, mostly apologising for not blogging more, so I’ll keep this short and sweet and simply announce some posts I’m working on:-

  • Why Go? - A post discussing the merits of Go and what makes it different from other programming languages.

  • Series: Algorithms - A series of posts on algorithms of interest, with a focus on exploring how you go about actually implementing them in code.

  • Series: Go Standard Library - A series of highly indulgent posts exploring Go’s standard libraries and how they are actually implemented in practice, in the vein of Jeff Atwood’s and Scott Hanselman’s recent posts on reading other’s source code.

  • Series: Go Internals - A series of yet more indulgent posts exploring Go’s internal implementation details.

  • Series: Weak and Chess Engines - A series of posts on the implementation of Weak and chess engines in general.

  • General Programming/Personal Topics of Interest - Basically whatever I happen to be thinking about at a given point. I think about programming a lot, and I think it’s time to put it out there, whether it’s coherent or of interest to other geeks out there, or not :-)

I am committing to getting at least one blog post out a week. I intend to fill my blog with exactly the kind of stuff I love reading in other people’s blogs - lots of juicy, no messing about, unix-bearded, hardcore tech detail and geekery with pride.

Sorry Brogrammers, I’ve got nothing for you.

April 25, 2012 11:00 PM

April 24, 2012

golang.jp

Translation Update

It has been a while since the last translation update.
The translation of “Getting Started” (installing Go) has been updated.

Until now you had to compile Go yourself from source, but starting with Version 1 it is distributed in binary form. Windows is also now officially supported.

by noboru at April 24, 2012 08:53 AM

April 22, 2012

Kyle Lemons

Rx: My prescription for your Go dependency headaches

XKCD: Online Package Tracking

Oddly appropriate, don't you think?


There has been a lot of discussion on the Go Nuts mailing list about how to manage versioning in the nascent Go package ecosystem. We muttered about it at first, and then muttered about it some more when goinstall came about, and there has been a pretty significant uptick in discussion since the go tool began to take shape and the Go 1 release date approached. In talking with other Gophers at the GoSF meet-up recently, there doesn’t seem to be anyone who really has a good solution.

TL;DR: kylelemons.net/go/rx

The Problem

Before I get too far, let me first summarize the problem.

When developing a Go application, you will most likely find yourself depending on another package, often written by another author. The ease of utilizing such third-party packages with the go tool makes this an even likelier scenario, and it is, in fact, encouraged. Inevitably, however, the author of some package on which you depend will make a change to his package; this could be anything from an innocuous bug fix to a large-scale API reorganization, and you are suddenly left with two choices: stick with the version you have (often by cloning it locally) or bite the bullet and update. This is complicated by the fact that you may both directly and indirectly depend on the same package, which means that both your project and your intermediate dependency need to agree on which of the above choices to take, and in a relatively timely manner.

There have been many proposals and complaints, both on- and offline, with respect to this problem. It’s not a problem that’s unique to Go, either; tools like Apache’s Maven, Ruby’s Bundler, etc all attempt to solve this problem to a greater or lesser degree. It is such a prevalent theme in development that a term, DLL Hell (and the more technically correct term dependency hell), has come into common use to describe it.

Strategies

The most obvious thing to do is to be paranoid about package maintainers, and thus copy your dependencies into your project. If this strategy is sufficient, I highly recommend checking out goven, which will streamline this process (it even rewrites the imports!) for you. I take a different tack because I am lazy and don’t want to have to maintain other people’s code. I also don’t think this strategy simplifies the process of pulling in new changes from upstream, because you still have to update them one at a time until/unless something breaks.

The next obvious thing is to specify somewhere what version you want to check out, in the source code, so that go get knows about it and can do the right thing. This essentially boils down to something like import "path/package/version" (though various proposals suggest using @rev or similar). This is certainly a solution, and I suspect we will see tools emerge that will download source and update it to the proper revisions as a go get alternative. I didn’t choose this solution because this requires rewriting import paths when you update code and it makes it difficult to ensure that there is only a single version of a library built into the same binary, which can cause problems (if there are more, the init() calls will run twice, for one thing). It also doesn’t help with pulling in changes: you still are taking a chance that you’ll break something (sometimes without realizing it) whenever you pull from upstream.

Another reasonable strategy is to version-control the entire (or at least the dependencies within) GOPATH(s). This has the advantage that multiple developers always check out the correct versions, and branches and merges work nicely. A very simple tool along these lines is being developed as gogo, which allows you to version control your dependencies and share them between developers. As long as your version control system doesn’t mind having other version control systems’ (or its own) metadata stored inside it, this will work. The downside of this is that you are storing a lot of redundant data in your vcs, and it still doesn’t address the issue of how to figure out when and if you can update what packages.

Enter `rx`

So, since my ancient pre-goinstall build tool has been obsoleted, I figured I’d try my hand at distilling a reasonable, achievable set of goals out of the sea of requirements and suggestions and turn them into a tool for people to use. If you didn’t guess this from the previous section, the biggest problem that I think I can solve is helping you figure out what dependencies you can update without breaking your world. This can probably work in addition to at least a few of the strategies listed above for a more complete versioning solution, depending on your particular needs. Here are my informal design ideas/goals/requirements/notes:

  1. It shouldn’t try to “solve” dependency hell. Making people’s lives easier is enough for now.
  2. It should leverage the existing go tool and GOPATH conventions as much as possible.
  3. It should be easy to see the versions of packages, and to change the active one.
  4. It should be intelligent about updating and notice when an update breaks something else.
  5. It should be able to save a “known good” set of versions for easy rollback and sharing.
  6. It should be fun to use, and should not get in the way of the developer.

In that vein, I have started work on rx, my prescription for your Go dependency version headaches. It’s starting to approach a few of the requirements above already. To whet your appetite, here are a few examples of what it can do:

  • rx list will show you inter-repository dependencies
  • rx tags will show you what tags are available in a repository
  • rx prescribe will update a repository and test its transitive dependents

Each command also has plenty of fun options to play with; rx tags has, for instance, options to only show tags that are up- or downgrades. The structure of the program is strongly reminiscent of the design of the go tool (and, in fact, uses it for a lot of backend logic), and so should be familiar for most Gophers and fit nicely into your existing workflows.

Installation is, of course, rather simple:
go get -u kylelemons.net/go/rx

Here’s a brief example of using rx:

$ rx --rescan list | grep rpc
/<gopath>/src/github.com/kylelemons/go-rpcgen: codec webrpc main main echoservice main main offload wire webrpc
$ rx tags go-rpcgen | egrep v\|HEAD
193746c88dfebdc5462382b93c1038a29496d9af v2.0.0
a6938fa6ec0fb6a63fefab2c462d3cd1102cc477 v1.2.0
bf28cdf3e683dd0919800f6916141c17aa93c36d HEAD
bf28cdf3e683dd0919800f6916141c17aa93c36d v1.1.0
f73c5c8ea85bdfbdc69e6aa24dd90b43c7265c67 v1.0.0
$ rx pre go-rpcgen v2.0.0
ok      github.com/kylelemons/go-rpcgen/codec   0.051s
ok      github.com/kylelemons/go-rpcgen/examples/echo   0.139s
ok      github.com/kylelemons/go-rpcgen/examples/remote 0.019s
ok      github.com/kylelemons/blightbot/bot     0.029s
ok      github.com/kylelemons/go-paxos/paxos    0.053s
$ rx tags go-rpcgen | egrep v\|HEAD
193746c88dfebdc5462382b93c1038a29496d9af HEAD
193746c88dfebdc5462382b93c1038a29496d9af v2.0.0
a6938fa6ec0fb6a63fefab2c462d3cd1102cc477 v1.2.0
bf28cdf3e683dd0919800f6916141c17aa93c36d v1.1.0
f73c5c8ea85bdfbdc69e6aa24dd90b43c7265c67 v1.0.0

There’s not a whole lot here, but you can see that the list command (in its short form) found the repository and listed the (short) names of the packages that exist under it. The --rescan option told it to actually scan my repositories, instead of using the cached dependency graph. The tags command then showed me the interesting tags in the repository (it’s git, so HEAD also shows where it was currently), and then the prescribe command updated it to the latest tag. Notice that the repository’s tests were run, as well as tests for packages that depended on packages in that repository (transitively). They were also built and installed (except binaries, by default), though this isn’t displayed unless you use the -v option.

Expected Use Cases

To help elucidate the problem I’m trying to solve, here are a few use cases that I’d like to support.

Hobbyist Developer

As a single developer, you’ve probably got a single GOPATH into which all of your dependencies are installed alongside your own projects. You freely import between them, and everything generally works. You don’t run go get very often to pull down remote packages, unless you find a bug that has been fixed or you find a new feature in a newer library.

  • The rx fetch command will let you fetch the latest changesets without actually applying them.
  • The rx tags --up command will show you what tags you can upgrade to.
  • The rx prescribe command will allow you to update to a new tag.
  • The rx prescribe command automatically builds and tests dependents transitively.
  • The rx prescribe command will roll back the update if it turns out to have broken something.

Small Team

As a small team working on a Go project, your concerns are much different from that of a single developer. You want your team members to easily stay in sync with one another, and you will only rarely pull changes in from upstream once you have your project working with a particular dependency.

  • The --rxdir flag and RX_DIR environment variable let you version or share an rx configuration.
  • The rx cabinet --save command saves the versions of all repositories.
  • The rx cabinet --load command reverts/upgrades repositories to their saved state.
  • The rx cabinet --export command saves a relocatable cabinet that can be shared.
  • The rx pin command lets you configure what repositories are considered for upgrade.
  • The rx auto command will try to upgrade packages automatically, keeping the upgrades that prove seamless.

The common theme among these commands is maintaining a cohesive group of dependency versions. When you update a dependency (which we’ve seen that rx prescribe can do automatically), you can save that as a “known good” configuration that you can share, save, and (if things go south) restore later. For packages that are known to misbehave or for the package you’re editing, the rx pin command allows you to specify manually what behavior they should have (never upgrade, always tip, never change, etc). To help with exploring what updates might apply seamlessly, the rx auto command will do the heavy lifting of figuring out which repositories depend on each other and will successively try updates.

Large Project

On a large project, you care about most of the same things as a small team, but there is also a good chance that you are working on multiple versions of your software simultaneously. There is also a good chance that any given developer may have multiple projects on his workstation which are independently versioned.

  • The rx cabinet --exclude command (and friends) configure exactly what cabinets track.
  • The rx cabinet --diff command shows differences in dependencies between cabinets.
  • The advanced rx prescribe options can manage package upgrades that auto can’t handle.

The theme here is that the same commands that worked in a small and medium environment continue to work, but that their concepts can be extended (and modified slightly) to accommodate the needs of a larger development team. The larger the team, the greater the chance that there will be multiple branches in play, and rx will need to understand this.

The Catch

There are still problems with this approach. As long as you start with a working project, you should generally be able to keep it working. You may not be able to ever update a package if one of its dependents never comes into line, though, which leads me to the biggest problem with this approach: it doesn’t make it easy to simply install a remote repository that has external dependencies. It’s intended primarily to support development and releasing of e.g. a binary, where your local development environment doesn’t matter to the end user. I’d like for there to be a nice way to import a package’s cabinet file when you’re importing it (so that your version of rx learns about what versions do and don’t work with various dependency versions), but I haven’t fully mapped this out.

Another problem which remains currently unsolved is the requirement to manually update when a dependency’s API changes. It would be nice to have some way for the author of a package to provide a means for dependent packages to fix themselves automatically - a tool like gofix. If this convention were widespread enough, it could vastly simplify the process of updating packages. This is something else about which I am thinking, and I hope that in the future there are good libraries for easily making gofix-like tools, as well as a convention for including them in your projects.

Coming Soon

There is a lot of work to do, but I think it’s at the point where the best feedback is feedback from real users who have a real need for a tool like this. The next priorities on my list are:

  1. Save and restore global repository state
  2. Intelligently run “upgrade” experiments to find what new tags can be seamlessly integrated
  3. Support branches and branch switching
  4. Clean up and document more of the code

Your feedback, constructive criticism, and pull requests are all greatly appreciated!

P.S. I’m slowly cleaning up my many side-projects and making sure they work with Go 1. I’ll be listing them on kylelemons.net/go as I do, so feel free to e-mail me or find me on IRC if you have a favorite package that you want updated.

by Kyle Lemons at April 22, 2012 03:23 AM

April 21, 2012

embrace change

Nice technical task

A few days ago I was mentioned on Google+ by Tyler Tallman, who is using my Tideland Common Go Library, and in particular its Cells package. It is a framework for event-driven applications built from networked cells, each running a user-implemented behavior. He is using it for the rule-based evaluation of streams of medical data - a good use case indeed. And he has already found something to optimize: I use maps for the subscription of cells to other cells. That has been fine for my needs so far, but not for Tyler’s performance requirements, so he optimized it by using a tree instead.

As a result I will later allow different implementations here: a map as today, but also a B-tree. I have an old implementation I wrote several years ago in Java, and I am now porting it to Go - though not as an exact copy. First steps have already shown that the Go code is far more compact. I also want to exploit concurrency by implementing the nodes as goroutines (each together with its part of the data structure). The idea is to let complex work like balancing happen in the background, as well as to make optimal use of resources, e.g. when iterating over the items.

by Frank Müller (noreply@blogger.com) at April 21, 2012 02:19 PM

April 19, 2012

Go's official blog

Error handling and Go

If you have written any Go code you have probably encountered the built-in error type. Go code uses error values to indicate an abnormal state. For example, the os.Open function returns a non-nil error value when it fails to open a file.

func Open(name string) (file *File, err error)

The following code uses os.Open to open a file. If an error occurs it calls log.Fatal to print the error message and stop.

f, err := os.Open("filename.ext")
if err != nil {
    log.Fatal(err)
}
// do something with the open *File f

You can get a lot done in Go knowing just this about the error type, but in this article we'll take a closer look at error and discuss some good practices for error handling in Go.

The error type

The error type is an interface type. An error variable represents any value that can describe itself as a string. Here is the interface's declaration:

type error interface {
    Error() string
}

The error type, as with all built-in types, is predeclared in the universe block.

The most commonly-used error implementation is the errors package's unexported errorString type.

// errorString is a trivial implementation of error.
type errorString struct {
    s string
}

func (e *errorString) Error() string {
    return e.s
}

You can construct one of these values with the errors.New function. It takes a string that it converts to an errors.errorString and returns as an error value.

// New returns an error that formats as the given text.
func New(text string) error {
    return &errorString{text}
}

Here's how you might use errors.New:

func Sqrt(f float64) (float64, error) {
    if f < 0 {
        return 0, errors.New("math: square root of negative number")
    }
    // implementation
}

A caller passing a negative argument to Sqrt receives a non-nil error value (whose concrete representation is an errors.errorString value). The caller can access the error string ("math: square root of...") by calling the error's Error method, or by just printing it:

f, err := Sqrt(-1)
if err != nil {
    fmt.Println(err)
}

The fmt package formats an error value by calling its Error() string method.

It is the error implementation's responsibility to summarize the context. The error returned by os.Open formats as "open /etc/passwd: permission denied," not just "permission denied." The error returned by our Sqrt is missing information about the invalid argument.

To add that information, a useful function is the fmt package's Errorf. It formats a string according to Printf's rules and returns it as an error created by errors.New.

if f < 0 {
    return 0, fmt.Errorf("math: square root of negative number %g", f)
}

In many cases fmt.Errorf is good enough, but since error is an interface, you can use arbitrary data structures as error values, to allow callers to inspect the details of the error.

For instance, our hypothetical callers might want to recover the invalid argument passed to Sqrt. We can enable that by defining a new error implementation instead of using errors.errorString:

type NegativeSqrtError float64

func (f NegativeSqrtError) Error() string {
    return fmt.Sprintf("math: square root of negative number %g", float64(f))
}

A sophisticated caller can then use a type assertion to check for a NegativeSqrtError and handle it specially, while callers that just pass the error to fmt.Println or log.Fatal will see no change in behavior.

As another example, the json package specifies a SyntaxError type that the json.Decode function returns when it encounters a syntax error parsing a JSON blob.

type SyntaxError struct {
    msg    string // description of error
    Offset int64  // error occurred after reading Offset bytes
}

func (e *SyntaxError) Error() string { return e.msg }

The Offset field isn't even shown in the default formatting of the error, but callers can use it to add file and line information to their error messages:

if err := dec.Decode(&val); err != nil {
    if serr, ok := err.(*json.SyntaxError); ok {
        line, col := findLine(f, serr.Offset)
        return fmt.Errorf("%s:%d:%d: %v", f.Name(), line, col, err)
    }
    return err
}

(This is a slightly simplified version of some actual code from the Camlistore project.)

The error interface requires only an Error method; specific error implementations might have additional methods. For instance, the net package returns errors of type error, following the usual convention, but some of the error implementations have additional methods defined by the net.Error interface:

package net

type Error interface {
    error
    Timeout() bool   // Is the error a timeout?
    Temporary() bool // Is the error temporary?
}

Client code can test for a net.Error with a type assertion and then distinguish transient network errors from permanent ones. For instance, a web crawler might sleep and retry when it encounters a temporary error and give up otherwise.

        if nerr, ok := err.(net.Error); ok && nerr.Temporary() {
            time.Sleep(1e9)
            continue
        }
        if err != nil {
            log.Fatal(err)
        }

Simplifying repetitive error handling

In Go, error handling is important. The language's design and conventions encourage you to explicitly check for errors where they occur (as distinct from the convention in other languages of throwing exceptions and sometimes catching them). In some cases this makes Go code verbose, but fortunately there are some techniques you can use to minimize repetitive error handling.

Consider an App Engine application with an HTTP handler that retrieves a record from the datastore and formats it with a template.

func init() {
    http.HandleFunc("/view", viewRecord)
}

func viewRecord(w http.ResponseWriter, r *http.Request) {
    c := appengine.NewContext(r)
    key := datastore.NewKey(c, "Record", r.FormValue("id"), 0, nil)
    record := new(Record)
    if err := datastore.Get(c, key, record); err != nil {
        http.Error(w, err.Error(), 500)
        return
    }
    if err := viewTemplate.Execute(w, record); err != nil {
        http.Error(w, err.Error(), 500)
    }
}

This function handles errors returned by the datastore.Get function and viewTemplate's Execute method. In both cases, it presents a simple error message to the user with the HTTP status code 500 ("Internal Server Error"). This looks like a manageable amount of code, but add some more HTTP handlers and you quickly end up with many copies of identical error handling code.

To reduce the repetition we can define our own HTTP appHandler type that includes an error return value:

type appHandler func(http.ResponseWriter, *http.Request) error

Then we can change our viewRecord function to return errors:

func viewRecord(w http.ResponseWriter, r *http.Request) error {
    c := appengine.NewContext(r)
    key := datastore.NewKey(c, "Record", r.FormValue("id"), 0, nil)
    record := new(Record)
    if err := datastore.Get(c, key, record); err != nil {
        return err
    }
    return viewTemplate.Execute(w, record)
}

This is simpler than the original version, but the http package doesn't understand functions that return error. To fix this we can implement the http.Handler interface's ServeHTTP method on appHandler:

func (fn appHandler) ServeHTTP(w http.ResponseWriter, r *http.Request) {
    if err := fn(w, r); err != nil {
        http.Error(w, err.Error(), 500)
    }
}

The ServeHTTP method calls the appHandler function and displays the returned error (if any) to the user. Notice that the method's receiver, fn, is a function. (Go can do that!) The method invokes the function by calling the receiver in the expression fn(w, r).

Now when registering viewRecord with the http package we use the Handle function (instead of HandleFunc) as appHandler is an http.Handler (not an http.HandlerFunc).

func init() {
    http.Handle("/view", appHandler(viewRecord))
}

With this basic error handling infrastructure in place, we can make it more user friendly. Rather than just displaying the error string, it would be better to give the user a simple error message with an appropriate HTTP status code, while logging the full error to the App Engine developer console for debugging purposes.

To do this we create an appError struct containing an error and some other fields:

type appError struct {
    Error   error
    Message string
    Code    int
}

Next we modify the appHandler type to return *appError values:

type appHandler func(http.ResponseWriter, *http.Request) *appError

(It's usually a mistake to pass back the concrete type of an error rather than error, for reasons discussed in the Go FAQ, but it's the right thing to do here because ServeHTTP is the only place that sees the value and uses its contents.)

And make appHandler's ServeHTTP method display the appError's Message to the user with the correct HTTP status Code and log the full Error to the developer console:

func (fn appHandler) ServeHTTP(w http.ResponseWriter, r *http.Request) {
    if e := fn(w, r); e != nil { // e is *appError, not os.Error.
        c := appengine.NewContext(r)
        c.Errorf("%v", e.Error)
        http.Error(w, e.Message, e.Code)
    }
}

Finally, we update viewRecord to the new function signature and have it return more context when it encounters an error:

func viewRecord(w http.ResponseWriter, r *http.Request) *appError {
    c := appengine.NewContext(r)
    key := datastore.NewKey(c, "Record", r.FormValue("id"), 0, nil)
    record := new(Record)
    if err := datastore.Get(c, key, record); err != nil {
        return &appError{err, "Record not found", 404}
    }
    if err := viewTemplate.Execute(w, record); err != nil {
        return &appError{err, "Can't display record", 500}
    }
    return nil
}

This version of viewRecord is the same length as the original, but now each of those lines has specific meaning and we are providing a friendlier user experience.

It doesn't end there; we can further improve the error handling in our application. Some ideas:

  • give the error handler a pretty HTML template,
  • make debugging easier by writing the stack trace to the HTTP response when the user is an administrator,
  • write a constructor function for appError that stores the stack trace for easier debugging,
  • recover from panics inside the appHandler, logging the error to the console as "Critical," while telling the user "a serious error has occurred." This is a nice touch to avoid exposing the user to inscrutable error messages caused by programming errors. See the Defer, Panic, and Recover article for more details.

Conclusion

Proper error handling is an essential requirement of good software. By employing the techniques described in this post you should be able to write more reliable and succinct Go code.

by Andrew Gerrand (noreply@blogger.com) at April 19, 2012 01:50 AM

April 14, 2012

RSC

_rsc: RT @iamdanw: This is proper New Aesthetic http://t.co/DkTdf0od disassemble the algorithms and make things with them


April 14, 2012 02:24 AM

April 13, 2012

embrace change

Concurrency as a natural paradigm


Since I work with Erlang/OTP and Go and point out their powerful handling of concurrency, I've often been asked what's special about it and how it can be used. Additionally, many people simply mix up concurrency and parallelism. So to make it clearer I have to start by leaving the computer — don't panic, I'll return to it later.

I wonder what you're doing right now. One thing, yes, is reading this text. But please spend a few seconds looking around. What do you see, what is your environment? Maybe you're sitting at home or at work in your office. Or perhaps you're sitting in a café, reading this text on your smartphone. Were you alone, with the rest of the world frozen, when you started to read? No, definitely not. There are people around you, some you're interacting with and some you don't care about. So let's assume you are at work. Especially in our software business we tend to work in teams. A number of people, sometimes fewer, sometimes more, work on a project in different roles and with different tasks. Some of those tasks can be handled by one developer alone, but many need discussion between colleagues; they have to fit into the rest of the software and be realized in a useful order.

The work in those environments is done concurrently. From an artifacts perspective it may look serialized, from the requirements through to deployment. But the people responsible for all those artifacts of a project work in an independent but structured way, as has been done in house building and in factories for many hundreds of years. And it's not purely human behavior; we already know it from nature. Just take a look at bees and ants, or all the other animals that form prides or colonies. In fact we have all grown up with it, but so far we haven't been able to handle it in our software.

Now times have changed. We not only have languages like Go and Erlang/OTP (and many others), we also have computers, and soon smartphones, with a large number of cores. I'm writing this text on a quad-core with 8 hyper-threads, and that's only the beginning. Additionally, our software has to handle more and more tasks independently but in a structured way. Think of server applications handling a large number of user sessions, where each session may send several overlapping requests; inside the server we're communicating with databases (maybe several, due to a business-driven mix of traditional relational and NoSQL backends), directory servers, or further external systems in the case of a SOA. On top of that we may have to take care of global state as well as time-based and other events.

Today we handle those tasks request by request in a serialized manner. And while we're waiting for the answer to a relatively slow I/O-bound task, like reading from a database, our request execution kicks its heels. Concurrent software follows a different idea. While analyzing your problem and designing your solution, you have to ask yourself where your dependencies are, what can be handled independently, and where access has to be synchronized. So you break your code up into independent components, which you then run and compose with the help of your chosen language. In Go those are goroutines communicating via channels; in Erlang/OTP, processes sending messages into each other's mailboxes. Goroutines and processes are lightweight threads, and a system can run hundreds of thousands of them.

Let's show it with a little scenario. You have to read data from a source, transform it into an internal representation for further processing, enrich it with data from an additional external source, modify data in two further external targets, and at last transform the enriched data into the target format and write it, in order, to a target — not a really uncommon job. Let's do a first naive implementation (pseudo language):

// One serialized job execution.
fun Job(source, addSource, addTargetA, addTargetB, target) {
    raw = source.Read()
    internal = Modify(raw)
    enriched = addSource.Enrich(internal)
    modificationA = ExtractModificationA(enriched)
    modificationB = ExtractModificationB(enriched)
    addTargetA.Write(modificationA)
    addTargetB.Write(modificationB)
    result = Transform(enriched)
    target.Write(result)
}

Nice job, step by step. With a lot of waiting due to the I/O and maybe longer processing steps. So let's speed it up by spawning it several times.

// Starting our job a number of times.
fun Starter(source, addSource, addTargetA, addTargetB, target, count) {
    for i = 1 to count {
        spawn Job(source, addSource, addTargetA, addTargetB, target)
    }
}

Wow, now with full speed — Starter(…, 10) brings ten times the power! But eh, stop, wait a moment. They all read in parallel from our source. Does it take care of synchronization? And if so, other jobs have to wait while one is reading. And the same trouble with the additional source, shit. And why do we have to wait for those two modifications and their writes? They don't return any result, so we could go straight on to the transformation and the writing to the target. And here, once again, the synchronization problem. Additionally, how do we handle the in-order writing, and how do we know that everything is done? Our starter returns immediately after all the spawning. (sigh)

So let's try a better approach. Not with brute force, but with more intelligence. Here we design our methods and functions so that they can read from and write to channels to pass data between them.

// Pipelined job execution.
fun Job(source, addSource, addTargetA, addTargetB, target) {
    channel raw
    channel enriched
    channel multicast
    channel transform
    channel transformed
    channel modificationA
    channel modifiedA
    channel modificationB
    channel modifiedB
    channel done


    spawn source.Read(raw)
    spawn addSource.Enrich(raw, multicast)
    spawn fun () {
        for e in multicast {
            send e to transform
            send e to modificationA
            send e to modificationB
        }
    }()
    spawn Transform(transform, transformed)
    spawn target.Write(transformed, done)
    spawn ExtractModificationA(modificationA, modifiedA)
    spawn addTargetA.Write(modifiedA)
    spawn ExtractModificationB(modificationB, modifiedB)
    spawn addTargetB.Write(modifiedB)


    receive done
}

Uuuh! Looks a bit more difficult at first? That's mostly due to my funny pseudo language. (smile) You just have to visualize it. The process source.Read() continuously reads out of the source and writes the data into the raw channel. There's no need for synchronization here. And while the next chunk of data is being read, the process addSource.Enrich() reads from raw and writes its result to multicast. This one and the following anonymous function, run as a process, are needed to read from multicast and write to transform, modificationA and modificationB. This allows the transformation and both modifications to be handled in parallel, once again passing the data from process to process using channels. I assume in my little pseudo runtime that the channels are buffered and that spawned processes continue their work if their spawning process ends — you see, I'm mixing Go and Erlang/OTP here. (smile) So we come to the final trick. While both modification paths are fire and forget, the transformation path has a channel named done. It signals the end of processing. After the last data is read, enriched, transformed and written, the process target.Write() sends a signal via done, and our job function can return.

We see that we gain a wonderful set of little processes doing their work almost independently and interacting via communication. There are still possible problems, e.g. addSource.Enrich() may block due to an internal error or an I/O problem. So you may need a kind of error signaling, or at least time-outs, like the ones Erlang/OTP has in its receive statement and Go can easily create with a <-time.After() in a select statement. As with finding a proper design, you have to take care of these kinds of errors. There's still no free lunch. But this — safety in concurrent applications — may be the topic of another blog entry.

by Frank Müller (noreply@blogger.com) at April 13, 2012 02:54 PM

April 12, 2012

RSC

_rsc: RT @valueof: @tomdale @fat @pamelafox I got into tabs because of #golang.


April 12, 2012 10:07 PM

research!rsc

QArt Codes


QR codes are 2-dimensional bar codes that encode arbitrary text strings. A common use of QR codes is to encode URLs so that people can scan a QR code (for example, on an advertising poster, building roof, volleyball bikini, belt buckle, or airplane banner) to load a web site on a cell phone instead of having to “type” in a URL.

QR codes are encoded using Reed-Solomon error-correcting codes, so that a QR scanner does not have to see every pixel correctly in order to decode the content. The error correction makes it possible to introduce a few errors (fewer than the maximum that the algorithm can fix) in order to make an image. For example, in 2008, Duncan Robertson took a QR code for “http://bbc.co.uk/programmes” (left) and introduced errors in the form of a BBC logo (right):

That's a neat trick and a pretty logo, but it's uninteresting from a technical standpoint. Although the BBC logo pixels look like QR code pixels, they are not contributing to the QR code. The QR reader can't tell much difference between the BBC logo and the Union Jack. There's just a bunch of noise in the middle either way.

Since the BBC QR logo appeared, there have been many imitators. Most just slap an obviously out-of-place logo in the middle of the code. This Disney poster is notable for being more in the spirit of the BBC code.

There's a different way to put pictures in QR codes. Instead of scribbling on redundant pieces and relying on error correction to preserve the meaning, we can engineer the encoded values to create the picture in a code with no inherent errors, like these:

This post explains the math behind making codes like these, which I call QArt codes. I have published the Go programs that generated these codes at code.google.com/p/rsc and created a web site for creating these codes.

Background

For error correction, QR uses Reed-Solomon coding (like nearly everything else). For our purposes, Reed-Solomon coding has two important properties. First, it is what coding theorists call a systematic code: you can see the original message in the encoding. That is, the Reed-Solomon encoding of “hello” is “hello” followed by some error-correction bytes. Second, Reed-Solomon encoded messages can be XOR'ed: if we have two different Reed-Solomon encoded blocks b1 and b2 corresponding to messages m1 and m2, b1 ⊕ b2 is also a Reed-Solomon encoded block; it corresponds to the message m1 ⊕ m2. (Here, ⊕ means XOR.) If you are curious about why these two properties are true, see my earlier post, Finite Field Arithmetic and Reed-Solomon Coding.

QR Codes

A QR code has a distinctive frame that helps both people and computers recognize it as a QR code. The details of the frame depend on the exact size of the code—bigger codes have room for more bits—but you know one when you see it: the outlined squares are the giveaway. Here are QR frames for a sampling of sizes:

The colored pixels are where the Reed-Solomon-encoded data bits go. Each code may have one or more Reed-Solomon blocks, depending on its size and the error correction level. The pictures show the bits from each block in a different color. The L encoding is the lowest amount of redundancy, about 20%. The other three encodings increase the redundancy, using 38%, 55%, and 65%.

(By the way, you can read the redundancy level from the top pixels in the two leftmost columns. If black=0 and white=1, then you can see that 00 is L, 01 is M, 10 is Q, and 11 is H. Thus, you can tell that the QR code on the T-shirt in this picture is encoded at the highest redundancy level, while this shirt uses the lowest level and therefore might take longer or be harder to scan.)

As I mentioned above, the original message bits are included directly in the message's Reed-Solomon encoding. Thus, each bit in the original message corresponds to a pixel in the QR code. Those are the lighter pixels in the pictures above. The darker pixels are the error correction bits. The encoded bits are laid down in a vertical boustrophedon pattern in which each line is two columns wide, starting at the bottom right corner and ending on the left side:

We can easily work out where each message bit ends up in the QR code. By changing those bits of the message, we can change those pixels and draw a picture. There are, however, a few complications that make things interesting.

QR Masks

The first complication is that the encoded data is XOR'ed with an obfuscating mask to create the final code. There are eight masks:

An encoder is supposed to choose the mask that best hides any patterns in the data, to keep those patterns from being mistaken for framing boxes. In our encoder, however, we can choose a mask before choosing the data. This violates the spirit of the spec but still produces legitimate codes.

QR Data Encoding

The second complication is that we want the QR code's message to be intelligible. We could draw arbitrary pictures using arbitrary 8-bit data, but when scanned the codes would produce binary garbage. We need to limit ourselves to data that produces sensible messages. Luckily for us, QR codes allow messages to be written using a few different alphabets. One alphabet is 8-bit data, which would require binary garbage to draw a picture. Another is numeric data, in which every run of 10 bits defines 3 decimal digits. That limits our choice of pixels slightly: we must not generate a 10-bit run with a value above 999. That's not complete flexibility, but it's close: 9.96 bits of freedom out of 10. If, after encoding an image, we find that we've generated an invalid number, we pick one of the 5 most significant bits at random—all of them must be 1s to make an invalid number—hard wire that bit to zero, and start over.

Having only decimal messages would still not be very interesting: the message would be a very large number. Luckily for us (again), QR codes allow a single message to be composed from pieces using different encodings. The codes I have generated consist of an 8-bit-encoded URL ending in a # followed by a numeric-encoded number that draws the actual picture:

http://swtch.com/pjw/#123456789...

The leading URL is the first data encoded; it takes up the right side of the QR code. The error correction bits take up the left side.

When the phone scans the QR code, it sees a URL; loading it in a browser visits the base page and then looks for an internal anchor on the page with the given number. The browser won't find such an anchor, but it also won't complain.

The techniques so far let us draw codes like this one:

The second copy darkens the pixels that we have no control over: the error correction bits on the left and the URL prefix on the right. I appreciate the cyborg effect of Peter melting into the binary noise, but it would be nice to widen our canvas.

Gauss-Jordan Elimination

The third complication, then, is that we want to draw using more than just the slice of data pixels in the middle of the image. Luckily, we can.

I mentioned above that Reed-Solomon messages can be XOR'ed: if we have two different Reed-Solomon encoded blocks b1 and b2 corresponding to messages m1 and m2, b1 ⊕ b2 is also a Reed-Solomon encoded block; it corresponds to the message m1 ⊕ m2. (In the notation of the previous post, this happens because Reed-Solomon blocks correspond 1:1 with multiples of g(x). Since b1 and b2 are multiples of g(x), their sum is a multiple of g(x) too.) This property means that we can build up a valid Reed-Solomon block from other Reed-Solomon blocks. In particular, we can construct the sequence of blocks b0, b1, b2, ..., where bi is the block whose data bits are all zeros except for bit i and whose error correction bits are then set to correspond to a valid Reed-Solomon block. That set is a basis for the entire vector space of valid Reed-Solomon blocks. Here is the basis matrix for the space of blocks with 2 data bytes and 2 checksum bytes:

[basis matrix table omitted: 16 rows, one per data bit of the 2-data-byte, 2-checksum-byte code; the column alignment of the 0/1 entries was lost in extraction]

The missing entries are zeros. The gray columns highlight the pixels we have complete control over: there is only one row with a 1 for each of those pixels. Each time we want to change such a pixel, we can XOR our current data with its row to change that pixel, not change any of the other controlled pixels, and keep the error correction bits up to date.

So what, you say. We're still just twiddling data bits. The canvas is the same.

But wait, there's more! The basis we had above lets us change individual data pixels, but we can XOR rows together to create other basis matrices that trade data bits for error correction bits. No matter what, we're not going to increase our flexibility—the number of pixels we have direct control over cannot increase—but we can redistribute that flexibility throughout the image, at the same time smearing the uncooperative noise pixels evenly all over the canvas. This is the same procedure as Gauss-Jordan elimination, the way you turn a matrix into row-reduced echelon form.

This matrix shows the result of trying to assert control over alternating pixels (the gray columns):

[row-reduced basis matrix table omitted: the column alignment of the 0/1 entries was lost in extraction]

The matrix illustrates an important point about this trick: it's not completely general. The data bits are linearly independent, but there are dependencies between the error correction bits that mean we often can't have every pixel we ask for. In this example, the last four pixels we tried to get were unavailable: our manipulations of the rows to isolate the first four error correction bits zeroed out the last four that we wanted.

In practice, a good approach is to create a list of all the pixels in the Reed-Solomon block sorted by how useful it would be to be able to set that pixel. (Pixels from high-contrast regions of the image are less important than pixels from low-contrast regions.) Then, we can consider each pixel in turn, and if the basis matrix allows it, isolate that pixel. If not, no big deal, we move on to the next pixel.

Applying this insight, we can build wider but noisier pictures in our QR codes:

The pixels in Peter's forehead and on his right side have been sacrificed for the ability to draw the full width of the picture.

We can also choose the pixels we want to control at random, to make Peter peek out from behind a binary fog:

Rotations

One final trick. QR codes have no required orientation. The URL base pixels that we have no control over are on the right side in the canonical orientation, but we can rotate the QR code to move them to other edges.


Further Information

All the source code for this post, including the web server, is at code.google.com/p/rsc/source/browse/qr. If you liked this, you might also like Zip Files All The Way Down.

Acknowledgements

Alex Healy pointed out that valid Reed-Solomon encodings are closed under XOR, which is the key to spreading the picture into the error correction pixels. Peter Weinberger has been nothing but gracious about the overuse of his binary likeness. Thanks to both.

April 12, 2012 07:00 PM

April 11, 2012

Adam Langley

False Start's Failure

Eighteen months ago(ish), Chrome started using False Start. False Start reduces the average time for an SSL handshake by 30%.

Since the biggest problem with transport security is that most sites don't use it, anything that reduces the latency impact of HTTPS is important. Making things faster doesn't just make them faster, it also makes them cheaper and more prevalent. When HTTPS is faster, it'll be used in more places than it would otherwise be.

But, sadly, False Start will be disabled, except for sites doing NPN, in Chrome 20. NPN is a TLS extension that we use to negotiate SPDY, although you don't have to use it to negotiate SPDY; you can advertise http/1.1 if you wish.

False Start was known to cause problems with a very small number of servers and the initial announcement outlined the uncommon scheme that we used to deploy it: we scanned the public Internet and built up a list of problematic sites. That list was built into Chrome and we didn't use False Start for connections to those sites. Over time the list was randomly eroded away and I'd try to address any issues that came up. (Preemptively so in the case of large sites.)

It did work to some extent. Many sites that had problems were fixed and it's a deployment scheme that is worth considering in the future. But it didn't ultimately work well enough for False Start.

Initially we believed that False Start issues were deterministic so long as the TLS Finished and application data records were sent in the same TCP packet. We changed Chrome to do this in the hopes of making False Start issues deterministic. However, we later discovered some HTTPS servers that were still non-deterministically False Start intolerant. I hypothesise that the servers run two threads per connection: one for reading and one for writing. Although the TCP packet was received atomically, thread scheduling could mean that the read thread may or may not be scheduled before the write thread had updated the connection state in response to the Finished.

This non-determinism made False Start intolerance difficult to diagnose and reduced our confidence in the blacklist.

The "servers" with problems were nearly always SSL terminators. These hardware devices terminate SSL connections and proxy unencrypted data to backend HTTP servers. I believe that False Start intolerance is very simple to fix in the code, and one vendor suggested that was the case. Nonetheless, of the vendors who did issue an update, most failed to communicate that fact to their customers. (A pattern that has repeated with the BEAST fix.)

One, fairly major, SSL terminator vendor refused to update to fix their False Start intolerance despite problems that their customers were having. I don't believe that this was done in bad faith, but rather a case of something much more mundane along the lines of “the SSL guy left and nobody touches that code any more”. However, it did mean that there was no good answer for their customers who were experiencing problems.

Lastly, it was becoming increasingly clear that we had a bigger problem internationally. Foreign admins have problems finding information on the subject (which is mostly in English) and foreign users have problems reporting bugs because we can't read them. We do have excellent agents in countries who liaise locally but it was still a big issue, and we don't cover every country with them. I also suspect that the distribution of problematic SSL terminators is substantially larger in some countries and that the experience with the US and Europe caused us to underestimate the problem.

In aggregate this led us to decide that False Start was causing more problems than it was worth. We will now limit it to sites that support the NPN extension. This unfortunately means that it'll be an arcane, unused optimisation for the most part: at least until SPDY takes over the world.

April 11, 2012 07:00 AM

April 10, 2012

research!rsc

Finite Field Arithmetic and Reed-Solomon Coding

Finite fields are a branch of algebra formally defined in the 1820s, but interest in the topic can be traced back to public sixteenth-century polynomial-solving contests. For the next few centuries, finite fields had little practical value, but that all changed in the last fifty years. It turns out that they are useful for many applications in modern computing, such as encryption, data compression, and error correction.

In particular, Reed-Solomon codes are an error-correcting code based on finite fields and used everywhere today. One early significant use was in the Voyager spacecraft: the messages it still sends back today, from the edge of the solar system, are heavily Reed-Solomon encoded so that even if only a small fragment makes it back to Earth, we can still reconstruct the message. Reed-Solomon coding is also used on CDs to withstand scratches, in wireless communications to withstand transmission problems, in QR codes to withstand scanning errors or smudges, in disks to withstand loss of fragments of the media, in high-level storage systems like Google's GFS and BigTable to withstand data loss and also to reduce read latency (the read can complete without waiting for all the responses to arrive).

This post shows how to implement finite field arithmetic efficiently on a computer, and then how to use that to implement Reed-Solomon encoding.

What is a Finite Field?

One way mathematicians study numbers is to abstract away the numbers themselves and focus on the operations. (This is kind of an object-oriented approach to math.) A field is defined as a set F and operators + and · on elements of F that satisfy the following properties:

  1. (Closure) For all x, y in F, x+y and x·y are in F.
  2. (Associative) For all x, y, z in F, (x+y)+z = x+(y+z) and (x·y)·z = x·(y·z).
  3. (Commutative) For all x, y in F, x+y = y+x and x·y = y·x.
  4. (Distributive) For all x, y, z in F, x·(y+z) = (x·y)+(x·z).
  5. (Identity) There is some element we'll call 0 in F such that for all x in F, x+0 = x. Similarly, there is some element we'll call 1 in F such that for all x in F, x·1 = x.
  6. (Inverse) For all x in F, there is some element y in F such that x+y = 0. We write y = −x. Similarly, for all x in F except 0, there is some element y in F such that x·y = 1. We write y = 1/x.

You probably recognize those properties from high school algebra class: the most well-known example of a field is the real numbers, where + is addition and · is multiplication. Other examples are complex numbers and fractions.

A mathematician doesn't have to prove the same results over and over for the real numbers ℝ, the complex numbers ℂ, the fractions ℚ, and so on. Instead, she can prove that a particular result holds for all fields—by assuming only the above properties, called the field axioms. Then she can apply the result by substituting a specific instance like the real numbers for the general idea of a field, the same way that a programmer can implement just one vector(T) and then instantiate it as vector(int), vector(string) and so on.

The integers ℤ are not a field: they lack multiplicative inverses. For example, there is no number that you can multiply by 2 to get 1, no 1/2. Surprisingly, though, the integers modulo any prime p do form a field. For example, the integers modulo 5 are 0, 1, 2, 3, 4. 1+4 = 0 (mod 5), so we say that 4 = −1. Similarly, 2·3 = 1 (mod 5), so we say that 3 = 1/2. After we've proved that ℤ/p is in fact a field, all the results about fields can be applied to ℤ/p. This is very useful: it lets us apply our intuition about the very familiar real numbers to these less familiar numbers. This field is written ℤ/p to emphasize that we're dealing with what's left after subtracting out all the p's. That is, we're dealing with what's left if you assume that p = 0. When you make that assumption, you get math that wraps around at p. These fields are called finite fields because, in contrast to fields like the real numbers, they have a finite number of elements.
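To make the ℤ/5 inverses above concrete, here is a small Go sketch (my own illustration, not from the post; p, addInv, and mulInv are made-up names) that finds them by brute force:

```go
package main

import "fmt"

const p = 5 // any prime works

// addInv returns y such that (x+y) mod p == 0.
func addInv(x int) int { return (p - x) % p }

// mulInv returns y such that (x*y) mod p == 1, by brute force.
// x must be non-zero; if p were not prime, some x would have no inverse.
func mulInv(x int) int {
	for y := 1; y < p; y++ {
		if x*y%p == 1 {
			return y
		}
	}
	panic("no inverse: p not prime?")
}

func main() {
	fmt.Println(addInv(1)) // −1 = 4 (mod 5)
	fmt.Println(mulInv(2)) // 1/2 = 3 (mod 5), since 2·3 = 6 = 1 (mod 5)
}
```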

For a programmer, the most interesting finite field is ℤ/2, which contains just the integers 0, 1. Addition is the same as XOR, and multiplication is the same as AND. Note that ℤ/p is only a field when p is prime: arithmetic on uint8 variables corresponds to ℤ/256, but it is not a field: there is no 1/2.
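That correspondence can be checked directly; this tiny Go sketch (my addition, not from the post) prints the ℤ/2 tables next to XOR and AND:

```go
package main

import "fmt"

func main() {
	// Addition in ℤ/2 wraps at 2, so 1+1 = 0: exactly the XOR truth table.
	// Multiplication mod 2 matches the AND truth table.
	for x := 0; x <= 1; x++ {
		for y := 0; y <= 1; y++ {
			fmt.Printf("%d+%d=%d (xor %d)   %d·%d=%d (and %d)\n",
				x, y, (x+y)%2, x^y, x, y, (x*y)%2, x&y)
		}
	}
}
```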

What can you do with a field?

The only problem with fields is that there's not a ton you can do with just the field axioms. One thing you can do is build polynomials, which were the original motivation for the mathematicians who pioneered the use of fields in the early 1800s. If we introduce a symbolic variable x, then we can build polynomials whose coefficients are field values. We'll write F[x] to denote the polynomials over x using coefficients from F. For example, if we use the real numbers ℝ as our field, then the polynomials ℝ[x] include x²+1, x+2, and 3.14x² − 2.72x + 1.41. Like integers, these polynomials can be added and multiplied, but not always divided—what is (x²+1)/(x+2)?—so they are not a field. However, remember how the integers are not a field but the integers modulo a prime are a field? The same happens here: polynomials are not a field but polynomials modulo some prime polynomial are.

What does “polynomials modulo some prime polynomial” mean anyway? A prime polynomial is one that cannot be factored, like x²+1 cannot be factored using real numbers. The field ℤ/5 is what you get by doing math under the assumption that 5 = 0; similarly, ℝ[x]/(x²+1) is what you get by doing math under the assumption that x²+1 = 0. Just as ℤ/5 math never deals with numbers as big as 5, ℝ[x]/(x²+1) math never deals with polynomials as big as x²: anything bigger can have some multiple of x²+1 subtracted out again. That is, the polynomials in ℝ[x]/(x²+1) are bounded in size: they have only x¹ and x⁰ (constant) terms. To add polynomials, we just add the coefficients using the addition rule from the coefficient's field, independently, like a vector addition. To multiply polynomials, we have to do the multiplication and then subtract out any x²+1 we can. If we have (ax+b)·(cx+d), we can expand this to (a·c)x² + (b·c+a·d)x + (b·d), and then subtract (a·c)(x²+1) = (a·c)x² + (a·c), producing the final result: (b·c+a·d)x + (b·d−a·c). That might seem like a funny definition of multiplication, but it does in fact obey the field axioms. In fact, this particular field is more familiar than it looks: it is the complex numbers ℂ, but we've written x instead of the usual i. Assuming that x²+1 = 0 is, except for a renaming, the same as defining i² = −1.
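To convince yourself that this reduction really is complex multiplication, here is a small Go sketch (my own; polyMul is a made-up helper) that multiplies (ax+b)·(cx+d) in ℝ[x]/(x²+1) and compares against Go's built-in complex128:

```go
package main

import "fmt"

// polyMul multiplies ax+b by cx+d in ℝ[x]/(x²+1):
// expand to (ac)x² + (bc+ad)x + bd, then subtract (ac)(x²+1),
// leaving (bc+ad)x + (bd−ac).
func polyMul(a, b, c, d float64) (x1, x0 float64) {
	return b*c + a*d, b*d - a*c
}

func main() {
	// (2x+3)·(4x+5) should match (3+2i)·(5+4i).
	x1, x0 := polyMul(2, 3, 4, 5)
	z := complex(3, 2) * complex(5, 4)
	fmt.Println(x1, x0)           // 22 7
	fmt.Println(imag(z), real(z)) // 22 7
}
```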

Doing all our math modulo a prime polynomial let us take the field of real numbers and produce a field whose elements are pairs of real numbers. We can apply the same trick to take a finite field like ℤ/p and produce a field whose elements are fixed-length vectors of elements of ℤ/p. The original ℤ/p has p elements. If we construct (ℤ/p)[x]/f(x), where f(x) is a prime polynomial of degree n (f's maximum x exponent is n), the resulting field has pⁿ elements: all the vectors made up of n elements from ℤ/p. Incredibly, the choice of prime polynomial doesn't matter very much: any two finite fields of size pⁿ have identical structure, even if they give the individual elements different names. Because of this, it makes sense to refer to all the finite fields of size pⁿ as one concept: GF(pⁿ). The GF stands for Galois Field, in honor of Évariste Galois, who was the first to study these. The exact polynomial chosen to produce a particular GF(pⁿ) is an implementation detail.

For a programmer, the most interesting finite fields constructed this way are GF(2ⁿ)—the polynomial extensions of ℤ/2—because the elements of GF(2ⁿ) are bit vectors of length n. As a concrete example, consider (ℤ/2)[x]/(x⁸+x⁴+x³+x+1). The field has 2⁸ elements: each can be represented by a single byte. The byte with binary digits b₇b₆b₅b₄b₃b₂b₁b₀ represents the polynomial b₇·x⁷ + b₆·x⁶ + b₅·x⁵ + b₄·x⁴ + b₃·x³ + b₂·x² + b₁·x + b₀. To add polynomials, we add coefficients. Since the coefficients are from ℤ/2, adding coefficients means XOR'ing each bit position separately, which is something computer hardware can do easily. Multiplying the polynomials is more difficult, because standard multiplication hardware is based on adding, but we need a multiplication based on XOR'ing. Because the coefficient math wraps at 2, (x²+x)·(x+1) = x³+2x²+x = x³+x, while computer multiplication would compute 110₂ · 011₂ = 6 · 3 = 18 = 10010₂. However, it turns out that we can implement this field multiplication with a simple lookup table. In a finite field, there is always at least one element α that can serve as a generator. All the other non-zero elements are powers of α: α, α², α³, and so on. This α is not symbolic like x: it's a specific element. For example, in ℤ/5, we can use α=2: {α, α², α³, α⁴} = {2, 4, 8, 16} = {2, 4, 3, 1}. In GF(2ⁿ) the math is more complex but still works. If we know the generator, then we can, by repeated multiplication, create a lookup table exp[i] = αⁱ and an inverse table log[αⁱ] = i. Multiplication is then just a few table lookups: assuming a and b are non-zero, a·b = exp[log[a]+log[b]]. (That's a normal integer +, to add the exponents, not an XOR.)

Why do we care?

The fact that GF(2ⁿ) can be implemented efficiently on a computer means that we can implement systems based on mathematical theorems without worrying about the usual overflow problems you get when modeling integers or real numbers. To be sure, GF(2ⁿ) behaves quite differently from the integers in many ways, but if all you need is the field axioms, it's good enough, and it eliminates any need to worry about overflow or arbitrary precision calculations. Because of the lookup table, GF(2⁸) is by far the most common choice of field in a computer algorithm. For example, the Advanced Encryption Standard (AES, formerly Rijndael) is built around GF(2⁸) arithmetic, as are nearly all implementations of Reed-Solomon coding.

Code

Let's begin by defining a Field type that will represent the specific instance of GF(2⁸) defined by a given polynomial. The polynomial must be of degree 8, meaning that its binary representation has the 0x100 bit set and no higher bits set.

type Field struct {
    ...
}

Addition is just XOR, no matter what the polynomial is:

// Add returns the sum of x and y in the field.
func (f *Field) Add(x, y byte) byte {
    return x ^ y
}

Multiplication is where things get interesting. If you'd used binary (and Go) in grade school, you might have learned this algorithm for multiplying two numbers (this is not finite field arithmetic):

// Grade-school multiplication in binary: mul returns the product x×y.
func mul(x, y int) int {
    z := 0
    for x > 0 {
        if x&1 != 0 {
            z += y
        }
        x >>= 1
        y <<= 1
    }
    return z
}

The running total z accumulates the product of x and y. The first iteration of this loop adds y to z if the low bit (the 1s digit) of x is 1. The next iteration adds y*2 if the next bit (the 2s digit, now shifted down) of x is 1. The next iteration adds y*4 if the 4s digit is 1, and so on. Each iteration shifts x to the right to chop off the processed digit and shifts y to the left to multiply by two.

To adapt this to multiply in a finite field, we need to make two changes. First, addition is XOR, so we use ^= instead of += to add to z. Second, we need to make the multiply reduce modulo the polynomial. Assuming that the inputs have already been reduced, the only chance of exceeding the polynomial comes from the shift of y. After the shift, then, we can check to see if we've overflowed, and if so, subtract (XOR) out one copy of the polynomial. The finite field version, then, is:

// GF(256) multiplication: mul returns the product x×y mod poly.
func mul(x, y, poly int) int {
    z := 0
    for x > 0 {
        if x&1 != 0 {
            z ^= y
        }
        x >>= 1
        y <<= 1
        if y&0x100 != 0 {
            y ^= poly
        }
    }
    return z
}

We might want to do a lot of multiplication, though, and this loop is too slow. There aren't that many inputs—only 2⁸×2⁸ of them—so one option is to build a 64kB lookup table. With some cleverness, we can build a smaller lookup table. In the NewField constructor, we can compute α⁰, α¹, α², ..., record the sequence in an exp array, and record the inverse in a log array. Then we can reduce multiplication to addition of logarithms, like a slide rule does.

// A Field represents an instance of GF(256) defined by a specific polynomial.
type Field struct {
    log [256]byte // log[0] is unused
    exp [510]byte
}

// NewField returns a new field corresponding to
// the given polynomial and generator.
func NewField(poly, α int) *Field {
    var f Field
    x := 1
    for i := 0; i < 255; i++ {
        f.exp[i] = byte(x)
        f.exp[i+255] = byte(x)
        f.log[x] = byte(i)
        x = mul(x, α, poly)
    }
    f.log[0] = 255
    return &f
}

The values of the exp function cycle with period 255 (not 256, because 0 is impossible): α²⁵⁵ = 1. The straightforward way to implement Exp, then, is to look up the entry given by the exponent modulo 255.

// Exp returns the base 2 exponential of e in the field.
// If e < 0, Exp returns 0.
func (f *Field) Exp(e int) byte {
    if e < 0 {
        return 0
    }
    return f.exp[e%255]
}

Log is an even simpler table lookup, because the input is only a byte:

// Log returns the base 2 logarithm of x in the field.
// If x == 0, Log returns -1.
func (f *Field) Log(x byte) int {
    if x == 0 {
        return -1
    }
    return int(f.log[x])
}

Mul is where things get interesting. The obvious implementation of Mul is exp[(log[x]+log[y])%255], but if we double the exp array, so that it is 510 elements long, we can drop the relatively expensive %255:

// Mul returns the product of x and y in the field.
func (f *Field) Mul(x, y byte) byte {
    if x == 0 || y == 0 {
        return 0
    }
    return f.exp[int(f.log[x])+int(f.log[y])]
}

Inv returns the multiplicative inverse, 1/x. We don't implement divide: instead of x/y, we can use x · 1/y.

// Inv returns the multiplicative inverse of x in the field.
// If x == 0, Inv returns 0.
func (f *Field) Inv(x byte) byte {
    if x == 0 {
        return 0
    }
    return f.exp[255-f.log[x]]
}
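As a sanity check on the table construction (my own sketch, not part of the library; it uses the QR-code polynomial x⁸+x⁴+x³+x²+1 = 0x11d with generator 2, a combination I'm assuming rather than taking from the post), we can rebuild the exp/log tables with the loop-based mul above and confirm that exp[255−log[x]] really inverts every non-zero byte:

```go
package main

import "fmt"

// mul is the loop-based GF(256) multiply described in the post.
func mul(x, y, poly int) int {
	z := 0
	for x > 0 {
		if x&1 != 0 {
			z ^= y
		}
		x >>= 1
		y <<= 1
		if y&0x100 != 0 {
			y ^= poly
		}
	}
	return z
}

func main() {
	// Assumed field: the QR-code polynomial 0x11d, generator α = 2.
	const poly, α = 0x11d, 2
	var log [256]int
	var exp [510]int
	x := 1
	for i := 0; i < 255; i++ {
		exp[i], exp[i+255] = x, x
		log[x] = i
		x = mul(x, α, poly)
	}
	// Verify that exp[255-log[x]] is 1/x for every non-zero x.
	bad := 0
	for x := 1; x < 256; x++ {
		inv := exp[255-log[x]]
		if mul(x, inv, poly) != 1 {
			bad++
		}
	}
	fmt.Println("bad inverses:", bad) // prints: bad inverses: 0
}
```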

Reed-Solomon Coding

In 1960, Irving Reed and Gustave Solomon proposed a way to build an error-correcting code using GF(2ⁿ). The method interpreted the m message bits as coefficients of a polynomial f of degree m−1 over GF(2ⁿ) and then sent f(0), f(α), f(α²), f(α³), ..., f(1). Any m of these, if received correctly, suffice to reconstruct f, and then the message can be read off the coefficients. To find a correct set, Reed and Solomon's algorithm constructed the f corresponding to every possible subset of m received values and then chose the most common one in a majority vote. As long as no more than (2ⁿ−m)/2 values were corrupted in transit, the majority will agree on the correct value of f. This decoding algorithm is very expensive, too expensive for long messages. As a result, the Reed-Solomon approach sat unused for almost a decade. In 1969, however, Elwyn Berlekamp and James Massey proposed a variant with an efficient decoding algorithm. In the 1980s, Berlekamp and Lloyd Welch developed an even more efficient decoding algorithm that is the one typically used today. These decoding algorithms are based on systems of equations far too complex to explain here; in this post, we will only deal with encoding. (I can't keep the decoding algorithms straight in my head for more than an hour or two at a time, much less explain them in finite space.)

In Reed-Solomon encoding as it is practiced today, the choice of finite field F and generator α defines a generator polynomial g(x) = (x−1)(x−α)(x−α²)···(x−α^(n−m)). To encode a message m, the message is taken as the top coefficients of a degree n polynomial f(x) = m₀x^(n−1) + m₁x^(n−2) + ... + m_m·x^(n−m−1). Then that polynomial can be divided by g to produce the remainder polynomial r(x), the unique polynomial of degree less than n−m such that f(x) − r(x) is a multiple of g(x). Since r(x) is of degree less than n−m, subtracting r(x) does not affect any of the message coefficients, just the lower terms, so the polynomial f(x) − r(x) (= f(x) + r(x)) is taken as the encoded message. All encoded messages, then, are multiples of g(x). On the receiving end, the decoder does some magic to figure out the simplest changes needed to make the received polynomial a multiple of g(x) and then reads the message out of the top coefficients.

While decoding is difficult, encoding is easy: the first m bytes are the message itself, followed by the c bytes defining the remainder of m(x)·x^c / g(x). We can also check whether we received an error-free message by checking whether the concatenation defines a polynomial that is a multiple of g(x).

Code

The Reed-Solomon encoding problem is this: given a message m interpreted as a polynomial m(x), compute the error correction bytes, m(x)·x^c mod g(x).

The grade-school division algorithm works well here. If we fill in p with m(x)·x^c (m followed by c zero bytes), then we can replace p by the remainder by iteratively subtracting out multiples of the generator polynomial g.

for i := 0; i < len(m); i++ {
    k := f.Mul(p[i], f.Inv(gen[0]))  // k = pi / g0
    // p -= k·g
    for j, g := range gen {
        p[i+j] = f.Add(p[i+j], f.Mul(k, g))
    }
}

This implementation is correct but can be made more efficient. If you want to try, run:

go get code.google.com/p/rsc/gf256
go test code.google.com/p/rsc/gf256 -bench Blog

That benchmark measures the speed of the implementation in blog_test.go, which looks like the above. Optimize away, or follow along.

There's definitely room for improvement:

$ go test code.google.com/p/rsc/gf256 -bench ECC
PASS
BenchmarkBlogECC   500000   7031 ns/op   4.55 MB/s
BenchmarkECC      1000000   1332 ns/op  24.02 MB/s

To start, we can expand the definitions of Add and Mul. The Go compiler's inliner would do this for us; the win here is not the inlining but the simplifications it will enable us to make.

for i := 0; i < len(m); i++ {
    if p[i] == 0 {
        continue
    }
    k := f.exp[f.log[p[i]] + 255 - f.log[gen[0]]]  // k = pi / g0
    // p -= k·g
    for j, g := range gen {
        p[i+j] ^= f.exp[f.log[k] + f.log[g]]
    }
}

(The implementation handles p[i] == 0 specially because 0 has no log.)

The first thing to note is that we compute k but then use f.log[k] repeatedly. Keeping the log of k instead avoids those memory accesses, and it is cheaper to compute: we just take out the f.exp[...] lookup on the line that computes k. This is safe because p[i] is non-zero, so k must be non-zero.

for i := 0; i < len(m); i++ {
    if p[i] == 0 {
        continue
    }
    lk := f.log[p[i]] + 255 - f.log[gen[0]]  // k = pi / g0
    // p -= k·g
    for j, g := range gen {
        p[i+j] ^= f.exp[lk + f.log[g]]
    }
}

Next, note that we repeatedly compute f.log[g]. Instead of doing that, we can iterate lgen—an array holding the logs of the coefficients—instead of gen. We'll have to handle zero somehow: let's say that the array has an entry set to 255 when the corresponding gen value is zero.

for i := 0; i < len(m); i++ {
    if p[i] == 0 {
        continue
    }
    lk := f.log[p[i]] + 255 - f.log[gen[0]]  // k = pi / g0
    // p -= k·g
    for j, lg := range lgen {
        if lg != 255 {
            p[i+j] ^= f.exp[lk + lg]
        }
    }
}

Next, we can notice that since the generator is defined as

g(x) = (x−1)(x−α)(x−α²)···(x−α^(n−m))

the first coefficient, g0, is always 1! That means we can simplify the k = pi / g0 calculation to just k = pi. Also, we can drop the first element of lgen and its subtraction, as long as we ignore the high bytes in the result (we know they're supposed to be zero anyway).

for i := 0; i < len(m); i++ {
    if p[i] == 0 {
        continue
    }
    lk := f.log[p[i]]
    // p -= k·g
    for j, lg := range lgen {
        if lg != 255 {
            p[i+1+j] ^= f.exp[lk + lg]
        }
    }
}

The inner loop, which is where we spend all our time, has two additions by loop-invariant constants: i+1+j and lk+lg. The i+1 and lk do not change on each iteration. We can avoid those additions by reslicing the arrays outside the loop:

for i := 0; i < len(m); i++ {
    if p[i] == 0 {
        continue
    }
    lk := f.log[p[i]]
    // p -= k·g
    q := p[i+1:]
    exp := f.exp[lk:]
    for j, lg := range lgen {
        if lg != 255 {
            q[j] ^= exp[lg]
        }
    }
}

As one final trick, we can replace p[i] by a range variable. The Go compiler does not yet use loop invariants to eliminate bounds checks, but it does eliminate bounds checks in the implicit indexing done by a range loop.

for i, pi := range p {
    if i == len(m) {
        break
    }
    if pi == 0 {
        continue
    }
    // p -= k·g
    q := p[i+1:]
    exp := f.exp[f.log[pi]:]
    for j, lg := range lgen {
        if lg != 255 {
            q[j] ^= exp[lg]
        }
    }
}

The code is in context in gf256.go.

Summary

We started with single bits 0 and 1. From those we constructed 8-bit polynomials—the elements of GF(2⁸)—with overflow-free, easy-to-implement mathematical operations. From there we moved on to Reed-Solomon coding, which constructs its own polynomials built using elements of GF(2⁸) as coefficients. That is, each Reed-Solomon message is interpreted as a polynomial, and each coefficient in that polynomial is itself a smaller polynomial.

Now that we know how to create Reed-Solomon encodings, the next post will look at some fun we can have with them.

April 10, 2012 06:00 PM

April 05, 2012

Command Center

The byte order fallacy

Whenever I see code that asks what the native byte order is, it's almost certain the code is either wrong or misguided. And if the native byte order really does matter to the execution of the program, it's almost certain to be dealing with some external software that is either wrong or misguided. If your code contains #ifdef BIG_ENDIAN or the equivalent, you need to unlearn about byte order.

The byte order of the computer doesn't matter much at all except to compiler writers and the like, who fuss over allocation of bytes of memory mapped to register pieces. Chances are you're not a compiler writer, so the computer's byte order shouldn't matter to you one bit.

Notice the phrase "computer's byte order". What does matter is the byte order of a peripheral or encoded data stream, but--and this is the key point--the byte order of the computer doing the processing is irrelevant to the processing of the data itself. If the data stream encodes values with byte order B, then the algorithm to decode the value on computer with byte order C should be about B, not about the relationship between B and C.

Let's say your data stream has a little-endian-encoded 32-bit integer. Here's how to extract it (assuming unsigned bytes):
i = (data[0]<<0) | (data[1]<<8) | (data[2]<<16) | (data[3]<<24);
If it's big-endian, here's how to extract it:
i = (data[3]<<0) | (data[2]<<8) | (data[1]<<16) | (data[0]<<24);
Both these snippets work on any machine, independent of the machine's byte order, independent of alignment issues, independent of just about anything. They are totally portable, given unsigned bytes and 32-bit integers.

What you might have expected to see for the little-endian case was something like
i = *((int*)data);
#ifdef BIG_ENDIAN
/* swap the bytes */
i = ((i&0xFF)<<24) | (((i>>8)&0xFF)<<16) | (((i>>16)&0xFF)<<8) | (((i>>24)&0xFF)<<0);
#endif
or something similar. I've seen code like that many times. Why not do it that way? Well, for starters:
  1. It's more code.
  2. It assumes integers are addressable at any byte offset; on some machines that's not true.
  3. It depends on integers being 32 bits long, or requires more #ifdefs to pick a 32-bit integer type.
  4. It may be a little faster on little-endian machines, but not much, and it's slower on big-endian machines.
  5. If you're using a little-endian machine when you write this, there's no way to test the big-endian code.
  6. It swaps the bytes, a sure sign of trouble (see below).

By contrast, my version of the code:
  1. Is shorter.
  2. Does not depend on alignment issues.
  3. Computes a 32-bit integer value regardless of the local size of integers.
  4. Is equally fast regardless of local endianness, and fast enough (especially on modern processors) anyway.
  5. Runs the same code on all computers: I can state with confidence that if it works on a little-endian machine it will work on a big-endian machine.
  6. Never "byte swaps".
In other words, it's simpler, cleaner, and utterly portable. There is no reason to ask about local byte order when about to interpret an externally provided byte stream.

I've seen programs that end up swapping bytes two, three, even four times as layers of software grapple over byte order. In fact, byte-swapping is the surest indicator the programmer doesn't understand how byte order works.

Why do people make the byte order mistake so often? I think it's because they've seen a lot of bad code that has convinced them byte order matters. "Here comes an encoded byte stream; time for an #ifdef." In fact, C may be part of the problem: in C it's easy to make byte order look like an issue. If instead you try to write byte-order-dependent code in a type-safe language, you'll find it's very hard. In a sense, byte order only bites you when you cheat.

There's plenty of software that demonstrates the byte order fallacy is really a fallacy. The entire Plan 9 system ran, without architecture-dependent #ifdefs of any kind, on dozens of computers of different makes, models, and byte orders. I promise you, your computer's byte order doesn't matter even at the level of the operating system.

And there's plenty of software that demonstrates how easily you can get it wrong. Here's one example. I don't know if it's still true, but some time back Adobe Photoshop screwed up byte order. Back then, Macs were big-endian and PCs, of course, were little-endian. If you wrote a Photoshop file on the Mac and read it back in, it worked. If you wrote it on a PC and tried to read it on a Mac, though, it wouldn't work unless back on the PC you checked a button that said you wanted the file to be readable on a Mac. (Why wouldn't you? Seriously, why wouldn't you?) Ironically, when you read a Mac-written file on a PC, it always worked, which demonstrates that someone at Adobe figured out something about byte order. But there would have been no problems transferring files between machines, and no need for a check box, if the people at Adobe wrote proper code to encode and decode their files, code that could have been identical between the platforms. I guarantee that to get this wrong took far more code than it would have taken to get it right.

Just last week I was reviewing some test code that was checking byte order, and after some discussion it turned out that there was a byte-order-dependency bug in the code being tested. As is often the case, the existence of byte-order-checking was evidence of the presence of a bug. Once the bug was fixed, the test no longer cared about byte order.

And neither should you, because byte order doesn't matter.

by rob (noreply@blogger.com) at April 05, 2012 04:49 AM

April 03, 2012

embrace change

Yesterday's Go talk

Yesterday afternoon I gave my talk about Google Go at the GTUG Bremen, which, by the way, will soon become the GDG Bremen. The audience was about 25 people, a good number, and interest was high: I got many questions during the talk. This interaction is definitely a lot better than just talking to a silent audience.

But there's still room for improvement. The presentation should perhaps have been shorter, followed by a practical workshop. I prepared a little toy program that will be published along with the slides: I wanted to show how Go looks in practice and to demonstrate how to implement a requested feature. But sadly, because the talk ran long, there was little room left for that.

The GTUG team recorded the talk on video, which will be published soon. I will fix a few small errors spotted on the slides and then publish them too. You'll find a notice here.

by Frank Müller (noreply@blogger.com) at April 03, 2012 02:25 PM

April 02, 2012

embrace change

Quick reminder: Go talk today at GTUG Bremen

Just a quick reminder: Today is the monthly regulars' table of the GTUG Bremen. This time I'll give a talk about Google Go.

by Frank Müller (noreply@blogger.com) at April 02, 2012 08:03 AM

April 01, 2012

embrace change

Common Go Library updated for Go 1

A few days ago the Go team released Go 1. Congratulations again from my side, what great work. I've used the last weeks with the two release candidates to apply some final changes to my Tideland Common Go Library. In particular, the error handling (using the new error type and my own implementations of this interface), the flexible logging, and the testing have been improved. So today I checked in the new release and tagged it for Go 1.

The packages are:
  • applog: Logging with multiple levels, caller details in debug and critical situations and pluggable own logging behavior.
  • asserts: Tests which can be used inside of unit tests or at runtime.
  • cache: Cache values which will be refreshed on demand.
  • cells: A framework for event and behavior based applications.
  • identifier: Generate UUIDs and other identifiers.
  • mapreduce: Use Google's great algorithm for data processing and aggregating.
  • markup: A simple markup language as an alternative to XML.
  • monitoring: Keep track of the performance, stay-set variables and dynamically retrieved statuses.
  • numerics: Some numerical types, intended to use them e.g. for statistical problems.
  • redis: A powerful client for the Redis database.
  • sort: A parallel quicksort to use the power of multiple cores.
  • state: A generic finite state machine.
  • time: Functions to work with times and an internal crontab server.
  • util: Some more smaller utilities.
  • web: A lean web framework. Especially the combination of REST and JSON is supported.
The next releases will contain a goroutine supervisor like the one in Erlang/OTP and an optional web interface for the monitoring package.

Besides that, I'm now restarting a project I initially started for the Google App Engine. It's a web application using HTML / CSS / JavaScript / jQuery for the frontend, communicating with a Go backend via HTTP / JSON and using Redis for persistence. The kind of application? That's still a secret. (smile)

by Frank Müller (noreply@blogger.com) at April 01, 2012 08:59 PM

research!rsc

Random Hash Functions

A hash function for a particular hash table should always be deterministic, right? At least, that's what I thought until a few weeks ago, when I was able to fix a performance problem by calling rand inside a hash function.

A hash table is only as good as its hash function, which ideally satisfies two properties for any key pair k1, k2:

  1. If k1 == k2, hash(k1) == hash(k2).
  2. If k1 != k2, it should be likely that hash(k1) != hash(k2).

Normally, following rule 1 would prohibit the use of random bits while computing the hash, because if you pass in the same key again, you'd use different random bits and get a different hash value. That's why the fact that I got to call rand in a hash function is so surprising.

If the hash function violates rule 1, your hash table just breaks: you can't find things you put in, because you are looking in the wrong places. If the hash function satisfies rule 1 but violates rule 2 (for example, “return 42”), the hash table will be slow due to the large number of hash collisions. You'll still be able to find the things you put in, but you might as well be using a list.

The phrasing of rule 1 is very important. It is not sufficient to say simply “hash(k1) == hash(k1)”, because that does not take into account the definition of equality of keys. If you are building a hash table with case-insensitive, case-preserving string keys, then “HELLO” and “hello” need to hash to the same value. In fact, “hash(k1) == hash(k1)” is not even strictly necessary. How could it not be necessary? By reversing rule 1, hash(k1) and hash(k1) can be unequal if k1 != k1, that is, if k1 does not equal itself.

How can that happen? It happens if k1 is the floating-point value NaN (not-a-number), which by convention is not equal to anything, not even itself.

Okay, but why bother? Well, remember rule 2. Since NaN != NaN, it should be likely that hash(NaN) != hash(NaN), or else the hash table will have bad performance. This is very strange: the same input is hashed twice, and we're supposed to (at least be likely to) return different hash values. Since the inputs are identical, we need a source of external entropy, like rand.

What if you don't? You get hash tables that don't perform very well if someone can manage to trick you into storing things under NaN repeatedly:

$ cat nan.py
#!/usr/bin/python
import timeit
def build(n):
	m = {}
	for i in range(n):
		m[float("nan")] = 1
n = 1
for i in range(20):
	print "%6d %10.6f" % (n, timeit.timeit('build('+str(n)+')',
	    'from __main__ import build', number=1))
	n *= 2

$ python nan.py
     1   0.000006
     2   0.000004
     4   0.000004
     8   0.000008
    16   0.000011
    32   0.000028
    64   0.000072
   128   0.000239
   256   0.000840
   512   0.003339
  1024   0.012612
  2048   0.050331
  4096   0.200965
  8192   1.032596
 16384   4.657481
 32768  22.758963
 65536  91.899054
$

The behavior here is quadratic: double the input size and the run time quadruples. You can run the equivalent Go program on the Go playground. It has the NaN fix and runs in linear time. (On the playground, wall time stands still, but you can see that it's executing in far less than the hundreds of seconds the Python version takes. Run it locally for actual timing.)

Now, you could argue that putting a NaN in a hash table is a dumb idea, and also that treating NaN != NaN in a hash table is also a dumb idea, and you'd be right on both counts.

But the alternatives are worse:

  • If you define that NaN is equal to itself during hash key comparisons, now you have a second parallel definition of equality, to handle NaNs inside structs and so on, only used for map lookups. Languages typically have too many equality operators anyway; introducing a new one for this special case seems unwise.
  • If you define that NaN cannot appear as a hash table key, then you have a similar problem: you need to build up logic to test for invalid keys such as NaNs inside structs or arrays, and then you have to deal with the fact that your hash table might return an error or throw an exception when inserting values under certain keys.

The most consistent thing to do is to accept the implications of NaN != NaN: m[NaN] = 1 always creates a new hash table element (since the key is unequal to any existing entry), reading m[NaN] never finds any data (same reason), and iterating over the hash table yields each of the inserted NaN entries.

Behaviors surrounding NaN are always surprising, but if NaN != NaN elsewhere, the least surprising thing you can do is make your hash tables respect that. To do that well, it needs to be likely that hash(NaN) != hash(NaN). And you probably already have a custom floating-point hash function so that +0 and −0 are treated as the same value. Go ahead, call rand for NaN.

(Note: this is different from the hash table performance problem that was circulating in December 2011. In that case, the predictability of collisions on ordinary data was solved by making each different table use a randomly chosen hash function; there's no randomness inside the function itself.)

April 01, 2012 07:00 PM

March 29, 2012

Sonia Codes

Go 1 release

The news, if you hadn’t heard. :)  It’s a big thing with me because I’ve put so much time into using it—well, mostly playing with it.  I do write and maintain some Go programs for work, but most of my time with Go has been spent contributing solutions to Rosetta Code.  I’ve contributed hundreds of Go solutions over the last year or so, and Go has been in the top 10 languages there for some time now.  Recently I’ve tried to review all of the solutions to update them to Go 1, and I’m happy to announce that they (almost) all work with Go 1 now.

I wish I could announce that it represents a large body of idiomatic Go code, available as reference solutions to common programming tasks, but I’m afraid it’s not.  Firstly, I’m not sure how idiomatic my code is.  I’ve worked mostly in isolation, and my code almost certainly displays my idiosyncrasies as much as it displays accepted idioms.  Secondly, of course, is the nature of the RC site, which is more a wonderland than a reference library.

Still, I’d like to invite people to come and visit.  Browse the existing Go solutions and be amused or enlightened, hopefully both.  But then I’d really like to encourage people to make improvements as they can.  That’s the nature of a wiki, and as I said, since such a large fraction of the Go code is mine, there is certainly much room for the code to be improved.

Next there is the list of existing RC tasks with no Go solution yet.  Most of these are tasks that are awkward for me because I don’t have a computer, and can’t freely do things like install software on the computers I do have access to.  If you have or can install up-to-date libraries and software on your computer, some of these tasks should be easy!  Please browse the list and contribute any solutions you can.  A few of them are simply tasks for which I don’t have the knowledge.  For example the HTTP task seemed simple enough, but I don’t know enough to do the HTTPS task.

Finally, if you really enjoy the site, consider contributing a new task that you feel really shows off some feature of Go.  I’ve added just a few, but still waiting to be designed are some great tasks illuminating interfaces, concurrency, and the varied capabilities of the Go standard package library.


by Sonia at March 29, 2012 12:04 AM

March 28, 2012

RSC

_rsc: go1 has been my goto language for a while now. - dlsspy on hacker news #golang

March 28, 2012 05:21 PM

Go's official blog

Go version 1 is released

Today marks a major milestone in the development of the Go programming language. We're announcing Go version 1, or Go 1 for short, which defines a language and a set of core libraries to provide a stable foundation for creating reliable products, projects, and publications.
Go 1 is the first release of Go that is available in supported binary distributions. They are available for Linux, FreeBSD, Mac OS X and, we are thrilled to announce, Windows.

The driving motivation for Go 1 is stability for its users. People who write Go 1 programs can be confident that those programs will continue to compile and run without change, in many environments, on a time scale of years. Similarly, authors who write books about Go 1 can be sure that their examples and explanations will be helpful to readers today and into the future.

Forward compatibility is part of stability. Code that compiles in Go 1 should, with few exceptions, continue to compile and run throughout the lifetime of that version, even as we issue updates and bug fixes such as Go version 1.1, 1.2, and so on. The Go 1 compatibility document explains the compatibility guidelines in more detail.

Go 1 is a representation of Go as it is used today, not a major redesign. In its planning, we focused on cleaning up problems and inconsistencies and improving portability. There had long been many changes to Go that we had designed and prototyped but not released because they were backwards-incompatible. Go 1 incorporates these changes, which provide significant improvements to the language and libraries but sometimes introduce incompatibilities for old programs. Fortunately, the go fix tool can automate much of the work needed to bring programs up to the Go 1 standard.

Go 1 introduces changes to the language (such as new types for Unicode characters and errors) and the standard library (such as the new time package and renamings in the strconv package). Also, the package hierarchy has been rearranged to group related items together, such as moving the networking facilities, for instance the rpc package, into subdirectories of net. A complete list of changes is documented in the Go 1 release notes. That document is an essential reference for programmers migrating code from earlier versions of Go.

We also restructured the Go tool suite around the new go command, a program for fetching, building, installing and maintaining Go code. The go command eliminates the need for Makefiles to write Go code because it uses the Go program source itself to derive the build instructions. No more build scripts!

Finally, the release of Go 1 triggers a new release of the Google App Engine SDK. A similar process of revision and stabilization has been applied to the App Engine libraries, providing a base for developers to build programs for App Engine that will run for years.

Go 1 is the result of a major effort by the core Go team and our many contributors from the open source community. We thank everyone who helped make this happen.

There has never been a better time to be a Go programmer. Everything you need to get started is at golang.org.

by Andrew Gerrand (noreply@blogger.com) at March 28, 2012 04:32 PM

The Go image package

The image and image/color packages define a number of types: color.Color and color.Model describe colors, image.Point and image.Rectangle describe basic 2-D geometry, and image.Image brings the two concepts together to represent a rectangular grid of colors. A separate article covers image composition with the image/draw package.

Colors and Color Models

Color is an interface that defines the minimal method set of any type that can be considered a color: one that can be converted to red, green, blue and alpha values. The conversion may be lossy, such as converting from CMYK or YCbCr color spaces.

type Color interface {
// RGBA returns the alpha-premultiplied red, green, blue and alpha values
// for the color. Each value ranges within [0, 0xFFFF], but is represented
// by a uint32 so that multiplying by a blend factor up to 0xFFFF will not
// overflow.
RGBA() (r, g, b, a uint32)
}

There are three important subtleties about the return values. First, the red, green and blue are alpha-premultiplied: a fully saturated red that is also 25% transparent is represented by RGBA returning a 75% r. Second, the channels have a 16-bit effective range: 100% red is represented by RGBA returning an r of 65535, not 255, so that converting from CMYK or YCbCr is not as lossy. Third, the type returned is uint32, even though the maximum value is 65535, to guarantee that multiplying two values together won't overflow. Such multiplications occur when blending two colors according to an alpha mask from a third color, in the style of Porter and Duff's classic algebra:


dstr, dstg, dstb, dsta := dst.RGBA()
srcr, srcg, srcb, srca := src.RGBA()
_, _, _, m := mask.RGBA()
const M = 1<<16 - 1
// The resultant red value is a blend of dstr and srcr, and ranges in [0, M].
// The calculation for green, blue and alpha is similar.
dstr = (dstr*(M-m) + srcr*m) / M

The last line of that code snippet would have been more complicated if we worked with non-alpha-premultiplied colors, which is why Color uses alpha-premultiplied values.

The image/color package also defines a number of concrete types that implement the Color interface. For example, RGBA is a struct that represents the classic "8 bits per channel" color.

type RGBA struct {
R, G, B, A uint8
}

Note that the R field of an RGBA is an 8-bit alpha-premultiplied color in the range [0, 255]. RGBA satisfies the Color interface by multiplying that value by 0x101 to generate a 16-bit alpha-premultiplied color in the range [0, 65535]. Similarly, the NRGBA struct type represents an 8-bit non-alpha-premultiplied color, as used by the PNG image format. When manipulating an NRGBA's fields directly, the values are non-alpha-premultiplied, but when calling the RGBA method, the return values are alpha-premultiplied.

A Model is simply something that can convert Colors to other Colors, possibly lossily. For example, the GrayModel can convert any Color to a desaturated Gray. A Palette can convert any Color to one from a limited palette.

type Model interface {
Convert(c Color) Color
}
type Palette []Color

Points and Rectangles

A Point is an (x, y) co-ordinate on the integer grid, with axes increasing right and down. It is neither a pixel nor a grid square. A Point has no intrinsic width, height or color, but the visualizations below use a small colored square.

type Point struct {
X, Y int
}

    p := image.Point{2, 1}

A Rectangle is an axis-aligned rectangle on the integer grid, defined by its top-left and bottom-right Point. A Rectangle also has no intrinsic color, but the visualizations below outline rectangles with a thin colored line, and call out their Min and Max Points.

type Rectangle struct {
Min, Max Point
}

For convenience, image.Rect(x0, y0, x1, y1) is equivalent to image.Rectangle{image.Point{x0, y0}, image.Point{x1, y1}}, but is much easier to type.

A Rectangle is inclusive at the top-left and exclusive at the bottom-right. For a Point p and a Rectangle r, p.In(r) if and only if r.Min.X <= p.X && p.X < r.Max.X, and similarly for Y. This is analogous to how a slice s[i0:i1] is inclusive at the low end and exclusive at the high end. (Unlike arrays and slices, a Rectangle often has a non-zero origin.)

    r := image.Rect(2, 1, 5, 5)
// Dx and Dy return a rectangle's width and height.
fmt.Println(r.Dx(), r.Dy(), image.Pt(0, 0).In(r)) // prints 3 4 false

Adding a Point to a Rectangle translates the Rectangle. Points and Rectangles are not restricted to be in the bottom-right quadrant.

    r := image.Rect(2, 1, 5, 5).Add(image.Pt(-4, -2))
fmt.Println(r.Dx(), r.Dy(), image.Pt(0, 0).In(r)) // prints 3 4 true

Intersecting two Rectangles yields another Rectangle, which may be empty.

    r := image.Rect(0, 0, 4, 3).Intersect(image.Rect(2, 2, 5, 5))
// Size returns a rectangle's width and height, as a Point.
fmt.Printf("%#v\n", r.Size()) // prints image.Point{X:2, Y:1}

Points and Rectangles are passed and returned by value. A function that takes a Rectangle argument will be as efficient as a function that takes two Point arguments, or four int arguments.

Images

An Image maps every grid square in a Rectangle to a Color from a Model. "The pixel at (x, y)" refers to the color of the grid square defined by the points (x, y), (x+1, y), (x+1, y+1) and (x, y+1).

type Image interface {
// ColorModel returns the Image's color model.
ColorModel() color.Model
// Bounds returns the domain for which At can return non-zero color.
// The bounds do not necessarily contain the point (0, 0).
Bounds() Rectangle
// At returns the color of the pixel at (x, y).
// At(Bounds().Min.X, Bounds().Min.Y) returns the upper-left pixel of the grid.
// At(Bounds().Max.X-1, Bounds().Max.Y-1) returns the lower-right one.
At(x, y int) color.Color
}

A common mistake is assuming that an Image's bounds start at (0, 0). For example, an animated GIF contains a sequence of Images, and each Image after the first typically only holds pixel data for the area that changed, and that area doesn't necessarily start at (0, 0). The correct way to iterate over an Image m's pixels looks like:


b := m.Bounds()
for y := b.Min.Y; y < b.Max.Y; y++ {
for x := b.Min.X; x < b.Max.X; x++ {
doStuffWith(m.At(x, y))
}
}

Image implementations do not have to be based on an in-memory slice of pixel data. For example, a Uniform is an Image of enormous bounds and uniform color, whose in-memory representation is simply that color.

type Uniform struct {
C color.Color
}

Typically, though, programs will want an image based on a slice. Struct types like RGBA and Gray (which other packages refer to as image.RGBA and image.Gray) hold slices of pixel data and implement the Image interface.

type RGBA struct {
// Pix holds the image's pixels, in R, G, B, A order. The pixel at
// (x, y) starts at Pix[(y-Rect.Min.Y)*Stride + (x-Rect.Min.X)*4].
Pix []uint8
// Stride is the Pix stride (in bytes) between vertically adjacent pixels.
Stride int
// Rect is the image's bounds.
Rect Rectangle
}

These types also provide a Set(x, y int, c color.Color) method that allows modifying the image one pixel at a time.

    m := image.NewRGBA(image.Rect(0, 0, 640, 480))
m.Set(5, 5, color.RGBA{255, 0, 0, 255})

If you're reading or writing a lot of pixel data, it can be more efficient, but more complicated, to access these struct type's Pix field directly.

The slice-based Image implementations also provide a SubImage method, which returns an Image backed by the same array. Modifying the pixels of a sub-image will affect the pixels of the original image, analogous to how modifying the contents of a sub-slice s[i0:i1] will affect the contents of the original slice s.

    m0 := image.NewRGBA(image.Rect(0, 0, 8, 5))
m1 := m0.SubImage(image.Rect(1, 2, 5, 5)).(*image.RGBA)
fmt.Println(m0.Bounds().Dx(), m1.Bounds().Dx()) // prints 8, 4
fmt.Println(m0.Stride == m1.Stride) // prints true

For low-level code that works on an image's Pix field, be aware that ranging over Pix can affect pixels outside an image's bounds. In the example above, the pixels covered by m1.Pix are shaded in blue. Higher-level code, such as the At and Set methods or the image/draw package, will clip their operations to the image's bounds.

Image Formats

The standard package library supports a number of common image formats, such as GIF, JPEG and PNG. If you know the format of a source image file, you can decode from an io.Reader directly.


import (
"image/jpeg"
"image/png"
"io"
)

// convertJPEGToPNG converts from JPEG to PNG.
func convertJPEGToPNG(w io.Writer, r io.Reader) error {
img, err := jpeg.Decode(r)
if err != nil {
return err
}
return png.Encode(w, img)
}

If you have image data of unknown format, the image.Decode function can detect the format. The set of recognized formats is constructed at run time and is not limited to those in the standard package library. An image format package typically registers its format in an init function, and the main package will "underscore import" such a package solely for the side effect of format registration.


import (
"image"
"image/png"
"io"

_ "code.google.com/p/vp8-go/webp"
_ "image/jpeg"
)

// convertToPNG converts from any recognized format to PNG.
func convertToPNG(w io.Writer, r io.Reader) error {
img, _, err := image.Decode(r)
if err != nil {
return err
}
return png.Encode(w, img)
}

by Andrew Gerrand (noreply@blogger.com) at March 28, 2012 03:31 AM

March 27, 2012

Go's official blog

The Go image/draw package

Package image/draw defines only one operation: drawing a source image onto a destination image, through an optional mask image. This one operation is surprisingly versatile and can perform a number of common image manipulation tasks elegantly and efficiently.

Composition is performed pixel by pixel in the style of the Plan 9 graphics library and the X Render extension. The model is based on the classic "Compositing Digital Images" paper by Porter and Duff, with an additional mask parameter: dst = (src IN mask) OP dst. For a fully opaque mask, this reduces to the original Porter-Duff formula: dst = src OP dst. In Go, a nil mask image is equivalent to an infinitely sized, fully opaque mask image.

The Porter-Duff paper presented 12 different composition operators, but with an explicit mask, only 2 of these are needed in practice: source-over-destination and source. In Go, these operators are represented by the Over and Src constants. The Over operator performs the natural layering of a source image over a destination image: the change to the destination image is smaller where the source (after masking) is more transparent (that is, has lower alpha). The Src operator merely copies the source (after masking) with no regard for the destination image's original content. For fully opaque source and mask images, the two operators produce the same output, but the Src operator is usually faster.

Geometric Alignment

Composition requires associating destination pixels with source and mask pixels. Obviously, this requires destination, source and mask images, and a composition operator, but it also requires specifying what rectangle of each image to use. Not every drawing should write to the entire destination: when updating an animating image, it is more efficient to only draw the parts of the image that have changed. Not every drawing should read from the entire source: when using a sprite that combines many small images into one large one, only a part of the image is needed. Not every drawing should read from the entire mask: a mask image that collects a font's glyphs is similar to a sprite. Thus, drawing also needs to know three rectangles, one for each image. Since each rectangle has the same width and height, it suffices to pass a destination rectangle r and two points sp and mp: the source rectangle is equal to r translated so that r.Min in the destination image aligns with sp in the source image, and similarly for mp. The effective rectangle is also clipped to each image's bounds in their respective co-ordinate space.

The DrawMask function takes seven arguments, but an explicit mask and mask-point are usually unnecessary, so the Draw function takes five:


// Draw calls DrawMask with a nil mask.
func Draw(dst Image, r image.Rectangle, src image.Image, sp image.Point, op Op)
func DrawMask(dst Image, r image.Rectangle, src image.Image, sp image.Point,
mask image.Image, mp image.Point, op Op)

The destination image must be mutable, so the image/draw package defines a draw.Image interface which has a Set method.

type Image interface {
image.Image
Set(x, y int, c color.Color)
}

Filling a Rectangle

To fill a rectangle with a solid color, use an image.Uniform source. The Uniform type re-interprets a Color as a practically infinite-sized Image of that color. For those familiar with the design of Plan 9's draw library, there is no need for an explicit "repeat bit" in Go's slice-based image types; the concept is subsumed by Uniform.

    // image.ZP is the zero point -- the origin.
draw.Draw(dst, r, &image.Uniform{c}, image.ZP, draw.Src)

To initialize a new image to all-blue:

    m := image.NewRGBA(image.Rect(0, 0, 640, 480))
blue := color.RGBA{0, 0, 255, 255}
draw.Draw(m, m.Bounds(), &image.Uniform{blue}, image.ZP, draw.Src)

To reset an image to transparent (or black, if the destination image's color model cannot represent transparency), use image.Transparent, which is an image.Uniform:

    draw.Draw(m, m.Bounds(), image.Transparent, image.ZP, draw.Src)

Copying an Image

To copy from a rectangle sr in the source image to a rectangle starting at a point dp in the destination, convert the source rectangle into the destination image's co-ordinate space:

    r := image.Rectangle{dp, dp.Add(sr.Size())}
draw.Draw(dst, r, src, sr.Min, draw.Src)

Alternatively:

    r := sr.Sub(sr.Min).Add(dp)
draw.Draw(dst, r, src, sr.Min, draw.Src)

To copy the entire source image, use sr = src.Bounds().

Scrolling an Image

Scrolling an image is just copying an image to itself, with different destination and source rectangles. Overlapping destination and source images are perfectly valid, just as Go's built-in copy function can handle overlapping destination and source slices. To scroll an image m by 20 pixels:

    b := m.Bounds()
p := image.Pt(0, 20)
// Note that even though the second argument is b,
// the effective rectangle is smaller due to clipping.
draw.Draw(m, b, m, b.Min.Add(p), draw.Src)
dirtyRect := b.Intersect(image.Rect(b.Min.X, b.Max.Y-20, b.Max.X, b.Max.Y))

Converting an Image to RGBA

The result of decoding an image format might not be an image.RGBA: decoding a GIF results in an image.Paletted, decoding a JPEG results in an image.YCbCr, and the result of decoding a PNG depends on the image data. To convert any image to an image.RGBA:

    b := src.Bounds()
m := image.NewRGBA(image.Rect(0, 0, b.Dx(), b.Dy()))
draw.Draw(m, m.Bounds(), src, b.Min, draw.Src)

Drawing Through a Mask

To draw an image through a circular mask with center p and radius r:

type circle struct {
p image.Point
r int
}

func (c *circle) ColorModel() color.Model {
return color.AlphaModel
}

func (c *circle) Bounds() image.Rectangle {
return image.Rect(c.p.X-c.r, c.p.Y-c.r, c.p.X+c.r, c.p.Y+c.r)
}

func (c *circle) At(x, y int) color.Color {
xx, yy, rr := float64(x-c.p.X)+0.5, float64(y-c.p.Y)+0.5, float64(c.r)
if xx*xx+yy*yy < rr*rr {
return color.Alpha{255}
}
return color.Alpha{0}
}
    draw.DrawMask(dst, dst.Bounds(), src, image.ZP, &circle{p, r}, image.ZP, draw.Over)

Drawing Font Glyphs

To draw a font glyph in blue starting from a point p, draw with an image.Uniform source and an image.Alpha mask. For simplicity, we aren't performing any sub-pixel positioning or rendering, or correcting for a font's height above a baseline.

    src := &image.Uniform{color.RGBA{0, 0, 255, 255}}
mask := theGlyphImageForAFont()
mr := theBoundsFor(glyphIndex)
draw.DrawMask(dst, mr.Sub(mr.Min).Add(p), src, image.ZP, mask, mr.Min, draw.Over)

Performance

The image/draw package implementation demonstrates how to provide an image manipulation function that is both general purpose and efficient for common cases. The DrawMask function takes arguments of interface types, but immediately makes type assertions that its arguments are of specific struct types, corresponding to common operations like drawing one image.RGBA image onto another, or drawing an image.Alpha mask (such as a font glyph) onto an image.RGBA image. If a type assertion succeeds, that type information is used to run a specialized implementation of the general algorithm. If the assertions fail, the fallback code path uses the generic At and Set methods. The fast paths are purely a performance optimization; the resultant destination image is the same either way. In practice, only a small number of special cases are necessary to support typical applications.

by Nigel Tao (noreply@blogger.com) at March 27, 2012 07:37 AM

Godoc: documenting Go code

The Go project takes documentation seriously. Documentation is a huge part of making software accessible and maintainable. Of course it must be well-written and accurate, but it also must be easy to write and to maintain. Ideally, it should be coupled to the code itself so the documentation evolves along with the code. The easier it is for programmers to produce good documentation, the better for everyone.

To that end, we have developed the godoc documentation tool. This article describes godoc's approach to documentation, and explains how you can use our conventions and tools to write good documentation for your own projects.

Godoc parses Go source code - including comments - and produces documentation as HTML or plain text. The end result is documentation tightly coupled with the code it documents. For example, through godoc's web interface you can navigate from a function's documentation to its implementation with one click.

Godoc is conceptually related to Python's Docstring and Java's Javadoc, but its design is simpler. The comments read by godoc are not language constructs (as with Docstring) nor must they have their own machine-readable syntax (as with Javadoc). Godoc comments are just good comments, the sort you would want to read even if godoc didn't exist.

The convention is simple: to document a type, variable, constant, function, or even a package, write a regular comment directly preceding its declaration, with no intervening blank line. Godoc will then present that comment as text alongside the item it documents. For example, this is the documentation for the fmt package's Fprint function:

// Fprint formats using the default formats for its operands and writes to w.
// Spaces are added between operands when neither is a string.
// It returns the number of bytes written and any write error encountered.
func Fprint(w io.Writer, a ...interface{}) (n int, err error) {

Notice this comment is a complete sentence that begins with the name of the element it describes. This important convention allows us to generate documentation in a variety of formats, from plain text to HTML to UNIX man pages, and makes it read better when tools truncate it for brevity, such as when they extract the first line or sentence.

Comments on package declarations should provide general package documentation. These comments can be short, like the sort package's brief description:

// Package sort provides primitives for sorting slices and user-defined
// collections.
package sort

They can also be detailed like the gob package's overview. That package uses another convention for packages that need large amounts of introductory documentation: the package comment is placed in its own file, doc.go, which contains only those comments and a package clause.

When writing package comments of any size, keep in mind that their first sentence will appear in godoc's package list.

Comments that are not adjacent to a top-level declaration are omitted from godoc's output, with one notable exception. Top-level comments that begin with the word "BUG(who)" are recognized as known bugs, and included in the "Bugs" section of the package documentation. The "who" part should be the user name of someone who could provide more information. For example, this is a known issue from the bytes package:


// BUG(r): The rule Title uses for word boundaries does not handle Unicode punctuation properly.

Godoc treats executable commands somewhat differently. Instead of inspecting the command source code, it looks for a Go source file belonging to the special package "documentation". The comment on the "package documentation" clause is used as the command's documentation. For example, see the godoc documentation and its corresponding doc.go file.

There are a few formatting rules that Godoc uses when converting comments to HTML:

  • Subsequent lines of text are considered part of the same paragraph; you must leave a blank line to separate paragraphs.
  • Pre-formatted text must be indented relative to the surrounding comment text (see gob's doc.go for an example).
  • URLs will be converted to HTML links; no special markup is necessary.
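For instance, a single doc comment exercising all three rules might look like this (the package name and content are hypothetical):

```go
// Package demo is a hypothetical example of godoc formatting.
//
// A blank comment line, as above, separates paragraphs. Indented
// lines are rendered as pre-formatted text:
//
//	result := demo.Run("input")
//
// Plain URLs such as http://golang.org/pkg/ become links.
package demo
```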

Note that none of these rules requires you to do anything out of the ordinary.

In fact, the best thing about godoc's minimal approach is how easy it is to use. As a result, a lot of Go code, including all of the standard library, already follows the conventions.

Your own code can present good documentation just by having comments as described above. Any Go packages installed inside $GOROOT/src/pkg and any GOPATH workspaces will already be accessible via godoc's command-line and HTTP interfaces, and you can specify additional paths for indexing via the -path flag or just by running "godoc ." in the source directory. See the godoc documentation for more details.

by Andrew Gerrand (noreply@blogger.com) at March 27, 2012 07:31 AM

The Laws of Reflection

Reflection in computing is the ability of a program to examine its own structure, particularly through types; it's a form of metaprogramming. It's also a great source of confusion.

In this article we attempt to clarify things by explaining how reflection works in Go. Each language's reflection model is different (and many languages don't support it at all), but this article is about Go, so for the rest of this article the word "reflection" should be taken to mean "reflection in Go".

Types and interfaces

Because reflection builds on the type system, let's start with a refresher about types in Go.

Go is statically typed. Every variable has a static type, that is, exactly one type known and fixed at compile time: int, float32, *MyType, []byte, and so on. If we declare

type MyInt int

var i int
var j MyInt

then i has type int and j has type MyInt. The variables i and j have distinct static types and, although they have the same underlying type, they cannot be assigned to one another without a conversion.

One important category of type is interface types, which represent fixed sets of methods. An interface variable can store any concrete (non-interface) value as long as that value implements the interface's methods. A well-known pair of examples is io.Reader and io.Writer, the types Reader and Writer from the io package:

// Reader is the interface that wraps the basic Read method.
type Reader interface {
Read(p []byte) (n int, err error)
}

// Writer is the interface that wraps the basic Write method.
type Writer interface {
Write(p []byte) (n int, err error)
}

Any type that implements a Read (or Write) method with this signature is said to implement io.Reader (or io.Writer). For the purposes of this discussion, that means that a variable of type io.Reader can hold any value whose type has a Read method:

    var r io.Reader
r = os.Stdin
r = bufio.NewReader(r)
r = new(bytes.Buffer)
// and so on

It's important to be clear that whatever concrete value r may hold, r's type is always io.Reader: Go is statically typed and the static type of r is io.Reader.

An extremely important example of an interface type is the empty interface:


interface{}

It represents the empty set of methods and is satisfied by any value at all, since any value has zero or more methods.

Some people say that Go's interfaces are dynamically typed, but that is misleading. They are statically typed: a variable of interface type always has the same static type, and even though at run time the value stored in the interface variable may change type, that value will always satisfy the interface.

We need to be precise about all this because reflection and interfaces are closely related.

The representation of an interface

Russ Cox has written a detailed blog post about the representation of interface values in Go. It's not necessary to repeat the full story here, but a simplified summary is in order.

A variable of interface type stores a pair: the concrete value assigned to the variable, and that value's type descriptor. To be more precise, the value is the underlying concrete data item that implements the interface and the type describes the full type of that item. For instance, after

    var r io.Reader
tty, err := os.OpenFile("/dev/tty", os.O_RDWR, 0)
if err != nil {
return nil, err
}
r = tty

r contains, schematically, the (value, type) pair, (tty, *os.File). Notice that the type *os.File implements methods other than Read; even though the interface value provides access only to the Read method, the value inside carries all the type information about that value. That's why we can do things like this:

    var w io.Writer
w = r.(io.Writer)

The expression in this assignment is a type assertion; what it asserts is that the item inside r also implements io.Writer, and so we can assign it to w. After the assignment, w will contain the pair (tty, *os.File). That's the same pair as was held in r. The static type of the interface determines what methods may be invoked with an interface variable, even though the concrete value inside may have a larger set of methods.

Continuing, we can do this:

    var empty interface{}
empty = w

and our empty interface value empty will again contain that same pair, (tty, *os.File). That's handy: an empty interface can hold any value and contains all the information we could ever need about that value.

(We don't need a type assertion here because it's known statically that w satisfies the empty interface. In the example where we moved a value from a Reader to a Writer, we needed to be explicit and use a type assertion because Writer's methods are not a subset of Reader's.)

One important detail is that the pair inside an interface always has the form (value, concrete type) and cannot have the form (value, interface type). Interfaces do not hold interface values.

Now we're ready to reflect.

The first law of reflection

1. Reflection goes from interface value to reflection object.

At the basic level, reflection is just a mechanism to examine the type and value pair stored inside an interface variable. To get started, there are two types we need to know about in package reflect: Type and Value. Those two types give access to the contents of an interface variable, and two simple functions, called reflect.TypeOf and reflect.ValueOf, retrieve reflect.Type and reflect.Value pieces out of an interface value. (Also, from the reflect.Value it's easy to get to the reflect.Type, but let's keep the Value and Type concepts separate for now.)

Let's start with TypeOf:

package main

import (
"fmt"
"reflect"
)

func main() {
var x float64 = 3.4
fmt.Println("type:", reflect.TypeOf(x))
}

This program prints


type: float64

You might be wondering where the interface is here, since the program looks like it's passing the float64 variable x, not an interface value, to reflect.TypeOf. But it's there; as godoc reports, the signature of reflect.TypeOf includes an empty interface:


// TypeOf returns the reflection Type of the value in the interface{}.
func TypeOf(i interface{}) Type

When we call reflect.TypeOf(x), x is first stored in an empty interface, which is then passed as the argument; reflect.TypeOf unpacks that empty interface to recover the type information.

The reflect.ValueOf function, of course, recovers the value (from here on we'll elide the boilerplate and focus just on the executable code):

    var x float64 = 3.4
fmt.Println("value:", reflect.ValueOf(x))

prints


value: <float64 Value>

Both reflect.Type and reflect.Value have lots of methods to let us examine and manipulate them. One important example is that Value has a Type method that returns the Type of a reflect.Value. Another is that both Type and Value have a Kind method that returns a constant indicating what sort of item is stored: Uint, Float64, Slice, and so on. Also, methods on Value with names like Int and Float let us grab values (as int64 and float64) stored inside:

    var x float64 = 3.4
v := reflect.ValueOf(x)
fmt.Println("type:", v.Type())
fmt.Println("kind is float64:", v.Kind() == reflect.Float64)
fmt.Println("value:", v.Float())

prints


type: float64
kind is float64: true
value: 3.4

There are also methods like SetInt and SetFloat but to use them we need to understand settability, the subject of the third law of reflection, discussed below.

The reflection library has a couple of properties worth singling out. First, to keep the API simple, the "getter" and "setter" methods of Value operate on the largest type that can hold the value: int64 for all the signed integers, for instance. That is, the Int method of Value returns an int64 and the SetInt value takes an int64; it may be necessary to convert to the actual type involved:

    var x uint8 = 'x'
v := reflect.ValueOf(x)
fmt.Println("type:", v.Type()) // uint8.
fmt.Println("kind is uint8: ", v.Kind() == reflect.Uint8) // true.
x = uint8(v.Uint()) // v.Uint returns a uint64.

The second property is that the Kind of a reflection object describes the underlying type, not the static type. If a reflection object contains a value of a user-defined integer type, as in

    type MyInt int
var x MyInt = 7
v := reflect.ValueOf(x)

the Kind of v is still reflect.Int, even though the static type of x is MyInt, not int. In other words, the Kind cannot discriminate an int from a MyInt even though the Type can.

The second law of reflection

2. Reflection goes from reflection object to interface value.

Like physical reflection, reflection in Go generates its own inverse.

Given a reflect.Value we can recover an interface value using the Interface method; in effect the method packs the type and value information back into an interface representation and returns the result:


// Interface returns v's value as an interface{}.
func (v Value) Interface() interface{}

As a consequence we can say

    y := v.Interface().(float64) // y will have type float64.
fmt.Println(y)

to print the float64 value represented by the reflection object v.

We can do even better, though. The arguments to fmt.Println, fmt.Printf and so on are all passed as empty interface values, which are then unpacked by the fmt package internally just as we have been doing in the previous examples. Therefore all it takes to print the contents of a reflect.Value correctly is to pass the result of the Interface method to the formatted print routine:

    fmt.Println(v.Interface())

(Why not fmt.Println(v)? Because v is a reflect.Value; we want the concrete value it holds.) Since our value is a float64, we can even use a floating-point format if we want:

    fmt.Printf("value is %7.1e\n", v.Interface())

and get in this case


3.4e+00

Again, there's no need to type-assert the result of v.Interface() to float64; the empty interface value has the concrete value's type information inside and Printf will recover it.

In short, the Interface method is the inverse of the ValueOf function, except that its result is always of static type interface{}.

Reiterating: Reflection goes from interface values to reflection objects and back again.

The third law of reflection

3. To modify a reflection object, the value must be settable.

The third law is the most subtle and confusing, but it's easy enough to understand if we start from first principles.

Here is some code that does not work, but is worth studying.

    var x float64 = 3.4
v := reflect.ValueOf(x)
v.SetFloat(7.1) // Error: will panic.

If you run this code, it will panic with the cryptic message


panic: reflect.Value.SetFloat using unaddressable value

The problem is not that the value 7.1 is not addressable; it's that v is not settable. Settability is a property of a reflection Value, and not all reflection Values have it.

The CanSet method of Value reports the settability of a Value; in our case,

    var x float64 = 3.4
v := reflect.ValueOf(x)
fmt.Println("settability of v:", v.CanSet())

prints


settability of v: false

It is an error to call a Set method on a non-settable Value. But what is settability?

Settability is a bit like addressability, but stricter. It's the property that a reflection object can modify the actual storage that was used to create the reflection object. Settability is determined by whether the reflection object holds the original item. When we say

    var x float64 = 3.4
v := reflect.ValueOf(x)

we pass a copy of x to reflect.ValueOf, so the interface value created as the argument to reflect.ValueOf is a copy of x, not x itself. Thus, if the statement

    v.SetFloat(7.1)

were allowed to succeed, it would not update x, even though v looks like it was created from x. Instead, it would update the copy of x stored inside the reflection value and x itself would be unaffected. That would be confusing and useless, so it is illegal, and settability is the property used to avoid this issue.

If this seems bizarre, it's not. It's actually a familiar situation in unusual garb. Think of passing x to a function:


f(x)

We would not expect f to be able to modify x because we passed a copy of x's value, not x itself. If we want f to modify x directly we must pass our function the address of x (that is, a pointer to x):

f(&x)

This is straightforward and familiar, and reflection works the same way. If we want to modify x by reflection, we must give the reflection library a pointer to the value we want to modify.

Let's do that. First we initialize x as usual and then create a reflection value that points to it, called p.

    var x float64 = 3.4
p := reflect.ValueOf(&x) // Note: take the address of x.
fmt.Println("type of p:", p.Type())
fmt.Println("settability of p:", p.CanSet())

The output so far is


type of p: *float64
settability of p: false

The reflection object p isn't settable, but it's not p we want to set, it's (in effect) *p. To get to what p points to, we call the Elem method of Value, which indirects through the pointer, and save the result in a reflection Value called v:

    v := p.Elem()
fmt.Println("settability of v:", v.CanSet())

Now v is a settable reflection object, as the output demonstrates,


settability of v: true

and since it represents x, we are finally able to use v.SetFloat to modify the value of x:

    v.SetFloat(7.1)
fmt.Println(v.Interface())
fmt.Println(x)

The output, as expected, is


7.1
7.1

Reflection can be hard to understand but it's doing exactly what the language does, albeit through reflection Types and Values that can disguise what's going on. Just keep in mind that reflection Values need the address of something in order to modify what they represent.

Structs

In our previous example v wasn't a pointer itself, it was just derived from one. A common way for this situation to arise is when using reflection to modify the fields of a structure. As long as we have the address of the structure, we can modify its fields.

Here's a simple example that analyzes a struct value, t. We create the reflection object with the address of the struct because we'll want to modify it later. Then we set typeOfT to its type and iterate over the fields using straightforward method calls (see package reflect for details). Note that we extract the names of the fields from the struct type, but the fields themselves are regular reflect.Value objects.

    type T struct {
A int
B string
}
t := T{23, "skidoo"}
s := reflect.ValueOf(&t).Elem()
typeOfT := s.Type()
for i := 0; i < s.NumField(); i++ {
f := s.Field(i)
fmt.Printf("%d: %s %s = %v\n", i,
typeOfT.Field(i).Name, f.Type(), f.Interface())
}

The output of this program is


0: A int = 23
1: B string = skidoo

There's one more point about settability introduced in passing here: the field names of T are upper case (exported) because only exported fields of a struct are settable.

Because s contains a settable reflection object, we can modify the fields of the structure.

    s.Field(0).SetInt(77)
s.Field(1).SetString("Sunset Strip")
fmt.Println("t is now", t)

And here's the result:


t is now {77 Sunset Strip}

If we modified the program so that s was created from t, not &t, the calls to SetInt and SetString would fail as the fields of t would not be settable.

Conclusion

Here again are the laws of reflection:

  1. Reflection goes from interface value to reflection object.
  2. Reflection goes from reflection object to interface value.
  3. To modify a reflection object, the value must be settable.

Once you understand these laws reflection in Go becomes much easier to use, although it remains subtle. It's a powerful tool that should be used with care and avoided unless strictly necessary.

There's plenty more to reflection that we haven't covered — sending and receiving on channels, allocating memory, using slices and maps, calling methods and functions — but this post is long enough. We'll cover some of those topics in a later article.

by Andrew Gerrand (noreply@blogger.com) at March 27, 2012 07:27 AM

Gobs of data

To transmit a data structure across a network or to store it in a file, it must be encoded and then decoded again. There are many encodings available, of course: JSON, XML, Google's protocol buffers, and more. And now there's another, provided by Go's gob package.

Why define a new encoding? It's a lot of work and redundant at that. Why not just use one of the existing formats? Well, for one thing, we do! Go has packages supporting all the encodings just mentioned (the protocol buffer package is in a separate repository but it's one of the most frequently downloaded). And for many purposes, including communicating with tools and systems written in other languages, they're the right choice.

But for a Go-specific environment, such as communicating between two servers written in Go, there's an opportunity to build something much easier to use and possibly more efficient.

Gobs work with the language in a way that an externally-defined, language-independent encoding cannot. At the same time, there are lessons to be learned from the existing systems.

Goals

The gob package was designed with a number of goals in mind.

First, and most obvious, it had to be very easy to use. Because Go has reflection, there is no need for a separate interface definition language or "protocol compiler". The data structure itself is all the package should need to figure out how to encode and decode it. On the other hand, this approach means that gobs will never work as well with other languages, but that's OK: gobs are unashamedly Go-centric.

Efficiency is also important. Textual representations, exemplified by XML and JSON, are too slow to put at the center of an efficient communications network. A binary encoding is necessary.

Gob streams must be self-describing. Each gob stream, read from the beginning, contains sufficient information that the entire stream can be parsed by an agent that knows nothing a priori about its contents. This property means that you will always be able to decode a gob stream stored in a file, even long after you've forgotten what data it represents.

There were also some things to learn from our experiences with Google protocol buffers.

Protocol buffer misfeatures

Protocol buffers had a major effect on the design of gobs, but have three features that were deliberately avoided. (Leaving aside the property that protocol buffers aren't self-describing: if you don't know the data definition used to encode a protocol buffer, you might not be able to parse it.)

First, protocol buffers only work on the data type we call a struct in Go. You can't encode an integer or array at the top level, only a struct with fields inside it. That seems a pointless restriction, at least in Go. If all you want to send is an array of integers, why should you have to put it into a struct first?

Next, a protocol buffer definition may specify that fields T.x and T.y are required to be present whenever a value of type T is encoded or decoded. Although such required fields may seem like a good idea, they are costly to implement because the codec must maintain a separate data structure while encoding and decoding, to be able to report when required fields are missing. They're also a maintenance problem. Over time, one may want to modify the data definition to remove a required field, but that may cause existing clients of the data to crash. It's better not to have them in the encoding at all. (Protocol buffers also have optional fields. But if we don't have required fields, all fields are optional and that's that. There will be more to say about optional fields a little later.)

The third protocol buffer misfeature is default values. If a protocol buffer omits the value for a "defaulted" field, then the decoded structure behaves as if the field were set to that value. This idea works nicely when you have getter and setter methods to control access to the field, but is harder to handle cleanly when the container is just a plain idiomatic struct. Default values are also tricky to implement: where does one define the default values, what types do they have (is text UTF-8? uninterpreted bytes? how many bits in a float?) and despite the apparent simplicity, there were a number of complications in their design and implementation for protocol buffers. We decided to leave them out of gobs and fall back to Go's trivial but effective defaulting rule: unless you set something otherwise, it has the "zero value" for that type - and it doesn't need to be transmitted.

So gobs end up looking like a sort of generalized, simplified protocol buffer. How do they work?

Values

The encoded gob data isn't about int8s and uint16s. Instead, somewhat analogous to constants in Go, its integer values are abstract, sizeless numbers, either signed or unsigned. When you encode an int8, its value is transmitted as an unsized, variable-length integer. When you encode an int64, its value is also transmitted as an unsized, variable-length integer. (Signed and unsigned are treated distinctly, but the same unsized-ness applies to unsigned values too.) If both have the value 7, the bits sent on the wire will be identical. When the receiver decodes that value, it puts it into the receiver's variable, which may be of arbitrary integer type. Thus an encoder may send a 7 that came from an int8, but the receiver may store it in an int64. This is fine: the value is an integer and as long as it fits, everything works. (If it doesn't fit, an error results.) This decoupling from the size of the variable gives some flexibility to the encoding: we can expand the type of the integer variable as the software evolves, but still be able to decode old data.

This flexibility also applies to pointers. Before transmission, all pointers are flattened. Values of type int8, *int8, **int8, ****int8, etc. are all transmitted as an integer value, which may then be stored in int of any size, or *int, or ******int, etc. Again, this allows for flexibility.

Flexibility also happens because, when decoding a struct, only those fields that are sent by the encoder are stored in the destination. Given the value

type T struct{ X, Y, Z int } // Only exported fields are encoded and decoded.
var t = T{X: 7, Y: 0, Z: 8}

the encoding of t sends only the 7 and 8. Because it's zero, the value of Y isn't even sent; there's no need to send a zero value.

The receiver could instead decode the value into this structure:

type U struct{ X, Y *int8 } // Note: pointers to int8s
var u U

and acquire a value of u with only X set (to the address of an int8 variable set to 7); the Z field is ignored - where would you put it? When decoding structs, fields are matched by name and compatible type, and only fields that exist in both are affected. This simple approach finesses the "optional field" problem: as the type T evolves by adding fields, out of date receivers will still function with the part of the type they recognize. Thus gobs provide the important result of optional fields - extensibility - without any additional mechanism or notation.

From integers we can build all the other types: bytes, strings, arrays, slices, maps, even floats. Floating-point values are represented by their IEEE 754 floating-point bit pattern, stored as an integer, which works fine as long as you know their type, which we always do. By the way, that integer is sent in byte-reversed order because common values of floating-point numbers, such as small integers, have a lot of zeros at the low end that we can avoid transmitting.

One nice feature of gobs that Go makes possible is that they allow you to define your own encoding by having your type satisfy the GobEncoder and GobDecoder interfaces, in a manner analogous to the JSON package's Marshaler and Unmarshaler and also to the Stringer interface from package fmt. This facility makes it possible to represent special features, enforce constraints, or hide secrets when you transmit data. See the documentation for details.

Types on the wire

The first time you send a given type, the gob package includes in the data stream a description of that type. In fact, what happens is that the encoder is used to encode, in the standard gob encoding format, an internal struct that describes the type and gives it a unique number. (Basic types, plus the layout of the type description structure, are predefined by the software for bootstrapping.) After the type is described, it can be referenced by its type number.

Thus when we send our first type T, the gob encoder sends a description of T and tags it with a type number, say 127. All values, including the first, are then prefixed by that number, so a stream of T values looks like:


("define type id" 127, definition of type T)(127, T value)(127, T value), ...

These type numbers make it possible to describe recursive types and send values of those types. Thus gobs can encode types such as trees:

type Node struct {
Value int
Left, Right *Node
}

(It's an exercise for the reader to discover how the zero-defaulting rule makes this work, even though gobs don't represent pointers.)

With the type information, a gob stream is fully self-describing except for the set of bootstrap types, which is a well-defined starting point.

Compiling a machine

The first time you encode a value of a given type, the gob package builds a little interpreted machine specific to that data type. It uses reflection on the type to construct that machine, but once the machine is built it does not depend on reflection. The machine uses package unsafe and some trickery to convert the data into the encoded bytes at high speed. It could use reflection and avoid unsafe, but would be significantly slower. (A similar high-speed approach is taken by the protocol buffer support for Go, whose design was influenced by the implementation of gobs.) Subsequent values of the same type use the already-compiled machine, so they can be encoded right away.

Decoding is similar but harder. When you decode a value, the gob package holds a byte slice representing a value of a given encoder-defined type to decode, plus a Go value into which to decode it. The gob package builds a machine for that pair: the gob type sent on the wire crossed with the Go type provided for decoding. Once that decoding machine is built, though, it's again a reflectionless engine that uses unsafe methods to get maximum speed.

Use

There's a lot going on under the hood, but the result is an efficient, easy-to-use encoding system for transmitting data. Here's a complete example showing differing encoded and decoded types. Note how easy it is to send and receive values; all you need to do is present values and variables to the gob package and it does all the work.

package main

import (
"bytes"
"encoding/gob"
"fmt"
"log"
)

type P struct {
X, Y, Z int
Name string
}

type Q struct {
X, Y *int32
Name string
}

func main() {
// Initialize the encoder and decoder. Normally enc and dec would be
// bound to network connections and the encoder and decoder would
// run in different processes.
var network bytes.Buffer // Stand-in for a network connection
enc := gob.NewEncoder(&network) // Will write to network.
dec := gob.NewDecoder(&network) // Will read from network.
// Encode (send) the value.
err := enc.Encode(P{3, 4, 5, "Pythagoras"})
if err != nil {
log.Fatal("encode error:", err)
}
// Decode (receive) the value.
var q Q
err = dec.Decode(&q)
if err != nil {
log.Fatal("decode error:", err)
}
fmt.Printf("%q: {%d,%d}\n", q.Name, *q.X, *q.Y)
}

You can compile and run this example code in the Go Playground.

The rpc package builds on gobs to turn this encode/decode automation into transport for method calls across the network. That's a subject for another article.

Details

The gob package documentation, especially the file doc.go, expands on many of the details described here and includes a full worked example showing how the encoding represents data. If you are interested in the innards of the gob implementation, that's a good place to start.

by Andrew Gerrand (noreply@blogger.com) at March 27, 2012 07:25 AM

March 26, 2012

Going Along

Wings of Freedom; The Peng in Flight

Inspired by MikeSpook's short poem, a poem in the new style:

Wings of Freedom
(in reply to the 73-character poem at http://www.mikespook.com/2012/03/)

On the desktop
the hawk in the photograph
is flying over high mountains
and yet it
cannot
fly past
the pane of glass that pins it down

Although
a different world
lies just
outside the invisible wall
it cannot know
whether the other side of the sky
holds
a way station where it might rest
nor whether
its very next step
will be
the end of the road

If
these are truly wings of freedom
they will not mind
being broken in the collision


Appended, the classical-style poem posted to golang-china:

The Peng in Flight

A lone peng spreads its wings, never wavering;
winter yields to spring, yet frost still coats the boughs.
A dozen years of hardship, book-chest on the back;
the flower blooms outside the wall, its fragrance drifts within.

by Fango (noreply@blogger.com) at March 26, 2012 03:49 AM

March 18, 2012

embrace change

Go talk at GTUG Bremen

The next regulars' table of the GTUG Bremen will be on April 2nd. This time I'll give a talk on Google's programming language Go. Topics will be

  • a short introduction into the history,
  • the basic features,
  • the special combination of elegance and power, and
  • Go in the wilderness.
The talk will be in English and the slides will be available afterward via SlideShare.

by Frank Müller (noreply@blogger.com) at March 18, 2012 08:27 PM

March 17, 2012

Adam Langley

Very large RSA public exponents

After yesterday's post that advocated using RSA public exponents of 3 or 2^16+1 in DNSSEC for performance, Dan Kaminsky asked me whether there was a potential DoS vector by using really big public exponents.

Recall that the RSA signature verification core is m^e mod n. By making e and n larger, we can make the operation slower. But there are limits, at least in OpenSSL:

/* for large moduli, enforce exponent limit */
if (BN_num_bits(rsa->n) > OPENSSL_RSA_SMALL_MODULUS_BITS) {
        if (BN_num_bits(rsa->e) > OPENSSL_RSA_MAX_PUBEXP_BITS) {
                RSAerr(RSA_F_RSA_EAY_PUBLIC_ENCRYPT, RSA_R_BAD_E_VALUE);
                return -1;
        }
}

So, if n is large, we enforce a limit on e. The values of the #defines are such that for n>3072 bits, e must be less than or equal to 64 bits. So the slowest operations happen with an n and e of 3072 bits. (The fact that e<n is enforced earlier in the code.)

So I setup *.bige.imperialviolet.org. This is a perfectly valid and well signed zone which happens to use a 3072-bit key with a 3072-bit public exponent. (I could probably have slowed things down more by picking a public exponent with lots of 1s in its binary representation, but it's just a random number in this case.) One can resolve records 1.bige.imperialviolet.org, 2.bige.imperialviolet.org, … and the server doesn't have to sign anything because it's a wildcard: a single signature covers all of the names. However, the resolver validates the signature every time.

On my 2.66GHz, Core 2 laptop, 15 requests per second causes unbound to take 95% of a core. A couple hundred queries per second would probably put most DNSSEC resolvers in serious trouble.

So I'd recommend limiting the public exponent size in DNSSEC to 2^16+1, except that people are already mistakenly using 2^32+1, so I guess that needs to be the limit. The DNSSEC spec limits the modulus size to 4096 bits, and 4096-bit signatures are about 13 times slower to verify than the typical 1024-bit signatures used in DNSSEC. But that's a lot less of a DoS vector than bige.imperialviolet.org, which is 2230 times slower than normal.

March 17, 2012 07:00 AM

March 16, 2012

Adam Langley

RSA public exponent size

RSA public operations are much faster than private operations. Thirty-two times faster for a 2048-bit key on my machine. But the two operations are basically the same: take the message, raise it to the power of the public or private exponent and reduce modulo the key's modulus.

What makes the public operations so much faster is that the public exponent is typically tiny, while the private exponent should be the same size as the modulus. One can actually use a public exponent of three in RSA, and clearly cubing a number should be faster than raising it to the power of a 2048-bit number.

In 2006, Daniel Bleichenbacher (a fellow Googler) gave a talk at the CRYPTO 2006 rump session where he outlined a bug in several, very common RSA signature verification implementations at the time. The bug wasn't nearly as bad as it could have been because it only affected public keys that used a public exponent of three and most keys used a larger exponent: 2^16+1. But the fact that a slightly larger public exponent saved these buggy verifiers cemented 2^16+1 as a sensible default value for the public exponent.

But there's absolutely no reason to think that a public exponent larger than 2^16+1 does any good. I even checked that with Bleichenbacher before writing this. Three should be fine, 2^16+1 saved some buggy software a couple of times and any larger is probably a mistake.

Because of that, when writing the Go RSA package, I didn't bother supporting public exponents larger than 2^31-1. But then several people reported that they couldn't verify some DNSSEC signatures. Sure enough, the DNSKEY records for .cz and .us are using a public exponent of 2^32+1.

DNSSEC is absolutely the last place to use large, public exponents. Busy DNS resolvers have to resolve tens or hundreds of thousands of records a second; fast RSA signature verification is big deal in DNSSEC. So I measured the speed impact of various public exponent sizes with OpenSSL (1.0.1-beta3):

Public exponent    Verification time (µs)
3                  10.6
2^16+1             23.9
2^32+1             42.7
2^127-1            160.7

So a public exponent of 2^32+1 makes signature verification over four times slower than an exponent of three. As DNSSEC grows, and DNSSEC resolvers too, that extra CPU time is going to be a big deal.

It looks like the zones using a value of 2^32+1 are just passing the -e flag to BIND's dnssec-keygen utility. There's some suggestion that -e used to select 2^16+1 but silently changed semantics in some release.

So today's lesson is don't pass -e to dnssec-keygen! The default of dnssec-keygen is 2^16+1 and that's certainly safe. The .com zone uses a value of three and I think that's probably the best choice given the CPU cost and the fact that the original Bleichenbacher bug has been long since fixed.

March 16, 2012 07:00 AM

March 05, 2012

embrace change

Added commit log

For those of your who are using my Tideland Common Go Library I've added a commit log to this blog. So you can easily see how it evolves. Currently I'm working on a refactoring of the cells package for event-driven applications. The handling has been too complex and uncomfortable and I also want to provide more default cell behaviors for an easy start.

by Frank Müller (noreply@blogger.com) at March 05, 2012 09:34 AM

March 04, 2012

Miek Gieben

Super-short guide to getting q (Part II)

The development of the Go language is going at a fast pace, hence this updated version of the Super-short guide to getting q.

Get the latest version (called weekly) of Go:

  1. Get Go: hg clone -u release https://go.googlecode.com/hg/ go. Note the directory you downloaded it to and add its bin directory to your PATH: PATH=$PATH:$PWD/go/bin.

  2. Update Go to the latest weekly: cd go; hg pull; hg update weekly

  3. Compile Go: cd src (you should now be in go/src) and compile: ./all.bash

Install missing commands (gcc, sed, bison, etc.) if needed.

The latest Go is now installed. You should now have the go tool, which is the central interface to all Go program-building tasks.

$ go
Go is a tool for managing Go source code.

Usage: go command [arguments]

The commands are:

build       compile packages and dependencies
clean       remove object files
doc         run godoc on package sources
fix         run go tool fix on packages
....
....
lots more

If you cannot run go, check your PATH.

Install Go DNS and set GOPATH

The GOPATH variable specifies (among other things) where your Go code lives. Using the go tool does bring a few requirements on how to lay out the directory structure.

  1. Create a top-level directory (~/g) for your code: mkdir -p ~/g/src
  2. Set GOPATH to this top-level directory: export GOPATH=~/g
  3. Get dns: cd ~/g/src; git clone git://github.com/miekg/dns.git
  4. Compile it: cd dns; go build
  5. Compile and install the examples, there is a helper Makefile here, but it just calls go multiple times: cd ex; make
  6. Look in $GOPATH/bin for the binaries, in this setup that will be ~/g/bin
  7. Query with q: ~/g/bin/q mx miek.nl (or add ~/g/bin to your $PATH too)
  8. Report bugs

by Miek Gieben at March 04, 2012 08:54 AM

March 02, 2012

Adam Langley

Forward secrecy for IE and Safari too

When we announced forward secrecy for Google HTTPS we noted that “we hope to support IE in the future”. It wasn't that we didn't want to support IE, but IE doesn't implement the combination of ECDHE with RC4. IE (or rather, SChannel) does implement ECDHE with AES on Vista and Windows 7, but I only wanted to make one change at a time.

With the release of MS12-006 a month ago to address the BEAST weakness in TLS 1.0's CBC design, we've now made ECDHE-RSA-AES128-SHA our second preference cipher. That means that Chrome and Firefox will still use ECDHE-RSA-RC4-SHA, but IE on Vista and Windows 7 will get ECDHE now. This change also means that we support ECDHE with Safari (at least on Lion, where I tried it.)

March 02, 2012 08:00 AM

February 29, 2012

Airs – Ian Lance Taylor

Piece of PIE

Modern ELF systems can randomize the address at which shared libraries are loaded. This is generally referred to as Address Space Layout Randomization, or ASLR. Shared libraries are always position independent, which means that they can be loaded at any address. Randomizing the load address makes it slightly harder for attackers of a running program to exploit buffer overflows or similar problems, because they have no fixed addresses that they can rely on. ASLR is part of defense in depth: it does not by itself prevent any attacks, but it makes it slightly more difficult for attackers to exploit certain kinds of programming errors in a useful way beyond simply crashing the program.

Although it is straightforward to randomize the load address of a shared library, an ELF executable is normally linked to run at a fixed address that can not be changed. This means that attackers have a set of fixed addresses they can rely on. Permitting the kernel to randomize the address of the executable itself is done by generating a Position Independent Executable, or PIE.

It turns out to be quite simple to create a PIE: a PIE is simply an executable shared library. To make a shared library executable you just need to give it a PT_INTERP segment and appropriate startup code. The startup code can be the same as the usual executable startup code, though of course it must be compiled to be position independent.

When compiling code to go into a shared library, you use the -fpic option. When compiling code to go into a PIE, you use the -fpie option. Since a PIE is just a shared library, these options are almost exactly the same. The only difference is that since -fpie implies that you are building the main executable, there is no need to support symbol interposition for defined symbols. In a shared library, if function f1 calls f2, and f2 is globally visible, the code has to consider the possibility that f2 will be interposed. Thus, the call must go through the PLT. In a PIE, f2 can not be interposed, so the call may be made directly, though of course still in a position independent manner. Similarly, if the processor can do PC-relative loads and stores, all global variables can be accessed directly rather than going through the GOT.

Other than that ability to avoid the PLT and GOT in some cases, a PIE is really just a shared library. The dynamic linker will ask the kernel to map it at a random address and will then relocate it as usual.

This does imply that a PIE must be dynamically linked, in the sense of using the dynamic linker. Since the dynamic linker and the C library are closely intertwined, linking the PIE statically with the C library is unlikely to work in general. It is possible to design a statically linked PIE, in which the program relocates itself at startup time. The dynamic linker itself does this. However, there is no general mechanism for this at present.


by Ian Lance Taylor at February 29, 2012 03:33 PM

embrace change

Go usage at Google

Go is an interesting language moving towards its first release, Go 1. And it's really exciting to see how it gets better with each weekly release. But so far only a few projects are known where Go is used for real production software (here is a first list, which also includes my employer, Canonical). And it would be helpful to know that Google itself trusts Go for its own systems.

A first step was the addition of Go to Google App Engine; another was the use of Go for Code Search. Sadly that service has been shut down. But now they have presented vitess, a project for scaling MySQL. And it's no theoretical project: it's used at YouTube.

There are also two interesting quotes from the developers on why they chose Go:
"Go is miles ahead of C++ and Java in terms of expressibility and close in terms of performance. It is also relatively simple and has a straightforward interaction with Linux system calls."
That matches the experience I've had using Go in professional and personal projects. But the next quote shows how important the improvement of the garbage collector is:
"The main drawback is also its strength - the garbage collector. vtocc has made spot optimizations to minimize most of the adverse effects of Go’s stop-the-world gc. At this point, we are trading some amount of performance for greater creativity and efficiency at lower layers. unless you’re trying to max out on qps for your servers, you should see acceptable performance from vtocc. Also, go’s garbage collector is being improved. So, this should only get better over time. Go’s existing mark-and-sweep garbage collector is sub-optimal for systems that use large amounts of static memory (like caches). In the case of vtocc, this would be the row cache. To alleviate this, we intend to use memcache for the time being. If the gc ends up addressing this, it should be fairly trivial to switch to an in-memory row cache. Note that the row cache functionality is not fully ready yet."
Here I'm really looking forward to the next releases.

by Frank Müller (noreply@blogger.com) at February 29, 2012 09:59 AM

Go environment setup


A good start for Go development is a root directory for all further directories, like /home/themue/projects. It will be the home of the Go SDK, 3rd-party packages and your own projects. The current weekly is installed with the command hg clone -u weekly https://code.google.com/p/go into the directory go; http://weekly.golang.org/doc/install.html documents all the needed steps in detail. The two other directories have to be created by hand, each with a src and a pkg subdirectory. This way you get the structure
/home/themue/projects
        /go
                /bin
                ...
                /test
        /own
                /pkg
                /src
        /3rdparty
                /pkg
                /src
Why this 3rd-party directory? The new Go tools have a clever way of installing external packages into a different directory than your own packages. It's based on the environment variable $GOPATH. The name already shows that it's a path, not just one directory. $GOPATH is the path where commands like go build and go install look for the referenced packages they need to compile and link the software. External packages therefore have to be installed first. The command go get installs the sources and the compiled package into the first directory of the path. So the order in the case above should be
GOPATH=$HOME/projects/3rdparty:$HOME/projects/own
Your own packages, whose source is located in .../own/src, will be installed into .../own/pkg.

Which other environment variables are needed? One is $GOROOT. It has to point to the root of the Go SDK, here /home/themue/projects/go. The compiled binaries are installed in $GOROOT/bin by default. But there are two further options for working with the SDK:

  • If the location for the binaries is OK, your standard $PATH should point to this directory too, with PATH=$PATH:$HOME/projects/go/bin.
  • Alternatively, if you have a directory like $HOME/bin for all your private binaries and scripts and it's already in your $PATH, you can set GOBIN=$HOME/bin before installing the Go SDK. The Go binaries then get installed into $HOME/bin after compilation.

Inside your own source directory you may need further subdirectories for your different projects. Their naming depends on the external SCM hosting you're using. Mine is code.google.com, so for the Tideland Common Go Library (see http://code.google.com/p/tcgl) I placed the source in
.../own/src/code.google.com/p/tcgl
This project contains several packages which I can now import as code.google.com/p/tcgl/foo, regardless of whether I'm inside the same project or another one. More importantly, people who take a look at my code can easily see how to get and import it as 3rd-party code.

by Frank Müller (noreply@blogger.com) at February 29, 2012 09:01 AM

February 28, 2012

embrace change

Convinced by Sublime Text

During my now almost 30 years of software development I've used many editors (as well as IDEs). And you know, once you've found your favorite it's hard to change. There are holy wars on the net between the different fan groups, and everyone surely knows enough advantages of his acme, emacs, vim or whatever. After SPF/2 and SNiFF++ my favorite for about 17 years has been gvim. Straight, powerful, extensible and multi-platform. Also when working on remote machines via ssh, vim is a great partner. I've tried emacs for some time too and was impressed by its power, but I was stuck too deep in vim, so I never switched.

Now the situation has changed. It started with the announcement of Sublime Text 2 Build 2165. Based on the positive user feedback I gave it a try. It's available for my two operating systems, Linux and OS X, as well as for Windows, which is an important fact. I was impressed immediately. The usage is exactly as expected, the feature list is long, all my languages are supported very well, and there are many ways to extend and customize the system.


What I like most are the high-level view of the content and the very good code completion. The latter not only narrows the choices based on the text you've typed from left to right; it analyzes all tokens, so that the panicFailFunc() mentioned in the screenshot above can be found by entering pff or even pfc. Together with the language-dependent intelligent code snippets, which are part of the code completion too, it makes you really fast.

So for my Go code I type tf, which almost always leads directly to the tfunc snippet. A tab, and the body of a type-based function (aka method) is printed, with the cursor right inside the type brackets. I type m *mft and tab again, which is expanded to m *MyFantasticType. Now the cursor is right where I have to add the function name, after that where to put the arguments, then the result, and then, correctly indented, inside the function body. And there I can use other snippets as well as the intelligent code completion again. That's fun.

The navigation in the code is great too. Cursor movement works as expected, no surprises. But jumping word by word within an identifier like myLongFunctionNameExample() is great. Or jumping to a line (nothing new), jumping to a symbol (with the same kind of fuzzy matching as when coding) or just to any file in the current project (yes, it also has project management). Also the highlighting of other occurrences of the selected text, multi-select, the Vintage mode for vim fans, powerful find and replace, and, and, and ...

But what do you do about missing parts? Here Sublime Text 2 is very configurable, so the integration of new tools and programming languages is simple. Additionally you can record macros or extend the system using Python, and there are many packages, most of which can be installed easily via Sublime Package Control. I've used it so far to install SublimeHg for easy access to Mercurial and SublimeREPL for different REPL environments, in my case Lisp and the shell.

I haven't discovered everything yet, but I've been able to use it fluently, with a remarkable speed-up, from day 1. So I switched and now use it on Linux and Mac.

by Frank Müller (noreply@blogger.com) at February 28, 2012 11:18 AM

February 27, 2012

embrace change

New article about Redis

Tomorrow it will be available in the shops; like the other subscribers, I already got my copy today: the new (German) iX magazine 3/2012. This time it contains my newest article, about the great NoSQL database Redis. Its fantastic mix of performance and powerful data types and commands makes it my choice for my current Tideland project.

For those who need a Go client for Redis take a look at http://code.google.com/p/tcgl/.

by Frank Müller (noreply@blogger.com) at February 27, 2012 03:19 PM

February 26, 2012

embrace change

Restart of this blog

So, I've restarted this blog. It is now hosted at blogger.com and focused on software development. Private topics are handled at frank.mweb.de, but that blog is in German. The first topics are in the pipeline, so stay tuned.

by Frank Müller (noreply@blogger.com) at February 26, 2012 07:41 PM

February 12, 2012

Adam Langley

RSA 2012

Just a brief note that I'll be at RSA 2012 in San Francisco at the end of the month and will be speaking with several others about certificate revocation. (tech-106, Tuesday, 1 pm.)

If you want to chat and don't manage to grab me then, drop me an email. (agl at imperialviolet.org if you didn't already know.)

February 12, 2012 08:00 AM

February 11, 2012

Miek Gieben

godns is now dns

Due to the new go tool (long story), I've renamed godns to dns. This means the GitHub repo has moved as well.

godns installed itself as dns so code using it does not need to be changed.

by Miek Gieben at February 11, 2012 09:09 PM

February 09, 2012

Going Along

Google App Engine with Go [1]

Google's App Engine is abbreviated GAE, so let's call ours GAE Go.

As everyone knows, GAE is the "cloud" in Google's "cloud computing". Why a cloud? The traditional explanation is that the internet's topology couldn't really be drawn on paper, so people scribbled a few loose circles to stand for something abstract, remote and shared, supposedly a glorious tradition inherited from the telephone systems of the early 20th century. Another, pseudo-scientific explanation says the cloud is water in the sky: it rises from lakes and seas and falls back as sweet rain to moisten the earth. That cycle of cloud and rain is exactly what we flower children, just beginning to shake off the constraints of not-so-soft hardware, thirst for. The reason I go on about clouds and rain is so that programmers can tell their better halves that the trade we ply is not always dry; sometimes it can be quite moist.

Back to the point. GAE developers upload Python or Java code, claim a URL pointing at it, and can then use the services that code provides from a browser. Now, or rather very soon, we can run Go code in the cloud too.

The first step in writing cloud code is, of course, downloading an SDK. Windows users without Linux or OS X will have to wait a little longer, or, as shown in the previous post, make do with coLinux for now.

Unpack the archive, enter the directory in a console, and perform the traditional ritual of running our first greeting program.

./dev_appserver.py demos/go-helloworld

GAE still insists on Python 2.5, so you may get a warning that your Python is too new and could cause problems. It won't yet, so if something fails, you probably mistyped the path and it can't find go-helloworld.

If everything is OK, it tells you to look at http://localhost:8080 for the result, so we'll need a browser. Remember, this is cloud computing: to receive the rain and dew you must open a window onto the world. Shut yourself behind even the greatest of walls and you'll never build a good car; you'll just slowly rot, and when you die nobody will know.

As programmers, we can't be satisfied with just getting something to work. We need a strong curiosity and a sense of aesthetics. We should look beneath the surface, and learn to recognize the rigid, the verbose, the inefficient and the unaccountable: an ugliness that no amount of money can beautify and no power can conceal. At the same time, we should appreciate the very real safety and comfort offered by a flexible, concise design that runs at full speed while the compiler enforces watertight discipline.

Another puddle of rambling. Back to the subject once more: let's examine this greeting program line by line:

package counter            

This must be a typo; package helloworld would be more accurate. In fact, any name will do as long as it isn't main and doesn't collide with another package we use. main is special: it's the entry point of every Go program, and here GAE provides it. The entry point of our program is init.

import(
    "fmt"
    "http"
)

This imports fmt and http, two of Go's standard packages, so we can use their formatting and HTTP functionality.

func init(){
   http.HandleFunc("/", handle)
}

The entry point of our program: it registers handle to serve accesses to the URL root path.

func handle(w http.ResponseWriter, r *http.Request) {
    w.Header().Set("Content-Type", "text/plain; charset=utf-8")
    fmt.Fprint(w, "Hello, World!\n")
}

Inside our handle, r points at the browser's request and the response is written into the buffer w; when handle returns, GAE delivers it to the browser.

For GAE to locate our program correctly, the go-helloworld directory must contain a configuration file called app.yaml, with this as its main content:

runtime: go

This specifies that what GAE runs is not some Python or Java module but our beloved Go.

handlers:
- url: /.*
  script: _go_app

This tells GAE to hand every path under / to _go_app. Be careful not to misspell it, or you'll get an inexplicable pile of Python errors. It cost me three hours of trying to work out what Python wanted to tell me. I never did, because it never mentioned that, while using vi, I had accidentally left a stray d after _go_app.

This _go_app shows up in __init__.py under google/appengine/ext/go/; it is the bridge between dev_appserver.py and our Go program. When a browser hits our URL, dev_appserver compiles our Go program and stores it in a freshly created directory under /tmp. As expected, the file is named _go_app, and it is a genuine executable that, unlike Python or Java, doesn't have to lean on a virtual machine. It is large, about 2.8 MB: every library it uses and the whole runtime are packed in, self-reliant, with no external dependencies. SAC? Stand-Alone Complex: every _go_app is a lone commando.

/tmp also contains two files used for communication; if you're interested, read the source. To a bureaucrat, Open Source is just another ism; to us programmers, it is the hands-on driving force for casting off superstition and reclaiming the spirit of science and democracy.

I meant to put down my pen here, but Blogger went offline for maintenance, so a couple more words by way of an appeal: those of us who work with languages shouldn't neglect our mother tongue. Write things up seriously; it organizes your thoughts, distills your experience, and gives others a reference. Good for yourself and for everyone, progress for all.

by Fango (noreply@blogger.com) at February 09, 2012 04:26 AM

Google App Engine with Go [2]

Last time we said hello to GAE Go. Strangers at the first meeting, friends at the second: this time we look at another SDK demo, aimed mainly at readers who are curious about App Engine and haven't touched Go yet but have a background in C-family languages (C/C++, Java/JavaScript and so on). The goal is a rough overall picture of Go that kindles an interest in going deeper. Go really is simple; simple enough that a single HTML page can specify the whole real language, completely and in detail.

The package in the previous post's go-helloworld was called counter, probably a slip for this demo, go-counter:

package counter

import (
    "fmt"
    "http"
    "io"
    "os"
    "strconv"

    "appengine"
    "appengine/memcache"
)

Notice that Go requires listing every package you reference, even when, say, a search for os turns up only one use: os.Error, the error-handling type customarily used in Go. "Custom", idiom, carries a lot of weight in Go. Go's standard library is open source, and everyone is encouraged to study it, to get a feel for it, and to grow used to the Go ways of doing things. When everybody tacitly follows the same conventional idiomatic use, very large projects become easier to build and maintain.


func serveError(c appengine.Context, w http.ResponseWriter, err os.Error) {
    w.WriteHeader(http.StatusInternalServerError)
    w.Header().Set("Content-Type", "text/plain")
    io.WriteString(w, "Internal Server Error")
    c.Logf("%v", err)
}

Unlike the other C-family languages, Go puts a variable's type after its name; err here, for example, has type os.Error. The aim is to make complex types easier to understand. Functions are defined with func. The serveError function here uses the methods provided by w, plus the io package, to write a standard error response into the HTTP buffer. Meanwhile c, of appengine's Context type, provides a Logf function, similar to printf, to log err.

w provides the HTTP response-buffer interface. The interface is a Go invention: it requires that a type implement the functions the interface defines. The ResponseWriter interface type, for instance, defines three functions: Header, Write and WriteHeader.

func handle(w http.ResponseWriter, r *http.Request) {

This is GAE Go's URL-serving function; the pointer r points at the HTTP request. Like C, Go uses & to take a memory address and * as a pointer to one, but there is no pointer arithmetic: nothing like *p++ exists. The aim is to free programmers from memory management, leaving it to the GC, while still allowing, even encouraging, explicit use of memory, striking a balance between safety and convenience on one side and time and space efficiency on the other. A few more words: before Go, I struggled through Erlang, whose inconceivable OTP (one time programmable) variables supposedly eliminate race conditions at the root; but the inefficiency of all that GC, the endless conjuring of new variable names out of thin air, plus its odd pattern matching and tail recursion, left me like a fish on a hook, suffocating and tasting the needle-prick terror of death.

    c := appengine.NewContext(r)
    n := 0
    item, err := memcache.Get(c, r.URL.Path)

Go can combine assignment with variable declaration via :=, and the compiler infers each variable's type from the assignment. Here c is App Engine's (new) context type, n is an integer with value 0, and item and err simultaneously receive Get's return values, along with their types. Multiple assignment is another of Go's novelties; a, b = b, a, for example, swaps the values of a and b.

    if err == memcache.ErrCacheMiss { 
     // Not found, new item  
    } else if err != nil { 
     //  Error
        serveError(c, w, err)
        return
    } else { 
     // Got
     n, err = strconv.Atoi(string(item.Value))
     if err != nil {
     serveError(c, w, err)
     return
     }
    } 

I've deliberately altered the original here to bring out the three possible err branches and their handling. This is another idiom: spell out every logical branch, and return on error as early as possible.
        
    n++
    item = &memcache.Item{
        Key:   r.URL.Path,
        Value: []byte(strconv.Itoa(n)),
    }

Item is a struct defined in the memcache package. Besides Key and Value, which are assigned here, it has three more fields, Object, Flags and Expiration, which aren't assigned inside the {} and therefore default to zero. This {} assignment form is called a literal; & takes its address, to match Set below, which expects a pointer to an item.

    err = memcache.Set(c, item)
    if err != nil {
        serveError(c, w, err)
        return
    }

    w.Header().Set("Content-Type", "text/plain")
    fmt.Fprintf(w, "%q has been visited %d times", r.URL.Path, n)
}

On error, serveError returns the standard error message. Otherwise, if Get found no entry for this URL, n is 0. The visit count is then incremented by one, Set writes it back to the memcache entry for the URL, and http (that is, w) takes care of displaying it.

Newcomers will find plenty of documentation at golang.org. For Chinese material, see golang-china.org, along with my translations of the tutorial, Effective Go and part of the language specification. There is also a little Go programming novel, written for fun, for when you're bored.

by Fango (noreply@blogger.com) at February 09, 2012 04:26 AM

Google App Engine with Go [3]

The GAE Go trilogy: "one glance and smitten", "a second glance topples cities", and now the "third visit", to call the master out of retirement.

Let's review http.go from moustach-io, the demo Andrew and Rob presented at Google I/O, to see how a seemingly practical App Engine program uses Go's more advanced features. As usual, we'll let dev_appserver impersonate real cloud computing.

import (
    "bytes"
    "fmt"
    "http"
    "image"
    "image/jpeg"
    _ "image/png" // import so we can read PNG files.

The image compression in this example uses JPEG functions, hence the import of image/jpeg. image.Decode can also read PNG, but Go forbids importing a package whose functions go unused, and this program calls no png function. So the blank identifier _ is used to import image/png purely for its init function, which registers the PNG format for Decode.

    "io"
    "json"
    "os"
    "strconv"
    "template"
    "goauth2.googlecode.com/hg/oauth"
)

goauth2 and freetype (used in draw.go) are third-party libraries independent of the GAE SDK; install them with hg clone as the README describes, so they can be compiled into the _go_app executable.

import (
    "appengine"
    "appengine/datastore"
    "appengine/urlfetch"
    "crypto/sha1"
    "resize"
)

const (
    CLIENT_ID     = "Your Client ID here."
    CLIENT_SECRET = "Your Client Secret here."
)

var (
    uploadTemplate = template.MustParseFile("upload.html", nil)
    editTemplate   *template.Template // set up in init()
    postTemplate   = template.MustParseFile("post.html", nil)
    errorTemplate  = template.MustParseFile("error.html", nil)
)

Constant (const) and variable (var) declarations can use the same grouped form as import, and may appear multiple times.
template is a standard Go package that matches the pattern variables defined in a file or string and substitutes the values of real variables.

func init() {
    http.HandleFunc("/", errorHandler(upload))
    http.HandleFunc("/edit", errorHandler(edit))
    http.HandleFunc("/img", errorHandler(img))
    http.HandleFunc("/share", errorHandler(share))
    http.HandleFunc("/post", errorHandler(post))
    editTemplate = template.New(nil)
    editTemplate.SetDelims("{{{", "}}}")
    if err := editTemplate.ParseFile("edit.html"); err != nil {
        panic("can't parse edit.html: " + err.String())
    }
}

Our program's entry point registers the handler for each URL path and sets the template delimiters to triple braces, because the default braces clash with Javascript. errorHandler is interesting; let's look at it together with check.

func errorHandler(fn http.HandlerFunc) http.HandlerFunc {
    return func(w http.ResponseWriter, r *http.Request) {
        defer func() {
            if err, ok := recover().(os.Error); ok {
                w.WriteHeader(http.StatusInternalServerError)
                errorTemplate.Execute(w, err)
            }
        }()
        fn(w, r)
    }
}

// check aborts the current execution if err is non-nil.
func check(err os.Error) {
    if err != nil {
        panic(err)
    }
}

panic, defer and recover are Go's brand-new approach to exception handling. Read it this way: when check's err is non-nil it panics, causing check and every function on its call stack (upload and errorHandler, for instance) to exit immediately. But errorHandler has set up a defer, which runs before the exit, and that defer calls recover, which receives the value passed to panic, namely err; a type assertion confirms it is an os.Error, the exit triggered by the panic is cancelled, and the exceptional situation ends right there, with errorTemplate rendering the error message in the browser. If there was no panic, or the value passed to panic isn't an os.Error, the defer does nothing.

Also, did you notice? errorHandler accepts a function and, after wrapping it with the defer, returns an anonymous function. Go functions can be used like ordinary variables, and closures are available too, as we'll see in a moment.

type Image struct {
    Data []byte
}

This defines the Image struct, whose Data field is a byte slice. A slice performs C-pointer-like operations conveniently and safely: a slice really is a pointer at some block of memory, but it also records the current length len, for reading and writing, and the capacity cap, for the compiler's bounds checking.

func upload(w http.ResponseWriter, r *http.Request) {
    if r.Method != "POST" {
        // No upload; show the upload form.
        uploadTemplate.Execute(w, nil)
        return
    }

    f, _, err := r.FormFile("image")
    check(err)
    defer f.Close()

defer executes when the function (upload) exits; it follows the file-open immediately, to guard against forgetting to close. And every err goes through check's exception handling.

    var buf bytes.Buffer
    io.Copy(&buf, f)
    i, _, err := image.Decode(&buf)
    check(err)

bytes.Buffer manages byte memory automatically, and io.Copy copies from the file f into buf's memory with maximal efficiency. image.Decode can analyze the image format and automatically decompress via the jpeg or png package's Decode function.

    const max = 1200
    if b := i.Bounds(); b.Dx() > max || b.Dy() > max {

An if statement may carry an initial assignment clause. The variable b here is valid only inside the {} block after the if; its scope is bounded by the braces.

        if b.Dx() > 2*max || b.Dy() > 2*max {
            w, h := max, max
            if b.Dx() > b.Dy() {
                h = b.Dy() * h / b.Dx()
            } else {
                w = b.Dx() * w / b.Dy()
            }
            i = resize.Resample(i, i.Bounds(), w, h)
            b = i.Bounds()
        }
        w, h := max/2, max/2
        if b.Dx() > b.Dy() {
            h = b.Dy() * h / b.Dx()
        } else {
            w = b.Dx() * w / b.Dy()
        }
        i = resize.Resize(i, i.Bounds(), w, h)
    }

Thanks to local scope, variable names can be very short without any worry of clashing with identically named variables outside the scope.

    // Encode as a new JPEG image.
    buf.Reset()
    err = jpeg.Encode(&buf, i, nil)
    check(err)

    // Create an App Engine context for the client's request.
    c := appengine.NewContext(r)

    // Save the image under a unique key, a hash of the image.
    key := datastore.NewKey("Image", keyOf(buf.Bytes()), 0, nil)
    _, err = datastore.Put(c, key, &Image{buf.Bytes()})
    check(err)

    // Redirect to /edit using the key.
    http.Redirect(w, r, "/edit?id="+key.StringID(), http.StatusFound)
}

The image uploaded from the browser via r's FormFile is decoded and checked, resampled to change its size, compressed back into JPEG, put into App Engine's datastore, and then the browser is redirected to the /edit page to add the big moustache.

func edit(w http.ResponseWriter, r *http.Request) {
    editTemplate.Execute(w, r.FormValue("id"))
}

init() designated this to handle requests for the URL /edit; it simply executes the corresponding template.

func keyOf(data []byte) string {
    sha := sha1.New()
    sha.Write(data)
    return fmt.Sprintf("%x", string(sha.Sum())[0:8])
}

sha1 yields a unique fingerprint for the data. Notice how simple its API is: one write, one read, done.

func img(w http.ResponseWriter, r *http.Request) {
    c := appengine.NewContext(r)
    key := datastore.NewKey("Image", r.FormValue("id"), 0, nil)
    im := new(Image)
    err := datastore.Get(c, key, im)
    check(err)

    m, _, err := image.Decode(bytes.NewBuffer(im.Data))
    check(err)

    get := func(n string) int { // helper closure
        i, _ := strconv.Atoi(r.FormValue(n))
        return i
    }
    x, y, s, d := get("x"), get("y"), get("s"), get("d")

Here a closure is used. get is a function used only in this spot; the r inside it comes from the enclosing function (the closure), while n and i are an ordinary parameter and return value.

    if x > 0 { // only draw if coordinates provided
        m = moustache(m, x, y, s, d)
    }

moustache is defined in draw.go, which also uses package moustachio; the compiler builds them together.

    w.Header().Set("Content-type", "image/jpeg")
    jpeg.Encode(w, m, nil)
}

And that simply, the JPEG image goes back to the browser.

func share(w http.ResponseWriter, r *http.Request) {
    url := config(r.Host).AuthCodeURL(r.URL.RawQuery)
    http.Redirect(w, r, url, http.StatusFound)
}

This uses OAuth to authenticate the user. It isn't available under dev_appserver, so I won't go deeper into it here, nor into post and postPhoto that follow.

And so, in three steps, we've studied using Go on Google App Engine. What you've read you will forget, what you've written leaves something behind, but only what you've done, gotten wrong and fixed stays carved in the bone. So as not to let this fine moment for promoting Go slip by, I wrote this series. Please pass it along, so that your author's best-loved programming language can spread quickly. Thank you.




by Fango (noreply@blogger.com) at February 09, 2012 04:25 AM

February 07, 2012

research!rsc

go blog()

Let me start by apologizing for the noisy duplicate posts. I know that people using RSS software to read this blog got a whole bunch of old posts shown as new yesterday, and the same thing happened again just now. I made some mistakes while moving the blog from one platform to another, which caused the first burp, and then I had to fix the mistakes, which caused the second burp. But it's done, and there won't be another batch of duplicates waiting for you tomorrow.

I've moved this blog off of Blogger onto App Engine, running on a custom app written in Go. The down side is that I had to implement functionality that Blogger used to handle for me, like generating the RSS feed, and that's both extra work and a chance to make mistakes, which I took full advantage of. The up side, however, is that it makes it significantly easier for me to automate the writing and publishing of posts, and to create posts with a computational aspect to the content. I'll be blogging in the coming months about both the new setup, which has some interesting technical aspects behind it (I can edit live posts in a real text editor, for one thing), and about other topics that can make use of the computation.

For now, though, it's just the same content on a new server. Enjoy.

February 07, 2012 05:00 AM

February 05, 2012

Adam Langley

Revocation checking and Chrome's CRL

When a browser connects to an HTTPS site it receives signed certificates which allow it to verify that it's really connecting to the domain that it should be connecting to. In those certificates are pointers to services, run by the Certificate Authorities (CAs) that issued the certificate, that allow the browser to get up-to-date information.

All the major desktop browsers will contact those services to inquire whether the certificate has been revoked. There are two protocols/formats involved: OCSP and CRL, although the differences aren't relevant here. I mention them only so that readers can recognise the terms in other discussions.

The problem with these checks, that we call online revocation checks, is that the browser can't be sure that it can reach the CA's servers. There are lots of cases where it's not possible: captive portals are one. A captive portal frequently requires you to sign in on an HTTPS site, but blocks traffic to all other sites, including the CA's OCSP servers.

If browsers were to insist on talking to the CA before accepting a certificate, all these cases would stop working. There's also the concern that the CA may experience downtime and it's bad engineering practice to build in single points of failure.

Therefore online revocation checks which result in a network error are effectively ignored (this is called “soft-fail”). I've previously documented the resulting behaviour of several browsers.

But an attacker who can intercept HTTPS connections can also make online revocation checks appear to fail and so bypass the revocation checks! In cases where the attacker can only intercept a subset of a victim's traffic (i.e. the SSL traffic but not the revocation checks), the attacker is likely to be a backbone provider capable of DNS or BGP poisoning to block the revocation checks too.

If the attacker is close to the server then online revocation checks can be effective, but an attacker close to the server can get certificates issued from many CAs and deploy different certificates as needed. In short, even revocation checks don't stop this from being a real mess.

So soft-fail revocation checks are like a seat-belt that snaps when you crash. Even though it works 99% of the time, it's worthless because it only works when you don't need it.

While the benefits of online revocation checking are hard to find, the costs are clear: online revocation checks are slow and compromise privacy. The median time for a successful OCSP check is ~300ms and the mean is nearly a second. This delays page loading and discourages sites from using HTTPS. They are also a privacy concern because the CA learns the IP address of users and which sites they're visiting.

On this basis, we're currently planning on disabling online revocation checks in a future version of Chrome. (There is a class of higher-security certificate, called an EV certificate, where we haven't made a decision about what to do yet.)

Pushing a revocation list

Our current method of revoking certificates in response to major incidents is to push a software update. Microsoft, Opera and Firefox also push software updates for serious incidents rather than rely on online revocation checks. But our software updates require that users restart their browser before they take effect, so we would like a lighter weight method of revoking certificates.

So Chrome will start to reuse its existing update mechanism to maintain a list of revoked certificates, as first proposed to the CA/Browser Forum by Chris Bailey and Kirk Hall of AffirmTrust last April. This list can take effect without having to restart the browser.

An attacker can still block updates, but they have to be able to maintain the block constantly, from the time of revocation, to prevent the update. This is much harder than blocking an online revocation check, where the attacker only has to block the checks during the attack.

Since we're pushing a list of revoked certificates anyway, we would like to invite CAs to contribute their revoked certificates (CRLs) to the list. We have to be mindful of size, but the vast majority of revocations happen for purely administrative reasons and can be excluded. So, if we can get the details of the more important revocations, we can improve user security. Our criteria for including revocations are:

  1. The CRL must be crawlable: we must be able to fetch it over HTTP and robots.txt must not exclude GoogleBot.
  2. The CRL must be valid by RFC 5280 and none of the serial numbers may be negative.
  3. CRLs that cover EV certificates are taken in preference, while still considering point (4).
  4. CRLs that include revocation reasons can be filtered to take less space and are preferred.

For the curious, there is a tool for fetching and parsing Chrome's list of revoked certificates at https://github.com/agl/crlset-tools.

February 05, 2012 08:00 AM

February 03, 2012

RSC

_rsc: https://t.co/iRXEwiWm @goroutine @app_engine @go_nuts #golang


February 03, 2012 05:11 PM

January 30, 2012

Adam Langley

Extracting Mozilla's Root Certificates

When people need a list of root certificates, they often turn to Mozilla's. However, Mozilla doesn't produce a nice list of PEM encoded certificates. Rather, they keep them in a form which is convenient for NSS to build from: https://mxr.mozilla.org/mozilla/source/security/nss/lib/ckfw/builtins/certdata.txt?raw=1.

Several people have written quick scripts to try and convert this into PEM format, but they often miss something critical: some certificates are explicitly distrusted. These include the DigiNotar certificates and the misissued COMODO certificates. If you don't parse the trust records from the NSS data file, then you end up trusting these too! There's at least one major example of this that I know of.
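To see why the trust records matter, here is a minimal sketch of scanning certdata.txt for distrusted entries. It only handles the trust-object lines (`CKO_NSS_TRUST`, `CKA_TRUST_SERVER_AUTH`); real parsing has to cope with multi-line octal values and more, so treat this as illustrative only:

```python
def distrusted_labels(certdata_text):
    """Return the CKA_LABEL of every trust object whose server-auth
    trust is CKT_NSS_NOT_TRUSTED. A converter that ignores these
    records would emit the corresponding certificates as trusted."""
    labels = []
    current_label = None
    in_trust_object = False
    for line in certdata_text.splitlines():
        line = line.strip()
        if line.startswith("CKA_CLASS"):
            # A CKA_CLASS line starts a new object.
            in_trust_object = line.endswith("CKO_NSS_TRUST")
            current_label = None
        elif line.startswith("CKA_LABEL") and '"' in line:
            current_label = line.split('"')[1]
        elif (in_trust_object
              and line.startswith("CKA_TRUST_SERVER_AUTH")
              and line.endswith("CKT_NSS_NOT_TRUSTED")):
            labels.append(current_label)
    return labels
```

A naive script that only extracts the certificate objects never looks at these trust objects at all, which is exactly the bug described above.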

(Even with a correct root file, unless you do hard fail revocation checking you're still vulnerable to the misissued COMODO certificates.)

So, at the prodding of Denton Ge