Wednesday, June 17, 2009

 

I am the alpha dog!

By which I mean, my Android app is now in alpha test: I just uploaded it onto my HTC Magic, and whaddaya know, it runs, it doesn't crash, it actually does stuff. Indeed, I just updated WikiTravel via its interface. (Slightly buggily, but hey, alpha test.)

Mind you, the Android app is a fairly trivial piece of programming next to the AppEngine middleware that does the actual parsing and updating of WikiTravel. Which is the way it should be. Resources are scarce on a smartphone, and both processing and battery power are relatively meagre; I expect the design pattern of choice is going to be "get as much of the work done as you can on your server farm, then communicate simple low-bandwidth stuff to the phone."

Ultimately, the phone isn't much more than a user interface. In fact, now that I think about it, my whole app follows the classic Model-View-Controller design: WikiTravel is the model, the smartphone is the view, and my AppEngine service is the controller. Huh. Plus ça change, plus c'est la même chose.

Anyway, a list of tips, tricks, and annoyances since last we met:



Comments:
Python does let you do "this is %s and this is %s" % (anObject, anOther), which automatically converts anObject and anOther to strings for you (%s calls str() on each one).
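Here is a small illustration in Python 3 syntax, which handles % formatting the same way. The Dog class and the variable names are made up for the example. The last two lines show the one common trap: when the single value is itself a tuple, wrap it in a one-element tuple.

```python
# %s calls str() on the matching object, so any class with __str__ works.
class Dog:  # hypothetical class, just for illustration
    def __str__(self):
        return "a dog"

an_object = Dog()
an_other = 42

message = "this is %s and this is %s" % (an_object, an_other)
print(message)  # this is a dog and this is 42

# A bare tuple on the right is treated as the argument list itself,
# so a tuple you want printed as one value must be wrapped.
point = (3, 4)
wrapped = "point: %s" % (point,)
print(wrapped)  # point: (3, 4)
```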
 
Or use:

import pprint

pprint.pprint(some_data)  # pretty-prints any dict, list, tuple, etc.
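Here is a quick sketch of what pprint buys you, using hypothetical data. pprint.pformat returns the same text that pprint.pprint would print, which makes it handy for logging. Dict keys come out sorted by default.

```python
import pprint

# Hypothetical nested data, standing in for whatever you want to inspect.
data = {
    "city": "Helsinki",
    "sections": ["Get in", "See", "Eat"],
    "coords": (60.17, 24.94),
}

# pformat returns the text that pprint.pprint would write to stdout;
# width=40 forces it to break the dict across several lines.
text = pprint.pformat(data, width=40)
print(text)
```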
 
Unfortunately, Python 2's handling of string encodings is the sort of thing that creates mental blocks instead of removing them. The key to understanding Python's mindset, for me, came after frustration over getting a UnicodeDecodeError when calling myString.encode(). Decode? Encode? What?

So, for what it's worth, here are the things I needed to get through my head to break the blockage. Apologies if it's all "duh"; I had to have this explained to me in very simple words before I realized what I was getting wrong.

1. In Python 2, a 'normal' string ('dog') is a sequence of bytes; a Unicode string (u'dog') is a sequence of Unicode code points, like the capital letter C (code point U+0043) or 'LATIN SMALL LETTER LZ DIGRAPH' (code point U+02AB).
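For comparison, here is a minimal Python 3 sketch. In Python 3, str plays the role of Python 2's unicode (a sequence of code points) and bytes plays the role of Python 2's str. The standard unicodedata module confirms the two code points mentioned above.

```python
import unicodedata

# Python 3 str: a sequence of code points, each with a number and a name.
for ch in "C\u02ab":
    print("U+%04X %s" % (ord(ch), unicodedata.name(ch)))
# U+0043 LATIN CAPITAL LETTER C
# U+02AB LATIN SMALL LETTER LZ DIGRAPH

# Python 3 bytes: a sequence of small integers (0-255), no characters at all.
raw = "dog".encode("ascii")
print(list(raw))  # [100, 111, 103]
```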

1a. Out in the world, some people use "Unicode" to mean "a string encoding capable of handling every Unicode character", like "UTF-8". Python never means that. Unicode is a type.

2. You can't exactly print Unicode strings or write them to files. You just call functions that require bytes but will helpfully, and invisibly, coerce Unicode to a bytestring. (Which is reasonable on the face of it, just as Python will implicitly coerce to a float when you ask for "4.5 + 17", but...)
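In Python 3 the invisible coercion still happens when you write text to a file. It uses the file's encoding, which you can (and probably should) name explicitly; if you leave it out, open() falls back to a platform-dependent default. The sketch below uses an in-memory io.BytesIO as a stand-in for a real file.

```python
import io

# A text-mode file accepts str and encodes it to bytes on the way out.
buffer = io.BytesIO()                                  # stand-in for a file on disk
text_file = io.TextIOWrapper(buffer, encoding="utf-8")
text_file.write("Bj\u00f6rk")                          # 'Björk', a str
text_file.flush()                                      # push encoded bytes into buffer

print(buffer.getvalue())  # b'Bj\xc3\xb6rk'
```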

3. "Decode" means "convert string/bytes to Unicode"; "encode" means "convert Unicode to string/bytes".
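Here are the two directions, sketched in Python 3, where bytes and Unicode strings are separate types. Encoding the same text with a different codec yields different bytes, which is the whole point of naming the codec.

```python
# decode: bytes -> str (Unicode); encode: str (Unicode) -> bytes
raw = b"Bj\xc3\xb6rk"            # UTF-8 bytes, e.g. as read from the network
name = raw.decode("utf-8")       # 'Björk': 5 code points
back = name.encode("utf-8")      # b'Bj\xc3\xb6rk': 6 bytes, since ö takes two

print(len(name), len(back))      # 5 6

# Same text, different codec, different bytes: Latin-1 stores ö as one byte.
latin = name.encode("latin-1")
print(latin)                     # b'Bj\xf6rk'
```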

3a. "decode" and "encode" are both the sort of functions that coerce their input when needed. If you say

"dog".encode('utf-8')

Python will see you're calling "encode" on a non-Unicode string, convert it to Unicode, and then encode that as UTF-8.
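Python 3 removes that hidden step rather than guessing. A short check, assuming Python 3:

```python
# "dog" is already a str (Unicode) in Python 3, so encode() has nothing to coerce.
encoded = "dog".encode("utf-8")
print(encoded)                     # b'dog'

# The implicit decode behind Python 2's str.encode() is gone:
# bytes has no encode() method, and str has no decode() method.
print(hasattr(b"dog", "encode"))   # False
print(hasattr("dog", "decode"))    # False
```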

3b. These implicit conversions always use the default encoding, set in site.py (and readable as sys.getdefaultencoding(), but you can't set it once the interpreter finishes startup). This default is almost always ascii, which means

'Bj\xc3\xb6rk'.decode('utf-8')

will be fine, but

'Bj\xc3\xb6rk'.encode('utf-8')

will throw a UnicodeDecodeError because, implicitly, you've said

'Bj\xc3\xb6rk'.decode(sys.getdefaultencoding()).encode('utf-8')

and ASCII has no mapping for bytes above 0x7F, such as the \xc3 in the middle of that string.
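Here is the same failure with the hidden middle step spelled out, sketched in Python 3. There the default encoding is UTF-8, so the ASCII codec has to be requested by name.

```python
import sys

raw = b"Bj\xc3\xb6rk"

# Python 3 always reports 'utf-8' here; str.encode() and bytes.decode()
# also default to UTF-8 when no encoding is given.
print(sys.getdefaultencoding())

# The middle step of the Python 2 example, made explicit: ASCII covers
# only bytes 0x00-0x7F, so decoding stops at the first byte above that.
failed_at = None
try:
    raw.decode("ascii")
except UnicodeDecodeError as err:
    failed_at = err.start
print(failed_at, hex(raw[failed_at]))  # 2 0xc3

# Decoding with the codec the bytes were actually written in succeeds.
name = raw.decode("utf-8")
```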

4. This sucks, and is apparently fixed in Python 3.
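One way to see the fix, assuming Python 3: mixing the two types is an error rather than an occasion for silent coercion, so the conversion has to happen where you write it. The strings here are made up.

```python
# Python 3 will not silently decode bytes to make this concatenation work.
raised = False
try:
    _ = "Bj\u00f6rk " + b"live"
except TypeError:
    raised = True
print(raised)  # True

# Convert explicitly instead, naming the codec at the boundary.
combined = "Bj\u00f6rk " + b"live".decode("utf-8")
print(len(combined))  # 10
```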
 
