Monday, November 30, 2009
How to store images larger than 1 megabyte in Google App Engine
Over the summer, Google App Engine raised its limits for web requests and responses from 1MB to 10MB, but kept the maximum size of any single database element at 1MB. If you try to exceed this, you'll get a MemoryError. You can find a fair amount of grief and woe and gnashing of teeth and wearing of sackcloth and ashes about this online.
Which is kind of surprising, because it's not that hard to break files up into chunks and store those chunks in the database separately. Here's what I did today for my current project, which stores data - including photos - uploaded from smartphones:
First, we have to receive the uploaded image. Our uploads are two-phase - first data, then a photo - for various reasons. The data upload includes the image's file name; the photo upload is a basic form/multipart POST with exactly one argument (the filename) and its value (the file).
So, in "main.py":
class SaveImage(webapp.RequestHandler):
    def post(self):
        entryHandler = ec.EntryHandler()
        for arg in self.request.arguments():
            fileData = self.request.get(arg)
            response = entryHandler.saveImage(arg, fileData)
        self.response.out.write(response)
and in "ec.py":
class ImageChunk(db.Model):
    entryRef = db.ReferenceProperty(Entry)
    chunkIndex = db.IntegerProperty()
    chunk = db.BlobProperty()

class EntryHandler:
    def saveImage(self, fileName, fileData):
        results = Entry.all().filter("photoPath =", fileName).fetch(1)
        if len(results) == 0:
            logging.warning("Error - could not find the entry associated with image name " + fileName)
            return "Failed"
        MaxBTSize = 1000000  # stay under the 1MB-per-entity datastore limit
        entry = results[0]
        marker = 0
        while marker * MaxBTSize < len(fileData):
            # A slice running past the end of the string is truncated
            # automatically, so the short final chunk needs no special case.
            chunk = ImageChunk(entryRef=entry, chunkIndex=marker,
                               chunk=db.Blob(fileData[MaxBTSize * marker:MaxBTSize * (marker + 1)]))
            chunk.put()
            marker += 1
        logging.info("Successfully received image " + fileName)
        return "Successfully received image " + fileName
Pretty basic stuff: we chop the image up at each 1,000,000-byte mark, and put each chunk into its own ImageChunk DB object.
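Stripped of the App Engine plumbing, the splitting step is nothing but string slicing, so it's easy to sanity-check on its own. A minimal sketch (the function name is mine, and I'm using a tiny chunk size for illustration; on App Engine's Python these would be plain `str` values, here I use `bytes`):

```python
def split_into_chunks(data, chunk_size=1000000):
    """Split a byte string into consecutive chunks of at most chunk_size bytes."""
    # Slicing past the end of the string just truncates, so the short
    # final chunk needs no special handling.
    return [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]

# Illustration with a small chunk size:
pieces = split_into_chunks(b"abcdefghij", chunk_size=4)
# pieces == [b"abcd", b"efgh", b"ij"]
```

Joining the pieces back together in order reproduces the original byte string exactly, which is all the retrieval side below relies on.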
Then, when we need to retrieve the image, in 'main.py':
class ShowImageWithKey(webapp.RequestHandler):
    def get(self):
        key = self.request.get('entryKey')
        entryHandler = ec.EntryHandler()
        image = entryHandler.getImageByEntryKey(key)
        if image is not None:
            self.response.headers['Content-Type'] = 'image/jpeg'
            self.response.out.write(image)
and in 'ec.py':
def getImageByEntryKey(self, key):
    chunks = db.GqlQuery("SELECT * FROM ImageChunk WHERE entryRef = :1 ORDER BY chunkIndex",
                         key).fetch(100)
    if len(chunks) == 0:
        return None
    # db.Blob is a subtype of str, so joining the chunks in index order
    # reproduces the original byte string.
    return "".join(chunkRow.chunk for chunkRow in chunks)
Since db.Blob is a subtype of str, that's all you have to do. I don't understand why some people are so upset about this: it's mildly annoying that I had to write the above, but hardly crippling. At least with JPEGs, which is what we use. (But I don't see why any other file type would be more difficult; they're ultimately all just a bunch of bytes.) Could hardly be easier ... well, until App Engine rolls out its large file service.
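The store-then-retrieve round trip is easy to verify outside App Engine entirely, since both halves are plain string operations. A sketch (chunk size shrunk for illustration; on App Engine the values are `str`, here `bytes`):

```python
CHUNK_SIZE = 4  # 1000000 in the real code; shrunk here for illustration

original = b"some jpeg bytes, say"

# Store: slice at every CHUNK_SIZE mark.
chunks = [original[i:i + CHUNK_SIZE] for i in range(0, len(original), CHUNK_SIZE)]

# Retrieve: joining the chunks in index order reproduces the original exactly.
restored = b"".join(chunks)
assert restored == original
```

This is also why the `ORDER BY chunkIndex` in the retrieval query matters: the join is only lossless if the chunks come back in the order they were cut.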
(eta, Dec 14: which came out today! Meaning you can now disregard all the above and just use the new Blobstore instead.)
(eta, Dec 16: mmm, maybe not. Looked at the Blobstore in detail today, and it's really best suited for browser projects, not app or web-service stuff. The API for the blobs is very limited, and you can only access them via one-time-only URLs that App Engine puts in your HTML. You could scrape that, granted, but that's a pain in the ass, no less inelegant than the image-chunking solution above. It's experimental and subject to change, too. I think I'll hold out until its API improves.)
Labels: AppEngine, BigTable, chunking, chunks, Images, JPEG, JPG, limit, MemoryError, python, size
Wednesday, November 4, 2009
to infinity, and beyond!
I am pleased to report that my pet-project iPhone app, iTravelFree, has passed the stern inspection of Apple's App Store and is now available for download worldwide. For app links, a screenshot-laden tutorial, and help and FAQ files, see here: www.wetravelright.com.
(Yeah, crappy URL, I know, but all the good ones were taken.)
Since this is my tech blog let me wax about its architecture a bit. The iPhone app is pretty straightforward: basically, it's a bunch of TableViewControllers, many of which include WebViews, along with a MapViewController, all pointing to a bunch of CoreData records. Nothing extraordinarily fancy by any means.
The server side is more interesting: it's a Google App Engine service, written in Python, that fetches, caches, and parses Wikitravel pages for the app. This gives me a single point of access to the data flow, lets me do things like convert addresses to lat/long coordinates, and cuts down on bandwidth for both Wikitravel (thanks to the caching) and the phone app (thanks to the parsing and stripping out of extraneous info).
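The fetch-and-cache half of that is conceptually just a read-through cache sitting in front of Wikitravel. A framework-agnostic sketch of the idea (the class and names are mine; the real service would use App Engine's memcache and URL-fetch facilities rather than a dict and a lambda):

```python
class ReadThroughCache:
    """Serve pages from a cache, hitting the origin only on a miss."""

    def __init__(self, fetch_fn):
        self._fetch = fetch_fn   # e.g. an HTTP fetch against Wikitravel
        self._cache = {}         # stands in for memcache in this sketch

    def get(self, url):
        if url not in self._cache:
            # Cache miss: fetch from the origin and remember the result.
            self._cache[url] = self._fetch(url)
        return self._cache[url]

# Usage: count origin hits to show how caching saves Wikitravel's bandwidth.
calls = []
cache = ReadThroughCache(lambda url: calls.append(url) or "<page for %s>" % url)
first = cache.get("http://wikitravel.org/en/Paris")
second = cache.get("http://wikitravel.org/en/Paris")
# calls has length 1: the second get was served from the cache.
```

A real version would also want cache expiry, so stale Wikitravel edits eventually refresh, but the single-point-of-access shape is the same.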
The general architecture - phone app plus App Engine service - is actually really powerful and easy to work with. Basically, it's a distributed version of the classic Model-View-Controller architecture, where the phone is the view, the App Engine service is the controller, and whatever data you're accessing is the model. This lets you do all the heavy-lifting computation on the server side, which is where it belongs, and keep the phone (and its puny processor) focused almost purely on the UI.
I do have some reservations about the BigTable data store that App Engine uses, but they don't apply to projects like this, with relatively simple storage requirements and no data mining.
I wrote it in, hrmm, about six weeks all told, starting in July. (Obviously it's been much more than six weeks since then, but I had full-time work starting in August, so I could only work on this in fits and starts on the side.)
Anyway - the app is in pretty good shape, but there's more work to be done on the server side, so it's still basically in beta test. Take a look, download it, play around, and let me know what you think -
Labels: AppEngine, Apple, AppStore, BigTable, iPhone, iTravel, iTravelFree, python, Wikitravel