Archive for the 'Python' Category

GigBlastr - an experimental Pylons app on Google App Engine

On Wednesday we had a big turnout at the monthly Boston Python Users Group meeting, which was hosted for the first time at Betahouse, the co-working loft where Jazkarta has its office space. The topic for the evening was developing Python web apps for deployment to Google App Engine.

The agenda for the evening…

1. Brian DeLacey gave a short intro to GAE supported by showing the video screencast of a Google engineer creating a simple guestbook application.

2. PK Shiu gave an intro to developing web applications with Django.

3. And then I gave a talk about building a simple Pylons app for Google App Engine. I had been working feverishly in the days preceding the meeting trying to get the app working, battling the unfamiliar syntax of a new web framework and bumping up against the limitations of GAE.

But before I dive into the technical details, a brief background on my motivation for building this app. At the last DevHouseBoston3 (also hosted at Betahouse), I paired up with Malthe Borch (who was in town visiting from Copenhagen) to work on an idea called GigBlastr. The basic idea is to provide a tool for musicians who want to promote their gig on the plethora of event listing services (eventful, upcoming, facebook, twitter, etc.) but don’t want to visit all of those sites individually.

So the GigBlastr service would provide a one-stop-shop service to “blast” the gig info to all those services just by filling out one form. This is similar to the service provided by Tubemogul for videos, and Ping.fm for status updates.

Malthe and I implemented some very basic functionality leveraging the content rules engine in Plone to take an event you post to your Plone site, and send that event information to Twitter and Eventful. Later, Lennart Regebro created a p4a.upcoming package to add support for posting the event info to Upcoming.org as well.

Well, this was all fine and dandy but who really wants to load up a big ol’ CMS just to post some events? So this idea simmered for about a year, and in the meantime Google App Engine was announced, and I began to think how cool it would be to deploy this app on GAE.

And thus began my hunt for an appropriate framework with which to at least build a proof-of-concept prototype. I discovered that one of GAE’s limitations is that they disallow the use of Python’s standard httplib and force you to use their own urlfetch. Well, this would mean re-writing all of the Python libraries for talking to the various APIs, since they all depend on httplib/urllib.

In response to the 372 people (at last count) who starred this issue as being really important to fix, Guido van Rossum (the inventor of Python and an employee of Google) said that if someone wanted to write a urllib replacement on top of urlfetch, he would be happy to review it and try to get it added.

Well, not more than a week later, Ian Bicking (creator of Paste and SQLObject and developer at The Open Planning Project) came up with just that, an implementation of httplib on top of urlfetch.
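To make the idea of "httplib on top of urlfetch" a bit more concrete, here is a minimal sketch of the adapter pattern involved. It is not Ian’s code and not the App Engine API: the class names and the `fetch(url, payload, method, headers)` callable are hypothetical stand-ins, and the fake fetch function exists only so the example runs anywhere (modern Python 3).

```python
from collections import namedtuple


class FetchResponse:
    """Minimal response object exposing httplib-style attributes."""

    def __init__(self, status, body):
        self.status = status
        self._body = body

    def read(self):
        return self._body


class FetchBackedConnection:
    """Hypothetical httplib-like connection that delegates to a fetch callable.

    `fetch(url, payload, method, headers)` is an assumed stand-in for a
    platform fetch service. It must return an object with `status_code`
    and `content` attributes.
    """

    def __init__(self, host, fetch, scheme="http"):
        self.host = host
        self._fetch = fetch
        self._scheme = scheme
        self._pending = None

    def request(self, method, path, body=None, headers=None):
        # The fetch service wants an absolute URL, so join host and path here.
        url = "%s://%s%s" % (self._scheme, self.host, path)
        self._pending = (url, body, method, headers or {})

    def getresponse(self):
        if self._pending is None:
            raise RuntimeError("request() must be called before getresponse()")
        url, body, method, headers = self._pending
        self._pending = None
        result = self._fetch(url, body, method, headers)
        return FetchResponse(result.status_code, result.content)


# A fake fetch function so the sketch can run without any network access.
FakeResult = namedtuple("FakeResult", "status_code content")


def fake_fetch(url, payload, method, headers):
    return FakeResult(200, ("%s %s" % (method, url)).encode("ascii"))


conn = FetchBackedConnection("api.example.com", fake_fetch)
conn.request("GET", "/events")
response = conn.getresponse()
```

The point of the pattern is that client libraries written against the request/getresponse interface would not need to change; only the object underneath does.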

It seemed as though the quickest way to try out posting event data using Ian’s patched httplib was to follow his instructions for getting Pylons working on Google App Engine, since he had already got Pylons to work with it.

Well, this proved to be more tricky than I had imagined, and I ran into one problem after another. At times I felt like a man walking in the desert and whenever I would see an oasis of water, it would be just a mirage. But thankfully, I was able to tap into Ian’s brain on the #pylons IRC channel and also talk to other Pylons folks, and eventually (just an hour before my presentation!), I was able to get it to work.

I can say that right now, this is bleeding edge stuff, and I wouldn’t recommend it unless you have a lot of patience, or unless Google decides to add Ian’s httplib and lift the 1000 file limit. This arbitrary 1000 file limit is one of the major obstacles in trying to get other Python-based web frameworks such as Zope 3 working on GAE, and has required many silly workarounds. Ian also comments on some of these limitations imposed by Google App Engine on his blog.

If you do want to check it out, I’ve posted the slides from my presentation (with cmd line examples) on slideshare.net.

Dave Fisher also recorded some video at the event but this clip is only about 5 min long.

For those of you who missed the Google I/O conference, you can watch the videos and view the slides from the presentations here.


Intersecting journey of Free Culture, Creative Commons and Plone

Several years ago I discovered the book Free Culture by Lawrence Lessig. It was actually one of the first eBooks that I put on my handheld PDA at the time, a Handspring Visor, a device that now seems quaint compared to my iPhone. I remember watching the Flash presentation of Lessig’s talk at OSCON in 2002, and being motivated to learn more about copyright law. Lessig made the issues tangible, and of incredible importance to anyone who considers themselves a creator.

More importantly, he demonstrated that the copyright laws of yesterday were no longer suitable for the creators of today, and what was needed was a new way to license your creative work. And so he founded the Creative Commons, a non-profit organization that provides free tools for creators to easily mark their creative work with the freedoms they want it to carry.

During one of my trips out to San Francisco, I met with Mike Linksvayer at the Creative Commons headquarters. At the time I was very interested in adding Creative Commons licensing support to Plone, the open source content management system, so I wanted to talk to him about the best ways to accomplish this.

Well, we didn’t talk much about Plone, but I did get to have lunch with the other CC folks, and afterwards Mike suggested that I talk to Nathan Yergler, the Python programmer who was making so many cool CC tools, that they had to hire him.

Upon closer inspection, I discovered that Nathan was doing some pretty cool stuff with Zope 3, including building desktop applications such as ccPublisher. While I still haven’t met Nathan (now CTO of Creative Commons), I can see from his blog that he recently moved from Indiana to San Francisco, so there’s a much greater chance that our paths will cross now.

Meanwhile, Jonah Bossewitch had written up PLIP #136 (Plone improvement proposal) to get content licensing support native in Plone. There was a product called PloneCreativeCommons that was a good start, but Brent Lambert and David Ray from Utah State University took it a step further during the Big Apple Sprint (also organized by Jonah) and created ContentLicensing, a really great add-on product for Plone that we’re now bundling with Plone4Artists.

After moving to Boston, I got to know some of the folks involved with the Harvard Free Culture group, one of many college-based Free Culture groups that promote the public interest in intellectual property and information & communications technology policy.

Last week I was hanging out with the Free Culture kids at a dinner at the Cambridge Brewing Company hosted by Dean Jansen and Will Guaraldi, both of the Participatory Culture Foundation, best known for the Miro video player. For the Plone4ArtistsVideo add-on product for Plone, we’re exploring using some of the Python code in Miro for scraping popular video sharing sites such as Youtube.

Recently I stumbled across this TEDTalk video presentation by Lawrence Lessig, and invite you to watch as he takes you through a fascinating journey about culture and gets a standing ovation at the end.


Hivurt - a new Zope 3 based CMS?

Stephan Richter just alerted me to a new CMS that has been developed in Zope 3. Mikhail Kashkin from Key Solutions (a Russian company) reported on the Zope developers list that they are developing a Zope3-based CMS called Hivurt and some of the components are already available on their SVN repository.

All of the documentation is only in Russian, but Stephan said that they are busy translating it to English. A Google search brought up this code repository on code.google.com so that may be the future home of the English translated code and documentation.

Looking at the screenshots, I can’t help but observe that the UI looks surprisingly similar to Plone/CMF, with a “contents” tab and “add” and “actions” dropdown menus. Given that it’s built on top of pure Zope 3, though, I’m guessing that it’s much faster than Plone. It will also be interesting to see what functionality is missing from Hivurt that comes out-of-the-box with Plone.

According to this news item, Sportbox.ru is the first major site built using Hivurt CMS. Sportbox.ru is the official website of TC “Sport”, Russia’s “most reliable sport news supplier.” Its potential audience in Russia is 62 million people living in 72 regions of the Russian Federation.

TC “Sport” plans to publish the latest information from the sports world, video collections (Video-On-Demand), and streaming video on various sports with commentary by leading journalists and sports professionals. The portal offers many interactive features for its visitors, including blogs by famous sportsmen, coaches and commentators.

Hivurt sounds like a very promising CMS and a welcome addition to the Zope 3 community. We’re looking forward to seeing the English documentation and trying it out!



Kirbi, a peer-to-peer library built with Grok

All of us have books that sit on shelves, doing nothing for anyone but collecting dust. I have some books that I have read several times, some others that I have read once, and still others that I have never read. I buy books because I intend to read them… some day. But no matter how voracious a reader you are, you can’t possibly read all of your books simultaneously, which means that a great many books are simply going unused. All that knowledge trapped inside a bound volume, just waiting to reveal itself to an open mind.

The problem
There are some books that you’d like to share with others, but unlike music or videos, which can be digitized and easily replicated, a book is a physical object and only one person can have that copy. It’s quite time consuming to scan every page, although that didn’t stop someone from scanning the latest Harry Potter book to make it more widely available. So why are we sometimes reluctant to lend a book? Well, it’s a finite resource, so if you give it to someone, you can’t use it until they give it back. Even worse, what if you forget who you gave it to, or you remember but they never give it back?

This was the problem that faced Luciano Ramalho, a Brazilian Zope developer and trainer and student of library science. I stayed with him in his modest sized apartment in São Paulo last week, and observed that his study was lined floor-to-ceiling with books. Luciano is truly a lover of books. And he wanted to be able to share them with his friends and colleagues but needed a way to keep track of who had which book and how long they had it.

So he scratched this itch and developed Kirbi, as part of a Google Summer of Code project. Kirbi is a web application to allow anyone to turn their personal book collection into a lending library, making book sharing among friends and colleagues easier and safer. Kirbi was created using Grok, a web application framework based on Zope 3.
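The bookkeeping behind that itch (who has which book, and for how long) is small enough to sketch. The following is not Kirbi’s code or data model; `Loan`, `Shelf`, the 30-day limit and the placeholder ISBN are all hypothetical names and values chosen for illustration (Python 3).

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class Loan:
    isbn: str
    borrower: str
    lent_on: date

    def days_out(self, today):
        # Number of whole days the book has been away from its owner.
        return (today - self.lent_on).days


@dataclass
class Shelf:
    """Hypothetical record of one owner's books and who currently has them."""

    loans: dict = field(default_factory=dict)  # isbn -> Loan

    def lend(self, isbn, borrower, lent_on):
        # A physical copy can only be in one place at a time.
        if isbn in self.loans:
            raise ValueError("book is already lent out")
        self.loans[isbn] = Loan(isbn, borrower, lent_on)

    def give_back(self, isbn):
        return self.loans.pop(isbn)

    def overdue(self, today, max_days=30):
        return [loan for loan in self.loans.values()
                if loan.days_out(today) > max_days]


shelf = Shelf()
shelf.lend("placeholder-isbn-1", "Ana", date(2007, 8, 1))
late = shelf.overdue(date(2007, 9, 15))
```

Keying loans by ISBN assumes one copy per title, which is a simplification; an owner with two copies of the same book would need a per-copy identifier.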

“As a library sciences student, I designed [it] to increase the reuse of books, foster the exchange of reading experiences, and make books more accessible to all, particularly in developing countries.” -Luciano Ramalho

What makes it different?
There are similar initiatives such as Bookcrossing, but Luciano explains that with Bookcrossing there is very little incentive to keep these books in the system. They usually get left somewhere and forgotten. With this peer-to-peer library system that he has envisioned, you are borrowing books from your friends, so there is a personal connection and accountability to return the book or recirculate it.

Another similar project is Lovely Books, developed by Lovely Systems, another Zope 3 development company, based in Austria. Luciano explained that there are many such sites, but they mostly focus on letting users show off their book collections rather than actively share the books physically with each other.

Kirbi works more like your public library: you can request a book from someone in the network, and then set a time and a place to do the handoff. But in this case, the library is peer-to-peer and not centralized. You are more likely to have books that your friends want, and vice versa, because you have similar interests. Luciano recommended a book by Yochai Benkler called “The Wealth of Networks: How Social Production Transforms Markets and Freedom”, which talks more about this idea. This book was already in my Amazon.com wishlist but, like so many other books, I have yet to read it.

Try it out!
Luciano has already developed a prototype demo site at Circulante.org where you can create a free account and try out the system. He’s made it very easy to add new books just by typing in the ISBN number, but has plans in the future to make it even easier by scanning the barcode on the back of the book using your computer’s webcam, similar to how the desktop software Delicious Library does it.
I’ve been thinking that my company, Jazkarta, could use this peer-to-peer library software. We all have books that the others would probably like to read, but we often don’t know that someone we know has the book that we want. The only problem in our case is that since we are spread out, opportunities to physically hand off a book are rare. We see each other at sprints and conferences, and then we might not see each other again for months. But for some books, a loan of that length may be fine.

Start thinking about all the communities you are in and the people with whom you may want to share books (your apartment building, your church, your workplace, a neighborhood association, etc.) and you will see ample opportunities to expand the number of people who could participate and share their books.

Developers get involved!

I think this is a really great idea and is the first real public example of a feature-rich web application built with Grok. Luciano is seeking others who would like to contribute to the project. It’s a great way to learn more about Grok / Zope 3. You can find the code in the Launchpad Bazaar repository or in the svn repo on Zope.org. I recorded a video of Luciano talking more about Kirbi and Grok and will post it as soon as I get caught up on things.



Scraping a jazz events calendar

As mentioned in my last post, Building a live music calendar, I’m disappointed that the websites that list jazz events in Boston don’t offer the data as an RSS or iCal feed. One example of this is the WGBH Jazz Calendar, which has probably the most comprehensive listing of jazz events in the Boston area.

In my talk about Plone4Artists at EuroPython 2005, I mentioned a tool called Scrape ‘n’ Feed, which will scrape a website and generate an RSS feed. Well, it’s been a year since I first discovered this tool, and now I’m revisiting it to see if I can make it work. Here is my first foray into this scraping business.

ScrapeNFeed depends on Beautiful Soup and PyRSS2Gen which are easily installable on Ubuntu Linux with:

apt-get install python-pyrss2gen
apt-get install python-beautifulsoup

Once I installed these two packages, I downloaded the ScrapeNFeed.py script and created the following file ‘getwgbhfeeds.py’:

#!/usr/bin/env python
# Scrape the WGBH events listing and write the results out as an RSS 2.0 feed.
import BeautifulSoup
from PyRSS2Gen import RSSItem, Guid
import ScrapeNFeed

class WGBHFeed(ScrapeNFeed.ScrapedFeed):

    def HTML2RSS(self, headers, body):
        soup = BeautifulSoup.BeautifulSoup(body)
        # The results table is the one that contains the 'Sort By:' label.
        eventTable = soup.firstText('Sort By:').findParent('table')
        tds = eventTable.fetch('td', {'class': ['searchres', 'searchres_off']})
        items = []
        for item in tds:
            link = item.findNext('b')
            eventLink = self.baseURL + link.a['href']
            # Skip events that hasSeen() reports as already known.
            if not self.hasSeen(eventLink):
                eventTitle = item.a.string
                eventDate = item.contents[0].strip()
                eventLocation = item.contents[5].strip()
                items.append(RSSItem(title=eventTitle + '(' + eventDate + ')',
                                     description=eventLocation,
                                     link=eventLink))
        self.addRSSItems(items)

WGBHFeed.load("WGBH Concerts",
              'http://www.publicbroadcasting.net/wgbh/events.eventsmain?action=showCategoryListing&newSearch=true&categorySearch=5596',
              "See all the jazz concerts posted to the WGBH calendar",
              'wgbh.xml',
              'wgbh.pickle',
              managingEditor='name@domain.com (First Last)')

Run the script with ./getwgbhfeeds.py and it will output a file, wgbh.xml, in the RSS 2.0 format. You can then open this file using your RSS reader of choice and view all the Boston jazz events.
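For anyone curious about what such a feed file contains, here is a rough sketch that builds a tiny RSS 2.0 document using only the standard library. The script above relies on PyRSS2Gen for this step; this version swaps in xml.etree.ElementTree so it runs without extra packages (Python 3). The `build_rss` helper, the event values and the example.com URLs are made up for illustration.

```python
import xml.etree.ElementTree as ET


def build_rss(title, link, description, events):
    """Return an RSS 2.0 document (as a string) for a list of event dicts."""
    rss = ET.Element("rss", {"version": "2.0"})
    channel = ET.SubElement(rss, "channel")
    # RSS 2.0 requires a title, link and description on the channel.
    ET.SubElement(channel, "title").text = title
    ET.SubElement(channel, "link").text = link
    ET.SubElement(channel, "description").text = description
    for event in events:
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = event["title"]
        ET.SubElement(item, "link").text = event["link"]
        ET.SubElement(item, "description").text = event["location"]
    return ET.tostring(rss, encoding="unicode")


# Hypothetical event, shaped like the title/location/link the scraper collects.
feed_xml = build_rss(
    "Example Concerts",
    "http://example.com/events",
    "A made-up feed for illustration",
    [{"title": "Example Quartet (Friday)",
      "link": "http://example.com/events/1",
      "location": "Example Club"}],
)
```

Each scraped event becomes one item element inside the channel, which is all an RSS reader needs to list it.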

One thing that I noticed is that some of the items in the list have an extra <br />, which means the title doesn’t get read in correctly. I’ll have to find a way to ignore the <br />, which I’m sure will be fairly simple with BeautifulSoup.
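One possible approach, sketched here with the standard library’s html.parser rather than BeautifulSoup so that it runs without extra packages (Python 3): collect all the text inside a fragment and treat any <br /> as ordinary whitespace. The `TextCollector` and `flatten_text` names and the sample markup are hypothetical, and this has not been tried against the real WGBH pages.

```python
from html.parser import HTMLParser


class TextCollector(HTMLParser):
    """Collect the text inside a fragment, treating <br> as a plain space."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_starttag(self, tag, attrs):
        # Self-closing <br /> tags are also routed here by HTMLParser.
        if tag == "br":
            self.parts.append(" ")

    def handle_data(self, data):
        self.parts.append(data)


def flatten_text(fragment):
    parser = TextCollector()
    parser.feed(fragment)
    parser.close()
    # Collapse runs of whitespace left behind by line breaks and indentation.
    return " ".join("".join(parser.parts).split())


title = flatten_text('<a href="/e/1">Example Quartet<br />Live at the Club</a>')
```

Reading the text this way avoids depending on a fixed position in the list of child nodes, which is what an unexpected extra tag tends to break.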

What’s next

At the OPMLCamp a few weeks ago, I met Mike Kowalchik, the creator of grazr. After seeing this tool, I immediately thought about how useful it would be for generating a browseable directory of event listings. You simply supply grazr with an OPML file, and it will then display all the RSS feeds and their entries. After I get a couple more event listing sites scraped, I’ll generate the OPML file and try them out with grazr.
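Generating that OPML file should be straightforward. Here is a minimal sketch using the standard library (Python 3): an OPML document is a head with a title plus a body of outline elements, one per feed, each carrying the feed address in its xmlUrl attribute. The `build_opml` helper, the feed names and the example.com URLs are placeholders, and nothing here has been checked against what grazr actually accepts.

```python
import xml.etree.ElementTree as ET


def build_opml(title, feeds):
    """Return an OPML document listing RSS feeds given as (text, xml_url) pairs."""
    opml = ET.Element("opml", {"version": "2.0"})
    head = ET.SubElement(opml, "head")
    ET.SubElement(head, "title").text = title
    body = ET.SubElement(opml, "body")
    for text, xml_url in feeds:
        # One outline element per feed; type="rss" marks it as a feed link.
        ET.SubElement(body, "outline",
                      {"type": "rss", "text": text, "xmlUrl": xml_url})
    return ET.tostring(opml, encoding="unicode")


# Placeholder feeds standing in for the scraped event listings.
opml_xml = build_opml(
    "Boston jazz listings",
    [("WGBH Concerts", "http://example.com/wgbh.xml"),
     ("Another venue", "http://example.com/venue.xml")],
)
```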

Mike also mentions on his blog Tom Morris’ idea of using grazr to ‘kill myspace’ by creating a better way for independent bands and artists to self-promote using OPML. Note to self: follow up with Tom to discuss this idea further. I love the integrated MP3 player in his grazr box. Update: left him an Odeo message.



Building a live music calendar

While reading Derek Sivers’ O’Reilly blog, I came across Marc Hedlund’s talk Entrepreneuring for Geeks, which described how the more technically minded among us can start companies of our own. He started the talk with a set of proverbs.

The three proverbs that struck close to home for me were:

  • Pay attention to the idea that won’t leave you alone
  • Build what you know
  • Momentum builds on itself

Pay attention to the idea that won’t leave you alone

Several events have occurred in the past two weeks which have echoed these words in my mind.

During BarCampBoston I spoke with other geek entrepreneurs about the problem of finding live music, and the guys from tourb.us told me about how they are scraping venues’ sites to get concert listings. They are providing a service that answers a particular need - when is my favorite band coming to town?

This triggered a memory of an exchange I had more than a year ago with trombonist Phil Wilson at the Jazz Journalists Association panel at Scullers Jazz Club. Jon Hammond had organized the panel discussion on the topic of Boston as a Launching Pad for a Jazz Career. I asked the panel what kind of online tools or services could be provided to re-ignite the jazz scene in Boston, and Phil said that he would like to see a service that would notify him when a musician was going to be performing.

Then at the last Python meetup, Dan Milstein raved about the Python scraping library BeautifulSoup and described how capable it was at scraping baseball scores off a website. I played around with BeautifulSoup a while ago, but never actually built anything using it.

Scratch an itch

“Build what you know” affirms that the most basic advice of idea generation is to scratch an itch you have yourself. Now I have an itch to scratch. I love going out to hear live music, especially jazz - but there is no single site that aggregates the concert listings. There are several sites I must visit:

  • MyRootdown Improv Music Calendar is a great site built by graphic designer and improv enthusiast Shawn dos Santo. Shawn is doing a great job of posting events he hears about, but there’s no way for people to post their own gigs.
  • The WGBH Jazz Calendar is good but, again, it doesn’t have an RSS/iCal feed, so I have to manually visit the site every time I want to see who’s playing.
  • Each and every venue has its own concert listing page (Scullers, Regattabar, Wallys, Berklee, Reel Bar, etc.) and of course, none of them have RSS or iCal feeds.
  • I’m sure there are others that I don’t know about.

The basic problem here is that there is a fragmentation of information. Since none of the sites publish their event listings in any sort of structured way (RSS, iCal, hCalendar), it’s tedious to monitor these listings and thus hard to stay on top of what’s going on in the Boston jazz scene.

The “Pull” method

Immediately after hearing Phil’s suggestion, my technical mind started churning as I thought about generating dynamic RSS feeds based on artist or band name, and then using something like Feedblitz to turn those RSS feeds into email notifications. As much as we geeks would like to think otherwise, the average person still has no idea what an RSS feed is or how to use it. Email is still the lowest common denominator.

But the question still remains how to get the data into a system in the first place. It is not likely one can expect musicians to enter their gig listings themselves. And here is where Beautiful Soup comes in - if I scrape the event listing sites, I can put the data into a system, extract the metadata (band, location, date/time, cost, etc.) and syndicate these concert listings as RSS feeds and subsequently email notifications.
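Once the events are scraped and their metadata extracted, splitting them into one listing per band is the easy part. A minimal sketch, assuming each event is a dict with hypothetical `band`, `date` (ISO format string) and `venue` keys; the names and the sample data are invented for illustration (Python 3):

```python
from collections import defaultdict


def group_by_band(events):
    """Map a normalized band name to the events that band is playing."""
    feeds = defaultdict(list)
    for event in events:
        # Normalize so "Example Trio" and " example trio " share one feed.
        key = event["band"].strip().lower()
        feeds[key].append(event)
    # ISO dates sort correctly as strings, so each feed reads chronologically.
    for listing in feeds.values():
        listing.sort(key=lambda e: e["date"])
    return dict(feeds)


# Invented sample data standing in for scraped listings.
scraped = [
    {"band": "Example Trio", "date": "2006-06-10", "venue": "Club A"},
    {"band": "Other Band", "date": "2006-06-03", "venue": "Club B"},
    {"band": " example trio ", "date": "2006-06-01", "venue": "Club C"},
]
per_band = group_by_band(scraped)
```

Each value in the result could then be rendered as its own RSS feed and handed to an RSS-to-email service for the notification step.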

There is even a Python script called Scrape ‘n’ Feed which will automatically turn a page scraped with BeautifulSoup into an RSS feed. This is why I love Python - there is almost always a library that does exactly what you want. And there is also a Python script to convert iCal into RSS.

The “Push” method

Now suppose for a moment that one could get musicians to enter their gigs into some sort of system. What if you could offer a service, let’s call it GigBlast, which would push their gig information out to a bunch of event listing services: WGBH, eventful.com, upcoming.org, boston.craigslist.org, meetup.com, etc.? It would use the API provided by each service or, in the case of WGBH, which has no API, a Python library such as ClientForm to submit the form.
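The core of such a service is a simple fan-out: one gig goes to many publishers, and a failure at one service should not stop the others. Here is a minimal sketch of that shape. The `blast` function and the publisher callables are hypothetical stand-ins, not real clients for any of the services named above (Python 3).

```python
def blast(gig, publishers):
    """Send one gig to every publisher and report which ones succeeded.

    `publishers` maps a service name to a callable that takes the gig dict.
    """
    results = {}
    for name, publish in publishers.items():
        try:
            publish(gig)
            results[name] = "ok"
        except Exception as error:
            # One failing service must not stop the rest of the blast.
            results[name] = "failed: %s" % error
    return results


# Stand-in publishers: one records what it was sent, the other always fails.
sent = []


def working_service(gig):
    sent.append(gig["title"])


def broken_service(gig):
    raise RuntimeError("service unavailable")


report = blast(
    {"title": "Example Quartet", "venue": "Example Club", "date": "2006-06-10"},
    {"working": working_service, "broken": broken_service},
)
```

Keeping each service behind a plain callable means an API client and a form-submitting scraper can sit side by side without the dispatcher caring which is which.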

This would make it easier for musicians to get the word out about their gigs, give fans a tool to be informed when these musicians are performing, and ultimately get more people to go out to hear music which would create more demand for live music. Maybe I’m an idealist to think that it will have such far reaching effects, but even if no one else uses this service, at least I’ll be scratching my itch!

Momentum builds

Stay tuned for more thoughts on publishing events to the web using Apple’s iCal. This will simplify the data entry process even more, as musicians can simply add their event info to iCal, and in the background it’s transparently uploaded to their website and automatically pushed out to the event listing services.

I also want to explore the use of microformats, such as hCalendar, which I think have a better chance of being adopted among musicians, venues and bloggers since they are fairly easy to implement - just a few changes to the HTML template. Pages formatted with hCalendar are a breeze to scrape using Technorati’s events feed service and can be searched using Technorati’s experimental Event Search tool.
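To show how small those template changes are, here is a sketch that renders one event as an hCalendar fragment. The class names (vevent, summary, dtstart, location) come from the hCalendar microformat; the `to_hcalendar` helper and the sample event are invented for illustration (Python 3).

```python
from html import escape


def to_hcalendar(summary, start_iso, human_date, location):
    """Render one event as an hCalendar `vevent` HTML fragment."""
    return (
        '<div class="vevent">'
        '<span class="summary">{summary}</span> on '
        # The machine-readable date lives in the title attribute of the abbr.
        '<abbr class="dtstart" title="{start}">{human}</abbr> at '
        '<span class="location">{location}</span>'
        '</div>'
    ).format(
        summary=escape(summary),
        start=escape(start_iso, quote=True),
        human=escape(human_date),
        location=escape(location),
    )


snippet = to_hcalendar(
    "Example Quartet", "2006-06-10T20:00", "June 10th, 8pm", "Example Club"
)
```

The markup a visitor sees stays ordinary HTML; the class names are what let a parser pull the event back out.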

Well, after many days of sideways rain, the sun has finally come out in Boston, so I’m going for a jog in the Fens.
