A new RSS crawler and parser for FeedShow

During the last few weeks a new crawler and parser were developed for FeedShow.

The crawler is now running and you will see it in your logs. Its name is “Feedshow/0.3”.
This crawler supports conditional GET requests and handles compressed responses (gzip, deflate). It also adapts the crawl period automatically: a smoothing algorithm adjusts the period between 2 hours and 1 day, depending on how often each RSS file is updated (the eventual target range is 1 hour to 1 day).
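
To make the two ideas above concrete, here is a minimal Python sketch. It is not the FeedShow code: the function names, the `smoothing` factor, and the update rule are assumptions chosen for illustration, since the post does not describe the actual algorithm. Only the user agent string and the 2 hours to 1 day bounds come from the text. The first function builds a conditional GET request that advertises gzip/deflate support; the second nudges the crawl period toward the short bound when a feed changed and toward the long bound when it did not.

```python
import urllib.request

MIN_PERIOD = 2 * 3600    # 2 hours, the current lower bound from the post
MAX_PERIOD = 24 * 3600   # 1 day, the upper bound from the post


def build_request(url, etag=None, last_modified=None):
    """Build a conditional GET request that also advertises compression support.

    etag and last_modified are the validator values saved from the previous
    fetch of the same feed (hypothetical storage, not shown here).
    """
    request = urllib.request.Request(url)
    request.add_header("User-Agent", "Feedshow/0.3")
    request.add_header("Accept-Encoding", "gzip, deflate")
    if etag:
        request.add_header("If-None-Match", etag)
    if last_modified:
        request.add_header("If-Modified-Since", last_modified)
    return request


def next_period(current, changed, smoothing=0.25):
    """Move the crawl period a fraction of the way toward one of the bounds.

    This exponential-smoothing rule is an assumption, not FeedShow's method.
    """
    target = MIN_PERIOD if changed else MAX_PERIOD
    period = current + smoothing * (target - current)
    return max(MIN_PERIOD, min(MAX_PERIOD, period))
```

With these headers, a server that supports conditional GET can answer `304 Not Modified` with an empty body when the feed is unchanged, which is the case where the sketch lengthens the period.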

The new parser is based on a modified version of the feedparser by Mark Pilgrim.

It can handle RSS 0.90, Netscape RSS 0.91, Userland RSS 0.91, RSS 0.92, RSS 0.93, RSS 0.94, RSS 1.0, RSS 2.0, Atom 0.3, Atom 1.0, and CDF feeds. It also parses several popular extension modules, including Dublin Core and Apple’s iTunes extensions.
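
As a rough illustration of what telling these formats apart involves, the sketch below guesses a feed's family from its root element using only the Python standard library. It is not how feedparser works internally; the function name and the returned labels are hypothetical, and it deliberately ignores CDF and the finer distinctions between RSS 0.9x variants.

```python
import xml.etree.ElementTree as ET

ATOM_NAMESPACES = {
    "http://purl.org/atom/ns#": "Atom 0.3",
    "http://www.w3.org/2005/Atom": "Atom 1.0",
}
RDF_NAMESPACE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"


def detect_feed_format(xml_text):
    """Return a rough format label based on the document's root element."""
    root = ET.fromstring(xml_text)
    # ElementTree writes namespaced tags as "{namespace}localname".
    if root.tag.startswith("{"):
        namespace, local = root.tag[1:].split("}", 1)
    else:
        namespace, local = "", root.tag

    if local == "rss":
        # RSS 0.91 to 2.0 share the <rss> root and differ in the version attribute.
        return "RSS " + root.get("version", "(unknown version)")
    if local == "RDF" and namespace == RDF_NAMESPACE:
        return "RDF-based RSS (0.90 or 1.0)"
    if local == "feed" and namespace in ATOM_NAMESPACES:
        return ATOM_NAMESPACES[namespace]
    return "unknown"
```

A real parser also has to cope with malformed XML, wrong encodings, and extension modules, which is the main reason to build on an existing library rather than on a check like this one.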

Not all of the parser's features (its data model) are currently used in the FeedShow user interface, but adding more of them later should not be a problem.

I will now concentrate on fixing details in the user interface. No new features for now: the goal is to make FeedShow an easy, fast, robust, … RSS feed reader (aggregator).

3 Comments »

  1. FeedShow » Nouveau Crawler et parser RSS en place. said,

    January 18, 2006 @ 1:47 pm

    […] The next improvements will concern the interface and will consist of making the reader easier to use, fast, robust, … (English version) […]

  2. christopher baus said,

    March 7, 2006 @ 8:23 pm

    Would you consider using a third party to distribute updates to feedshow? I’ve been working on Feedflow.com which allows users and developers to subscribe to an update feed. Instead of crawling all your feeds periodically you just ping our consolidated feed, and we tell you everything that has been updated that you care about.

    It might give you more opportunity to focus on the higher level issues with your app.

    Cheers,

    Christopher

  3. Thierry said,

    March 7, 2006 @ 10:34 pm

    Christopher,
    I am ready to consider any service that could improve Feedshow performance. It seems that the “process” you describe can be implemented alongside the “basic” crawling I am currently using.
    I could not find more information on feedflow.com. Do you have details?
