Requirements annotated - Simplefeed?

Here I have tried to collect what Simplefeed offers against our requirements and what would need to change.
This is one of the possible starting points for the project.

Downloading the feed

The current modules use two basic functions: cURL and drupal_http_request.

The aims of this layer: download the data, follow redirects, and handle URLs that require authorization. cURL covers all of these aims; drupal_http_request needs some external logic (an additional header item in the HTTP request) for authorization.
For the API I would prefer to avoid a dependency on cURL, so I can imagine a function that adds some more functionality to drupal_http_request.
From the discussion on the mailing lists I gathered that cURL outperforms drupal_http_request. Maybe the API should offer cURL as a second option.
I assume this task should be implemented only in the API, so the modules don't have to take care of it.
alex: It could be of interest to override this stage: think of a mailing list parser. You would subscribe at this stage not to a URL out there but to a mailbox.
Aron: No problem. If we have pluggable downloaders, one of them can handle a mailbox.
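
The pluggable downloader idea discussed above could look roughly like the following. This is only a language-neutral sketch in Python, not Drupal code: the registry, `register_downloader`, `download`, `http_download` and the "mailbox" scheme are all hypothetical names invented for illustration. The only part taken from the text is that authorization needs an additional HTTP header item.

```python
import base64
import urllib.request
from typing import Callable, Dict, Optional

# Hypothetical registry: source scheme -> downloader callable.
DOWNLOADERS: Dict[str, Callable[..., bytes]] = {}


def register_downloader(scheme: str, func: Callable[..., bytes]) -> None:
    """Let a module plug in its own way of fetching a source."""
    DOWNLOADERS[scheme] = func


def basic_auth_header(user: str, password: str) -> Dict[str, str]:
    """Build the additional header item needed for HTTP Basic authorization."""
    raw = f"{user}:{password}".encode("utf-8")
    return {"Authorization": "Basic " + base64.b64encode(raw).decode("ascii")}


def http_download(source: str, user: Optional[str] = None,
                  password: Optional[str] = None) -> bytes:
    """Default downloader: plain HTTP(S), optionally with Basic auth."""
    headers = basic_auth_header(user, password or "") if user is not None else {}
    request = urllib.request.Request(source, headers=headers)
    # urllib follows redirects by default.
    with urllib.request.urlopen(request) as response:
        return response.read()


def download(source: str, **options) -> bytes:
    """Dispatch to whichever downloader is registered for the source scheme."""
    scheme = source.split(":", 1)[0]
    if scheme not in DOWNLOADERS:
        raise ValueError(f"no downloader registered for {scheme!r}")
    return DOWNLOADERS[scheme](source, **options)


register_downloader("http", http_download)
register_downloader("https", http_download)
```

A mailbox downloader would then be one more `register_downloader("mailbox", ...)` call made by an external module, and the rest of the pipeline would not need to know the difference.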

pro: Simplefeed does not define any download method.
con: downloading and parsing are covered by the same hook, hook_feed_parse (because SimplePie does not separate them either)

Parsing the feed

The current situation: almost all modules use their own style of parser and their own data structures. The API should provide the following:

  • Register the parsers and collect information such as supported input types and PHP compatibility level (?)
  • Define an output format: each parser has to provide the data structure that the API requires
  • Let the modules choose among the compatible parsers (given the system configuration); otherwise fall back to the default compatible parser.
  • Allow assigning multiple parsers to one feed (Morbus's feature request)
  • Handle non-XML "feeds" (e.g. "latest comic" HTML) and make it possible to write a processor/feed parser for them (Morbus's feature request)

API's part: provide a default parser, define a data structure, provide a pluggable parser system.
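
The registration and fallback rules listed above can be sketched as follows. This is a language-neutral illustration in Python under assumed names: `ParserInfo`, `register_parser`, `choose_parser` and the parser names are hypothetical, and the output dictionary is only a stand-in for the common data structure, which has not been defined yet.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class ParserInfo:
    name: str
    input_types: List[str]         # what the parser says it supports
    available: bool                # compatible with this system configuration?
    parse: Callable[[str], dict]   # must return the agreed output structure


PARSERS: Dict[str, ParserInfo] = {}
DEFAULT_PARSER = "default"         # hypothetical name of the API's own parser


def register_parser(info: ParserInfo) -> None:
    PARSERS[info.name] = info


def choose_parser(input_type: str, preferred: Optional[str] = None) -> ParserInfo:
    """Use the module's preferred parser if compatible, else the default."""
    compatible = {name: p for name, p in PARSERS.items()
                  if p.available and input_type in p.input_types}
    for name in (preferred, DEFAULT_PARSER):
        if name in compatible:
            return compatible[name]
    raise LookupError(f"no compatible parser for {input_type!r}")


def stub_parse(raw: str) -> dict:
    """Placeholder parser body; a stand-in for the common output format."""
    return {"title": "", "items": []}


register_parser(ParserInfo("default", ["rss", "atom"], True, stub_parse))
register_parser(ParserInfo("fast", ["rss"], True, stub_parse))
register_parser(ParserInfo("legacy", ["rss"], False, stub_parse))
```

With these registrations, a module that prefers "fast" gets it for RSS, but falls back to "default" for Atom, and a module that prefers the unavailable "legacy" parser also gets "default". A non-XML source would simply be another input type with its own registered parser.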

pro: easy to add another parser. It does not seem to be XML-specific.
con: SimplePie is the default; maybe it should be replaced with Aggregation's parser
A common data structure is lacking at the moment. At http://groups.drupal.org/node/4519 you can see the output of the various parsers.

Storing the feed and the news items

Instead of describing the ideas, here is an example database schema:
[Figure: the database schema]
Ideally an external module with added functionality shouldn't need to add another table, but just use the existing ones. With the additional values, the modules can extend the functionality. For specialized things the modules may still want to create new tables.
Ideally most of this task would sit behind the API, but that is almost impossible: the external modules have to know how things are stored.

pro: basic data is stored via the core, additional things are stored via the extensions

Fetch/store/delete/update of news items regularly

This task is usually done at cron time. The API should implement hook_cron() and let the modules control the flow of cron precisely. For example, a frequent question is: when should a feed be updated? I think the API should provide a general rule, then let the modules override that refresh rule. The actual feed-refresh code can then be overridden too. This way, module developers can decide what they want to do at cron time (see http://groups.drupal.org/node/4309).
Only the skeleton is in the API.
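
The "skeleton in the API, rules in the modules" idea could be sketched like this. It is a language-neutral Python illustration, not hook_cron() itself: `run_cron`, `default_is_due`, `default_refresh` and the feed fields `last_refreshed` and `refresh_interval` are assumptions made up for the example.

```python
import time
from typing import Callable, Dict, List, Optional

Feed = Dict[str, object]


def default_is_due(feed: Feed, now: float) -> bool:
    """General rule: refresh once the feed's interval has elapsed."""
    return now - feed["last_refreshed"] >= feed["refresh_interval"]


def default_refresh(feed: Feed, now: float) -> None:
    """Placeholder for download/parse/store; here it only stamps the time."""
    feed["last_refreshed"] = now


def run_cron(feeds: List[Feed],
             now: Optional[float] = None,
             is_due: Callable[[Feed, float], bool] = default_is_due,
             refresh: Callable[[Feed, float], None] = default_refresh) -> List[str]:
    """Cron skeleton: modules may override both the rule and the refresh code."""
    now = time.time() if now is None else now
    refreshed = []
    for feed in feeds:
        if is_due(feed, now):
            refresh(feed, now)
            refreshed.append(feed["url"])
    return refreshed


feeds = [
    {"url": "http://example.com/a", "last_refreshed": 0, "refresh_interval": 600},
    {"url": "http://example.com/b", "last_refreshed": 900, "refresh_interval": 600},
]
```

Calling `run_cron(feeds, now=1000)` refreshes only the first feed under the default rule. A module that wants every feed refreshed on each run could pass `is_due=lambda feed, now: True` without touching the skeleton.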

pro: there is a feed_expire hook where external code can say what to process at cron time.
con: currently the whole download/store/parse/etc. cycle always runs, and it is not trivial to modify this behaviour

Transforming the news items into some type of content

There are two totally different concepts: turning a news item into a node, or into something else. This part should not be in the API, because each module has its own preference here. For example, some modules use a fixed node type, some use a variable node type, and others use blocks or something else to show the feed and its content.
Nothing, or only some helper functionality, should be in the API.

con: simplefeed.module has node-dependent parts: it always represents feeds as nodes. These parts should be removed and moved into an external module.

Feed items likewise have only a node-based implementation (simplefeed_item.module). We should also write an external module that mimics the core aggregator.

Feed management

  • Add feeds (or other sources)
    • validate them
    • a huge feature would be detecting the feeds in a page, like the Bloglines UI does
  • List feeds
  • Edit and delete them - offer good functionality for handling tons of feeds
  • Feed statistics - last updated, scheduled time, incoming items (later stage: graphs)
  • Process queue management: if things are pluggable, and pluggable on a per-feed basis, we'll have to offer configuration pages for doing the plumbing. This can be a tricky one.
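
The feed detection item in the list above is commonly done by looking for `<link rel="alternate">` tags with a feed MIME type in the page head. A minimal sketch in Python, assuming that approach (the names `FeedLinkFinder` and `detect_feeds` are hypothetical, and this is not how Bloglines is known to do it):

```python
from html.parser import HTMLParser
from typing import List
from urllib.parse import urljoin

# MIME types conventionally used to advertise feeds in a page.
FEED_TYPES = {"application/rss+xml", "application/atom+xml"}


class FeedLinkFinder(HTMLParser):
    """Collect the hrefs of <link rel="alternate"> tags that point at feeds."""

    def __init__(self, base_url: str) -> None:
        super().__init__()
        self.base_url = base_url
        self.feeds: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attributes = dict(attrs)
        rel = (attributes.get("rel") or "").lower().split()
        mime = (attributes.get("type") or "").lower()
        href = attributes.get("href")
        if "alternate" in rel and mime in FEED_TYPES and href:
            # Resolve relative links against the page URL.
            self.feeds.append(urljoin(self.base_url, href))


def detect_feeds(html: str, base_url: str) -> List[str]:
    finder = FeedLinkFinder(base_url)
    finder.feed(html)
    finder.close()
    return finder.feeds
```

The "add feed" form could then accept an ordinary page URL, run `detect_feeds` on the downloaded HTML, and offer the discovered feed URLs to the user for validation.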

UI


The default module on top of the Aggregation API will have to provide some basic UI for configuring the module, configuring feeds, and listing feeds and feed items. Strictly speaking, the API is only the interface that the other modules will reuse.

pro: There is a simple UI already.