Monday, April 21, 2008

Eclipse Mylyn - Sync to PEAR

After figuring out how to tie Mylyn to Sourceforge's trackers, and learning via observation how its screen-scraping regexp worked, I moved on to configuring a Task Repository tied to the PEAR bug tracking system.

Rather than rehash the explanations of the Mylyn elements I am configuring, this time I'm just going to provide the config settings that worked for me. Once again, my configuration points to PhpDocumentor items only, so you'd need to adjust it to point to a different PEAR package.

Task Repository:
  • Server: (notice "http")
  • User ID (my own PEAR account)
  • Password
"Additional Settings":
  • serverUrl = (notice "https")
  • package_name = PhpDocumentor
  • limit (list the Parameter, but enter no Value)
  • id (list the Parameter, but enter no Value)

I have only one Task List repository query defined, which pulls all three item types in the PEAR bug tracker together (Bugs, Requests, Documentation items):
  • serverUrl =
  • package_name = PhpDocumentor
  • limit = 100 (I chose this number arbitrarily, but you want it high enough to ensure all the bugs you want captured will appear in the one resulting webpage)
  • id (list the Parameter, but enter no Value)
"Advanced Configuration":
The Query URL is


The Query Pattern is

<tr valign="top" class="...">[\s]*<td align="center"><a href="/bugs/({Id}[0-9]+)">[0-9]+</a><br /><a href="[^"]*">\(edit\)</a></td>[\s]*<td align="center">....-..-.. ..:.. ...</td>[\s]*<td>({Type}[a-zA-Z]+)</td>[\s]*<td>[^<]*</td>[\s]*<td>[^<]*</td>[\s]*<td>[^<]*</td>[\s]*<td>[^<]*</td>[\s]*<td>({Description}.[^<]+)</td>[\s]*<td>.[^<]*</td>[\s]*</tr>

Contrasting this pattern with my Sourceforge pattern, you might notice that I'm capturing Type in addition to Id and Description.
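
To see what the pattern actually captures, here is a minimal Python sketch that runs it outside of Mylyn. It rests on a few assumptions of mine: that Mylyn's ({Name}...) groups can be treated as ordinary named capture groups, that Python's regexp engine behaves the same as Mylyn's for this pattern, and that a bug table row looks like the made-up SAMPLE_ROW below (the ID, date and cell values are invented for illustration, not copied from the PEAR site). The helper names to_python_regex and scrape are my own.

```python
import re

# The Query Pattern from this post, split across lines only for readability.
MYLYN_PATTERN = (
    r'<tr valign="top" class="...">[\s]*'
    r'<td align="center"><a href="/bugs/({Id}[0-9]+)">[0-9]+</a><br />'
    r'<a href="[^"]*">\(edit\)</a></td>[\s]*'
    r'<td align="center">....-..-.. ..:.. ...</td>[\s]*'
    r'<td>({Type}[a-zA-Z]+)</td>[\s]*'
    r'<td>[^<]*</td>[\s]*<td>[^<]*</td>[\s]*<td>[^<]*</td>[\s]*<td>[^<]*</td>[\s]*'
    r'<td>({Description}.[^<]+)</td>[\s]*<td>.[^<]*</td>[\s]*</tr>'
)

# Hypothetical table row, invented to fit the pattern (not real PEAR data).
SAMPLE_ROW = """<tr valign="top" class="Opn">
 <td align="center"><a href="/bugs/12345">12345</a><br /><a href="#">(edit)</a></td>
 <td align="center">2008-04-20 14:05 UTC</td>
 <td>Bug</td>
 <td>Open</td>
 <td>PhpDocumentor</td>
 <td>1.4.1</td>
 <td>Linux</td>
 <td>Example summary text</td>
 <td>someone</td>
</tr>"""


def to_python_regex(mylyn_pattern):
    """Rewrite Mylyn-style ({Name}...) groups as Python (?P<Name>...) groups."""
    return re.sub(r'\(\{(\w+)\}', r'(?P<\1>', mylyn_pattern)


def scrape(html):
    """Return one dict of captured fields per matching table row."""
    regex = re.compile(to_python_regex(MYLYN_PATTERN))
    return [m.groupdict() for m in regex.finditer(html)]


print(scrape(SAMPLE_ROW))
# [{'Id': '12345', 'Type': 'Bug', 'Description': 'Example summary text'}]
```

Testing the pattern this way against a saved copy of the real results page is a quick check whenever the query suddenly returns nothing, since any change in the table layout will make the match fail.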

Ideally, I would also want to capture a Roadmap for each item, so that I could separate items into task groups (a PEAR Roadmap equates to a plan for a particular future release version). However, doing this would require a separate Task Query for each task group that I wanted to make. Besides, Roadmap is not a column visible on the Bug table webpage that is scraped, so the point is moot for now. Having all items listed together is enough for me at the moment.

Tuesday, April 15, 2008

Eclipse Mylyn - Sync to Sourceforge

The task tracking plugin Mylyn has proven to be a useful addition to my Eclipse layout (I'm using the Europa release). Luckily for me (and my PhpDocumentor work), I found a wiki post outlining how to create a "task repository" that ties in to Sourceforge. However, I think the instructions were old enough that they no longer matched the current Sourceforge HTML pages. The key point to understand when making a Mylyn task repository using its "generic repository connector" option is that you are effectively screen-scraping the HTML. As such, any general change in the layout of the target webpage can cause your carefully configured "repository" to stop functioning.

Set up a Repository
I found that I only needed one "repository" for each project/application over on Sourceforge. So, for my PhpDocumentor tie-in, I used these settings:
  • Server: (notice "http")
  • User ID (my own SF account)
  • Password
Then, in the "Additional Settings", I added these Parameters/Values:
  • serverUrl = (notice "https")
  • group_id = 11194 (this points at "PhpDocumentor")
  • atid = 111194 (this points at "Bugs")
For connecting to any other Sourceforge project, you need to know the group_id that points to the project, as well as the atid values that point to the project's bugs and features listings.

Setting up the Task Queries
Once the repository is configured, you must build two separate Query objects, one for Bugs and one for Features. The Parameters/Values will be the same as on the Repository object, except for the "atid", which for PhpDocumentor should be set to "111194" for Bugs and "361194" for Features.
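
As a rough model of how the two queries relate to the repository, the Python sketch below treats each Query as the repository's Parameters with only "atid" overridden. This is just my own illustration of the settings described above, not how Mylyn stores them internally; the names REPOSITORY_PARAMS and query_params are made up, and serverUrl is left out of the dictionary.

```python
# Parameters/Values entered on the repository (serverUrl omitted here).
REPOSITORY_PARAMS = {"group_id": "11194", "atid": "111194"}


def query_params(overrides):
    """Copy the repository parameters, then apply the per-query overrides."""
    merged = dict(REPOSITORY_PARAMS)
    merged.update(overrides)
    return merged


bugs = query_params({"atid": "111194"})      # Bugs tracker
features = query_params({"atid": "361194"})  # Features tracker

print(bugs)      # {'group_id': '11194', 'atid': '111194'}
print(features)  # {'group_id': '11194', 'atid': '361194'}
```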

Under "Advanced Configuration" is where the complexity/magic comes in, and both "Query URL" and "Query Pattern" are the same for Bugs and Features.

The Query URL is


while the Query Pattern is

<a href="/tracker/index.php\?func=detail&amp;aid=({Id}[0-9]+)&amp;group_id=${group_id}&amp;atid=${atid}">[\s]*<!-- google_ad_section_start -->({Description}[^<]+)<!-- google_ad_section_end -->[\s]*</a>[\s]*</td>

These should each be copied as one line of text (no added spaces), as my blog page insists on line-wrapping them. And yes, those "&amp;" pieces are supposed to be in the Query Pattern; they are not mistakes in the HTML rendering of this blog post. I assume they are necessary in the Query Pattern but not in the Query URL because the Pattern is a regular expression that gets compared against the raw SF HTML as it is screen-scraped, where each "&" inside a link is still written as the "&amp;" entity.
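
The Python sketch below illustrates that "&amp;" assumption outside of Mylyn. It is only an approximation under my own assumptions: that ${group_id} and ${atid} are filled in from the Parameters before matching, that the ({Name}...) groups behave like ordinary named groups, and that the tracker page contains a link shaped like the invented RAW_HTML (the aid value and summary text are made up). The helper names expand_params, to_python_regex and scrape are hypothetical.

```python
import html
import re

# The Query Pattern from this post, split across lines only for readability.
QUERY_PATTERN = (
    r'<a href="/tracker/index.php\?func=detail&amp;aid=({Id}[0-9]+)'
    r'&amp;group_id=${group_id}&amp;atid=${atid}">[\s]*'
    r'<!-- google_ad_section_start -->({Description}[^<]+)'
    r'<!-- google_ad_section_end -->[\s]*</a>[\s]*</td>'
)

# Hypothetical raw page source, invented to fit the pattern.
RAW_HTML = """<td><a href="/tracker/index.php?func=detail&amp;aid=1234567&amp;group_id=11194&amp;atid=111194">
<!-- google_ad_section_start -->Example bug summary<!-- google_ad_section_end -->
</a>
</td>"""


def expand_params(text, params):
    """Replace each ${name} placeholder with its configured value."""
    return re.sub(r'\$\{(\w+)\}', lambda m: params[m.group(1)], text)


def to_python_regex(mylyn_pattern):
    """Rewrite Mylyn-style ({Name}...) groups as Python (?P<Name>...) groups."""
    return re.sub(r'\(\{(\w+)\}', r'(?P<\1>', mylyn_pattern)


def scrape(page, params):
    """Return one dict of captured fields per matching tracker link."""
    regex = re.compile(to_python_regex(expand_params(QUERY_PATTERN, params)))
    return [m.groupdict() for m in regex.finditer(page)]


params = {"group_id": "11194", "atid": "111194"}

# Matching the raw source works, because the entities are still in place.
print(scrape(RAW_HTML, params))
# [{'Id': '1234567', 'Description': 'Example bug summary'}]

# Decoding the entities first turns "&amp;" into "&", so nothing matches.
print(scrape(html.unescape(RAW_HTML), params))
# []
```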

Unfortunately for us Pragmatic-minded hackers, I could find no way to float this duplicated Query config info up to the level of the "repository" config, to avoid "repeating myself" across the two Task Queries ;)

These Task Queries pull into Mylyn all the bugs/features found at Sourceforge for the project you chose via the "group_id", and you can view the actual SF webpage for a given bug/feature by opening the individual Mylyn task and choosing the "Browser" folder tab. Also, the "New Task" option in Mylyn will send you to the "Submit New" webpage at SF. Unfortunately, I have found that I typically have to keep logging myself in on the SF pages each time Mylyn opens one, as it seems like having your username/password configured on the "repository" is not enough.

As a final point, I only developed the Query Pattern's regexp enough to capture the ID number and Description values from the Sourceforge items. The Mylyn connector has additional "fields" available to "set", though capturing them requires more hacking on the Query Pattern.