Author Topic: "Screen Scraper" question  (Read 2276 times)

Tuoni

  • Gator
  • Posts: 3032
  • I do stuff, and things!
"Screen Scraper" question
« on: April 22, 2008, 07:26:19 AM »
I need to retrieve all the TÜV certificate numbers for a certain company, along with the models each certificate applies to.  The website I've found is http://tuvamerica.com/tools/clientlists/certs.cfm and I was wondering whether the following thought process is logical and feasible:

My software connects to the website
It enters the company name in the search field (e.g. "ABC") and clicks search
For each search result it gets back, it spawns a thread which follows the link to the details page, enters each model number into a hashmap, returns that hashmap and then terminates
Parent thread amalgamates the hashmaps (no problems here, all model numbers are unique)
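
The steps above could be sketched roughly as follows, in Python. This is only an illustration under stated assumptions: FAKE_PAGES, fetch_page, parse_models and the "C-001: M-100, M-101" page format are hypothetical stand-ins, not the real site's links or layout, and a bounded thread pool is used in place of one thread per search result.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-in for the real site: detail link -> page text.
# The links and the "CERT: MODEL, MODEL" format are invented for this sketch.
FAKE_PAGES = {
    "detail?cert=C-001": "C-001: M-100, M-101",
    "detail?cert=C-002": "C-002: M-200",
}

def fetch_page(link):
    # A real version would issue an HTTP request for the details page here.
    return FAKE_PAGES[link]

def parse_models(page_text):
    # Turn one details page into {model_number: certificate_number}.
    cert, models = page_text.split(":")
    return {m.strip(): cert.strip() for m in models.split(",")}

def scrape_detail(link):
    # Work done by each worker thread: fetch one page and build its hashmap.
    return parse_models(fetch_page(link))

def collect_models(links, max_workers=4):
    # Parent thread: merge the per-page hashmaps as the workers finish.
    # Model numbers are assumed unique, so update() never overwrites anything.
    merged = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        for partial in pool.map(scrape_detail, links):
            merged.update(partial)
    return merged

if __name__ == "__main__":
    print(collect_models(list(FAKE_PAGES)))
```

Capping the number of workers keeps the number of simultaneous requests to the server small, which one thread per result would not do for a company with many certificates.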

I know I can grab the page render programmatically, but is what I'm thinking of possible?

I know I could build the hashmap by hand (even as an XML/Excel file which can be edited later), but I would like to do it programmatically: once I release this software, I want it to be as much of a black box as possible (though I have taken into account URL changes, website layout changes, etc.)

Am I barking up the wrong tree here?  Attached are a few screenshots I've taken from the website (note this was just the first company which came up on the list, not the one I'm working for...)

Tuoni

Re: "Screen Scraper" question
« Reply #1 on: April 23, 2008, 02:12:24 AM »
I've come up with a solution which I think should work: I have managed to work out the query that gets sent to the server to render the page, so I will go about it by sending the server a list of certificate numbers, one query each, and grabbing its responses.
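
A minimal sketch of that approach, in Python. The thread does not show the actual query, so everything specific here is an assumption: the parameter name "cert" is hypothetical, the request is assumed to be a plain GET, and the endpoint is assumed to be the same certs.cfm URL from the first post.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# URL taken from the first post; whether the query goes to this page is an assumption.
BASE_URL = "http://tuvamerica.com/tools/clientlists/certs.cfm"
# Hypothetical parameter name; replace with the one found in the real query.
CERT_PARAM = "cert"

def build_query_url(cert_number, base_url=BASE_URL, param=CERT_PARAM):
    # urlencode escapes spaces and other special characters in the value.
    return base_url + "?" + urlencode({param: cert_number})

def http_get(url):
    # Default fetcher: plain GET, decoded leniently since the encoding is unknown.
    with urlopen(url, timeout=30) as response:
        return response.read().decode("utf-8", errors="replace")

def fetch_responses(cert_numbers, fetch=http_get):
    # Returns {certificate_number: raw response text} for later parsing.
    return {cert: fetch(build_query_url(cert)) for cert in cert_numbers}
```

Passing a different fetch function makes it possible to try the parsing against saved pages without hitting the server.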