blechtrottel.net

Data from Web Tables

How to get data out of a table on the web in order to work with it locally in a spreadsheet.

Intro

Tables on websites have provided lots of information for many years, yet copying their numbers for further work on your local PC is still cumbersome.

Here at blechtrottel.net we had to face this problem whenever we needed to get relevant numbers out of the monthly Webalyzer overview and into our own statistics.

Webalyzer monthly stats

Looking at the source, we can see that the table has a simple and regular structure. It therefore lends itself to being processed automatically by a script.

Webalyzer monthly stats (source)

Prep Work

For our work we need a basic understanding of both HTML and JavaScript. Depending on taste, operating system and browser, we also have to install one of the browser extensions Greasemonkey or Tampermonkey. Similar extensions should work as well, but will most probably require some slight modifications.

For our little exercise we want to get the first five values as displayed in our first screenshot and add the info on search strings as can be seen here:

Webalyzer search strings

The Script

Start

    var tables = document.getElementsByTagName("table");
    var monthlydata = tables[0].getElementsByTagName("b");

Here we fetch all tables of the page into an array-like collection. From the first table we pick out the monthly data, which is printed in bold inside <b> tags.

At the very end of the code we call our function Clipboard(). This in turn first calls our function Statdata(), so we will look at Statdata() first.

Statdata()

    function Statdata(start, end, plus) {
      var statdata = "";
      var i = start;
      while (i < end) {
        // text content of the i-th <b> element
        statdata += monthlydata[i].firstChild.nodeValue;
        // newline between values, but not after the last one
        if (i + plus < end) {
          statdata += "\n";
        }
        i += plus;
      }
      return statdata;
    }

We create a variable named statdata and fill it with the text that interests us. Since these values sit at regular intervals, we can use start, end and plus to step through them.

Between the values we insert a newline, so we end up with one long string holding the values in a column.
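The stepping logic can be tried out without a browser. The following is a minimal sketch that mirrors Statdata() but takes a plain array of strings instead of the <b> elements. The name collectColumn and the sample numbers are made up for illustration and are not part of the original script.

```javascript
// Hypothetical stand-alone variant of Statdata(): works on a plain array
// of strings so it can be run outside the browser (e.g. in Node.js).
function collectColumn(values, start, end, step) {
  const picked = [];
  for (let i = start; i < end; i += step) {
    picked.push(values[i]);
  }
  // join() puts the newline only between entries, never after the last one.
  return picked.join("\n");
}

// Made-up sample values, not real Webalyzer output.
const sample = ["100", "200", "300", "400", "500", "600"];
console.log(collectColumn(sample, 0, 5, 1)); // first five values, one per line
console.log(collectColumn(sample, 0, 6, 2)); // every second value, one per line
```

Using join() places the separator only between entries, so no trailing newline appears whatever the step size.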

Clipboard()

    function Clipboard() {
      var statdata = Statdata(0, 5, 1);
      statdata += "\n";
      var searchdata = tables[10].getElementsByTagName("th");
      var searchvalue = searchdata[1].firstChild.nodeValue;
      // strip the text around the number
      searchvalue = searchvalue.replace(" Total Search Strings", "");
      searchvalue = searchvalue.replace(/Top.*of /, "");
      // append the number of search strings to the column
      statdata += searchvalue;
      GM.setClipboard(statdata, "text");
    }

After we have collected the first numbers from the table in the statdata variable, we add another newline and the number of search strings. In Webalyzer this number is hidden inside a header cell (<th>), so we use a string replacement and a regular expression to strip the surrounding text.
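To experiment with the text clean-up on its own, the extraction can be sketched as a small helper that runs outside the browser. It assumes the header reads something like "Top 20 of 1234 Total Search Strings". That wording is inferred from the two replace() calls, and the function name extractTotal is hypothetical.

```javascript
// Hypothetical helper: pulls the total out of a header text such as
// "Top 20 of 1234 Total Search Strings" (sample wording is an assumption).
function extractTotal(headerText) {
  const match = /of\s+(\d+)\s+Total/.exec(headerText);
  // Return null instead of a wrong value if the header looks different.
  return match ? match[1] : null;
}

console.log(extractTotal("Top 20 of 1234 Total Search Strings")); // prints 1234
console.log(extractTotal("No search strings"));                   // prints null
```

Capturing the digits directly, rather than deleting the text around them, makes it easy to notice when the header does not have the expected shape.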

Finally, we copy everything to the clipboard. In Greasemonkey the command is GM.setClipboard(), in Tampermonkey it is GM_setClipboard(). In both cases you have to request the function in the metadata block at the top of the script: // @grant GM.setClipboard and // @grant GM_setClipboard, respectively.
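A script meant to run under either extension could look roughly like this. It is only a sketch: the @name and @match values are placeholders, the wrapper copyText is a hypothetical name, and whether a given script manager tolerates a @grant line it does not know is an assumption worth checking in its documentation.

```javascript
// ==UserScript==
// @name     Webalyzer to clipboard (example)
// @match    https://example.com/stats/*
// @grant    GM.setClipboard
// @grant    GM_setClipboard
// ==/UserScript==

// Hypothetical wrapper: uses whichever clipboard function is available.
function copyText(text) {
  if (typeof GM !== "undefined" && typeof GM.setClipboard === "function") {
    return GM.setClipboard(text);          // Greasemonkey style
  }
  if (typeof GM_setClipboard === "function") {
    return GM_setClipboard(text, "text");  // Tampermonkey style
  }
  throw new Error("No clipboard function granted");
}
```

Inside Clipboard() the direct call would then be replaced by copyText(statdata).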

The numbers in the clipboard can now easily be pasted into your document, for example a spreadsheet in Calc or Excel.

Download

If you like our example, you can download our version for Greasemonkey and tinker with it for your own needs.