Get all the URLs in an HTML file (local or on a server)

To use this, you will need the lynx tool, so install that first.

sudo yum install lynx

Now, to get a list of all the URLs in a local HTML file or on a remote page, run this, appending the file path or the URL as the last argument:

lynx -dump -listonly
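The list that lynx prints is numbered and starts with a "References" heading, so it usually needs a little cleanup before it can be fed to another tool. Here is a minimal sketch of that cleanup, assuming the usual "N. URL" layout of the lynx output; extract_urls is a made-up helper name, not something lynx provides.

```shell
# Hypothetical helper: read `lynx -dump -listonly` output on stdin and
# print each URL once, without the leading numbers.
extract_urls() {
  # Keep only lines whose first field looks like "12." and print the URL field.
  awk '$1 ~ /^[0-9]+\.$/ { print $2 }' | sort -u
}

# Usage (page.html is a placeholder file name):
#   lynx -dump -listonly page.html | extract_urls
```

Since sort -u also sorts the list, the URLs come out in alphabetical order rather than in page order.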

 


Fix typescript files generated with script command

Many of us have used the script command, which records a log of the terminal session. The problem is that the log contains a lot of unreadable characters. These are mostly color codes (terminal escape sequences), and they can be removed with a single command:

cat typescript | perl -pe 's/\e([^\[\]]|\[.*?[a-zA-Z]|\].*?\a)//g' | col -b > typescript-processed

This assumes the input log file is named typescript and writes the cleaned output to typescript-processed. You can change the names as required.

 

 


PDF.js: an HTML5 + JavaScript based PDF viewer for Firefox

Firefox 15 has arrived, and there is a lot to brag about. Since many blogs already cover the release, I am just giving you a link to one of them. What is most interesting, though, is the integration of PDF.js, a new HTML5 + JavaScript based PDF viewer.

GHacks Link

To enable this viewer, go to “about:config”, search for pdfjs, and set pdfjs.disabled to “false”. Your browser is then all set to use the inline PDF viewer.
