
Monday, July 22, 2013

Screen scraping using YQL

I have been using Yahoo Pipes for a long time now, and I have built a few screen-scraping mashups with it. While Yahoo Pipes provides a component to fetch the HTML content of a URL, it is a bit difficult to cut out a specific part, because it relies entirely on string matching. I came across the YQL console, where I could write SQL-like queries and fetch the HTML content of any URL. The best part is that it supports XPath expressions for selecting the exact node of the HTML to extract data from. For example, I wrote the following query to get the stock price from a web page:

[code language="sql"]
select *
from html
where url ="http://getquote.icicidirect.com/NewSiteTrading/trading/equity/includes/trading_stock_quote.asp?Symbol=BSES"
and xpath='//td[p/text()="LAST TRADE PRICE"]/following-sibling::td[2]/p'
[/code]

See the above code running in the YQL Console.
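
Outside the console, the same query can be sent as a plain HTTP GET. The following is only a minimal sketch in Python of how the request URL could be put together. It assumes the public endpoint `http://query.yahooapis.com/v1/public/yql` accepts the query text in a `q` parameter and the response format in a `format` parameter. The helper name `build_yql_request` is made up for illustration, and the sketch only builds the URL without fetching anything.

```python
from urllib.parse import urlencode

# Assumption: public YQL REST endpoint taking the query in "q" and the
# response format in "format".
YQL_ENDPOINT = "http://query.yahooapis.com/v1/public/yql"

# Page URL and XPath expression taken from the query above.
QUOTE_URL = ("http://getquote.icicidirect.com/NewSiteTrading/trading/equity/"
             "includes/trading_stock_quote.asp?Symbol=BSES")
XPATH = '//td[p/text()="LAST TRADE PRICE"]/following-sibling::td[2]/p'


def build_yql_request(page_url, xpath, fmt="json"):
    """Return a GET URL carrying a `select * from html` query (hypothetical helper)."""
    # The page URL goes in double quotes and the XPath in single quotes,
    # mirroring the quoting used in the query above.
    query = f"select * from html where url=\"{page_url}\" and xpath='{xpath}'"
    # urlencode percent-escapes the quotes, slashes and spaces in the query.
    return YQL_ENDPOINT + "?" + urlencode({"q": query, "format": fmt})


request_url = build_yql_request(QUOTE_URL, XPATH)
print(request_url)
```

Letting `urlencode` do the escaping matters here, because the XPath contains quotes, brackets and slashes that would otherwise break the query string.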

Similarly, this query can be made a little more complex and parameterized by stock symbol to form the appropriate URL:

[code language="sql"]
select *
from html
where url in (
select url
from uritemplate
where template="http://getquote.icicidirect.com/NewSiteTrading/trading/equity/includes/trading_stock_quote.asp?Symbol={item}" and item=@item)
and xpath='//td[p/text()="LAST TRADE PRICE"]/following-sibling::td[2]/p | //td[p/text()="LAST TRADED TIME"]/following-sibling::td[1]/p'
[/code]

See the above code in the YQL console. However, it will not run directly from the console, since the value for @item has to be supplied with the request. One could just create a query alias and pass the required value in the query string, like the following.

http://query.yahooapis.com/v1/public/yql/neilghosh/liveQuote?item=INFTEC
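
As a rough sketch, the alias URL for any stock symbol can be built in Python like this. The alias path and the `item` parameter come from the URL above. The function name `live_quote_url` is just an illustrative choice, and the code only constructs the URL without calling the service.

```python
from urllib.parse import urlencode

# Query alias URL exactly as given above, without the query string.
ALIAS_URL = "http://query.yahooapis.com/v1/public/yql/neilghosh/liveQuote"


def live_quote_url(symbol):
    """Return the alias URL with `item` set to the stock symbol (hypothetical helper)."""
    # "item" matches the @item variable used in the parameterized query.
    return ALIAS_URL + "?" + urlencode({"item": symbol})


# Build one URL per symbol; both symbols appear earlier in this post.
for symbol in ["INFTEC", "BSES"]:
    print(live_quote_url(symbol))
```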

This could have been done using the built-in YQL component in Yahoo Pipes itself, but that would be an extra layer if you just need to get the required content from the HTML rather than work with a feed (for which Pipes is still the best choice). Of course, there are some limits/quotas on such YQL queries, which I need to explore in the coming days.

For screen scraping I could directly use Google App Engine's URLFetch, or curl on PHP servers, but this would unnecessarily transfer the whole page, consuming quota and adding a time lag.

Monday, July 1, 2013

Sharing Photosphere

I have always been a fan of panorama images. There is a lot of photo-stitching software that can join many overlapping images into a single one. I have used Photosynth earlier with a lot of satisfaction. There are also a lot of phones and digital cameras that can do it right in the camera in panoramic mode, in which you slowly move the capturing device and it continuously takes and stitches photos to create a larger panorama. While some websites like CleVR and GigaPan help with sharing horizontal panoramas and let you embed them in various sites, spherical panoramas like the ones taken with the Photosphere app in Android 4.2 cameras could not be embedded in a straightforward way without some HTML coding. Finally I found a site called SphereShare.net for this purpose. Following is the embedded photo I took and uploaded to this site.

(Click this link for a wider view. Because the blog column is narrow, the embedded one may not look good.)



This time the players were moving; I will try to get a more stable image when I go outdoors next time :)
Here is a horizontal panorama embedded in Photosynth. (Needs the Microsoft Silverlight plugin.)

Play music remotely from phone using Windows 7

I was searching for a Bluetooth transmitter that could transmit the music played on my cellphone to the speaker mounted on my wall, so that I could control what is being played from my phone without using a very long aux cable. I had posted about this earlier, but today I found it really useful for playing and controlling music while lying in bed. After pairing the phone, you just have to click on the following setting in Windows to transmit music from the phone to the speaker via the laptop. I could tune in to any internet radio station, play any music on my phone, and also take calls on the phone, and everybody in the house could hear it.

BT1