Term Extraction: How to extract Keywords from Text

A problem I have run into repeatedly is extracting a set of keywords from a web page or article. After trying OpenCalais and Alchemy, I stumbled upon YQL (Yahoo Query Language) from the Yahoo Developer Network.
This is such an awesome service; it’s a shame it hasn’t gained more popularity!

Generate keywords from a webpage or RSS feed.

To analyze a text, web page or RSS feed and extract its keywords, all you have to enter in the YQL Console is:

# extract keywords from text
select * from search.termextract where context="Italian sculptors and painters of the renaissance favored the Virgin Mary for inspiration"

# generate keywords from a web page
select * from search.termextract where context in (select content from html where url="http://en.wikipedia.org/wiki/Black_Friday_(1945)")

# extract keywords from an RSS feed
select * from search.termextract where context in (select title from rss where url="http://rss.cnn.com/rss/edition.rss")

# generate keywords from an RSS feed and sort the results
select * from search.termextract where context in (select title from rss where url="http://rss.cnn.com/rss/edition.rss") | sort(field="Result")

# remove duplicates and sort the results
select * from search.termextract where context in (select title from rss where url="http://rss.cnn.com/rss/edition.rss") | unique(field="Result") | sort(field="Result")
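The page and feed queries above all follow one pattern: an inner select pulls a field from a source table, and optional unique and sort steps are piped onto the end. A minimal Python sketch of that pattern follows; the helper name termextract_query and its parameters are made up for illustration and are not part of YQL. It only assembles the statement text and does no escaping of the URL.

```python
def termextract_query(source_table, field, url, unique=False, sort=False):
    """Compose a search.termextract statement for a web page or feed.

    Hypothetical helper: it only builds the YQL text shown in the
    examples above and does not escape quotes inside `url`.
    """
    query = (
        "select * from search.termextract where context in "
        f'(select {field} from {source_table} where url="{url}")'
    )
    if unique:
        # drop duplicate keywords first
        query += ' | unique(field="Result")'
    if sort:
        # then sort the remaining keywords
        query += ' | sort(field="Result")'
    return query


if __name__ == "__main__":
    print(termextract_query("rss", "title",
                            "http://rss.cnn.com/rss/edition.rss",
                            unique=True, sort=True))
```

Called as in the last lines, this reproduces the "remove duplicates and sort" statement; with "html" and "content" as the first two arguments it produces the web page variant.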

It’s pretty easy to include this in your scripts thanks to the free API Yahoo offers.
Here is how to make a REST query:

http://query.yahooapis.com/v1/public/yql?q=select * from search.termextract where context in (select content from html where url="http://en.wikipedia.org/wiki/Black_Friday_(1945)") | unique(field="Result") | sort(field="Result")

URL-encoded, and here without the unique and sort steps, the request looks like this:

http://query.yahooapis.com/v1/public/yql?q=select%20*%20from%20search.termextract%20where%20context%20in%20(select%20content%20from%20html%20where%20url%3D%22http%3A%2F%2Fen.wikipedia.org%2Fwiki%2FBlack_Friday_%25281945%2529%22)
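Encoding the query by hand is error-prone, so in a script it is easier to let the standard library do it. The following is a rough Python sketch, assuming the endpoint shown above; the function name build_yql_url is made up for this example. It only builds the URL and does not send the request, so nothing here depends on the service being reachable. Note that urlencode escapes a few more characters (such as * and the parentheses) than the hand-encoded URL above, which decodes to the same query.

```python
from urllib.parse import quote, urlencode

# Endpoint taken from the REST example above.
YQL_ENDPOINT = "http://query.yahooapis.com/v1/public/yql"


def build_yql_url(query):
    """Return the REST URL for a YQL statement (hypothetical helper)."""
    # quote_via=quote encodes spaces as %20 instead of '+'
    return YQL_ENDPOINT + "?" + urlencode({"q": query}, quote_via=quote)


if __name__ == "__main__":
    statement = (
        "select * from search.termextract where context in "
        '(select content from html where '
        'url="http://en.wikipedia.org/wiki/Black_Friday_(1945)") '
        '| unique(field="Result") | sort(field="Result")'
    )
    print(build_yql_url(statement))
```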
