Author Archives: Dorai Thodla

RSS Application: Monitoring Media

One of the best ways to keep track of trends is to monitor the media. Current awareness tools (as they are often called) give you a means of tracking what is happening on the web and staying current on your customers, partners and competitors. My company, iMorph, has a product called InfoMinder that lets you track changes on the web and in RSS feeds. So when I found a link to A Guide To Media Monitoring With RSS, I was happy to spend some time looking at Josh Hallet’s suggestions.

On average I find that news and issues discussed in the blogosphere is at least 1-2 days ahead of the major media. In public relations, that one or two days can make a big difference.

In this comprehensive post, Josh has several suggestions about using RSS readers and social bookmarking tools, including Technorati, PubSub and del.icio.us. It is definitely a good resource if you are interested in tracking information on the web.
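
To make the idea of monitoring a feed concrete, here is a minimal sketch of keyword tracking over an RSS 2.0 feed, using only the Python standard library. The sample feed, the function name matching_items and the keywords are all made up for illustration; this is not how InfoMinder or any of the tools Josh mentions actually work.

```python
import xml.etree.ElementTree as ET

# Hypothetical RSS 2.0 feed, inlined so the sketch runs without a network call.
SAMPLE_FEED = """<rss version="2.0">
  <channel>
    <title>Example Industry News</title>
    <item>
      <title>Acme Corp launches new widget</title>
      <link>http://example.com/acme-widget</link>
    </item>
    <item>
      <title>Weather update</title>
      <link>http://example.com/weather</link>
    </item>
  </channel>
</rss>"""


def matching_items(feed_xml, keywords):
    """Return (title, link) pairs for items whose title mentions any keyword."""
    root = ET.fromstring(feed_xml)
    lowered = [k.lower() for k in keywords]
    hits = []
    # RSS 2.0 elements are not namespaced, so plain tag names are enough.
    for item in root.iter("item"):
        title = item.findtext("title", default="")
        link = item.findtext("link", default="")
        if any(k in title.lower() for k in lowered):
            hits.append((title, link))
    return hits


if __name__ == "__main__":
    for title, link in matching_items(SAMPLE_FEED, ["acme"]):
        print(title, link)
```

In practice you would fetch each feed on a schedule and remember which links you have already seen, so that only new mentions of a customer, partner or competitor show up.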

Found this via Weblogs Work

Web Data Extraction, XPath and XQuery

In his article on Semantic Screenscraping, Jon Udell talks about how to integrate information from the web using XPath as a tool for scraping web pages and integrating them with open APIs like the Google Map API.

Regular expressions once dominated my screenscraping code. Now XPath expressions do. Screenscraping is becoming more declarative, more query-like.

Jon outlines the developments that make obtaining data from the web easier.

1. HTML is readily convertible to XHTML

2. The resulting XHTML is semantically richer

3. XPath and XQuery are maturing to the point where they are very useful in extracting information

This topic is very interesting to me. About 5 years ago, we attempted a product called Information Integrator. The goal was to interactively step through web pages, mark the portions you are interested in, convert them to XML and integrate them into a single page. Your home page would then be a set of transclusions. After a few attempts working with Tidy and a mapper UI, we gave it up in favor of our current InfoMinder product. I think a variation of that idea still has some merit.
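
As a rough illustration of the declarative, query-like style Jon describes, here is a small sketch that pulls structured records out of an XHTML page with XPath expressions. It uses the limited XPath subset in Python's standard xml.etree.ElementTree module. The sample page, the class names and the function extract_stores are invented for the example, and it assumes the page has already been converted to well-formed XHTML (for instance by a tool like Tidy).

```python
import xml.etree.ElementTree as ET

# Namespace that XHTML documents declare on their root element.
NS = {"x": "http://www.w3.org/1999/xhtml"}

# Hypothetical page, already cleaned up into well-formed XHTML.
SAMPLE_PAGE = """<html xmlns="http://www.w3.org/1999/xhtml">
  <body>
    <div class="store">
      <span class="name">Downtown Books</span>
      <span class="address">12 Main St</span>
    </div>
    <div class="store">
      <span class="name">Harbor Cafe</span>
      <span class="address">4 Pier Rd</span>
    </div>
    <div class="ad">Buy now</div>
  </body>
</html>"""


def extract_stores(xhtml_text):
    """Return one dict per <div class="store"> found in the page."""
    root = ET.fromstring(xhtml_text)
    stores = []
    # The XPath expression says what to select, not how to walk the markup.
    for div in root.findall(".//x:div[@class='store']", NS):
        name = div.findtext("x:span[@class='name']", default="", namespaces=NS)
        address = div.findtext("x:span[@class='address']", default="", namespaces=NS)
        stores.append({"name": name.strip(), "address": address.strip()})
    return stores


if __name__ == "__main__":
    for store in extract_stores(SAMPLE_PAGE):
        print(store["name"], "-", store["address"])
```

Records extracted this way could then be handed to another service, such as a mapping API, which is the kind of integration the article talks about. A full XPath or XQuery engine would allow richer queries than the subset used here.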

StratML – Strategy Markup Language

It is nice to know that the emerging technology group of the government is starting a community of practice for developing and using an XML vocabulary. XML is being used heavily in government. The approach taken is to adopt industry-standard vocabularies where available, and to develop new ones as needed by the government.

“The StratML CoP will utilize CORE.gov to share ideas, information, meeting schedules and plans to develop a standardized vocabulary and XML template for Agency Strategic Plans.”

A link to the StratML Community Of Practice

A link to the presentation by Adam J Schwartz

My Source: InfoMinder Alerts on xml.gov