Author Archives: Dorai Thodla


What is it?

InfoMinder is a service for tracking web pages for changes. You can sign up at the InfoMinder site for a free 30-day trial.

How Do People Use it?

Here are several ways people use InfoMinder.

  1. Lead Generation – By tracking sites that publish RFPs (requests for proposals) for equipment, services, and grants, and filtering them using keywords, people obtain leads. Other lead sources include job pages, job sites, question sites, etc.
  2. Competitive Intelligence – By tracking competitors’ pages, people increase their awareness of new product introductions, new partnerships, customer wins, and news items. It is much more efficient to do this using a service like InfoMinder than to check pages manually and frequently.
  3. Marketing – Several PR companies use InfoMinder to track news propagation, coverage and news clipping services.
  4. Sales – Sales people use InfoMinder for keeping informed about their customers. An event in a customer company may become a topic of conversation.
  5. Tracking Industry – Many companies use InfoMinder to keep track of developments in their own industry. They do this by tracking pages on portals specific to an industry.
  6. Aggregation of Information – Many government departments use InfoMinder to obtain and aggregate information specific to the departments.
  7. Legal Research – Several legal professionals use InfoMinder to track trademark and copyright violations and legal events.
  8. Compliance Information – Several industries use InfoMinder to track sites that provide information about compliance.
  9. Corporate Research – Many corporate librarians use InfoMinder for product research and product technology information.
  10. Internet Research – Since InfoMinder can track your bookmarks, anything you were interested enough to bookmark can be tracked easily.
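
To make the first item concrete, here is a minimal sketch in Python of tracking a page and filtering the changes by keyword. It is only an illustration and not how InfoMinder is implemented. The two page snapshots, the keywords, and the function names are all made up for the example.

```python
import difflib


def changed_lines(old_text, new_text):
    """Return lines that were added in new_text compared to old_text."""
    diff = difflib.ndiff(old_text.splitlines(), new_text.splitlines())
    return [line[2:] for line in diff if line.startswith("+ ")]


def filter_by_keywords(lines, keywords):
    """Keep only the lines that mention at least one keyword (case-insensitive)."""
    lowered = [k.lower() for k in keywords]
    return [line for line in lines if any(k in line.lower() for k in lowered)]


# Two hypothetical snapshots of a procurement page, taken a day apart.
old_snapshot = "Open tenders\nRFP: office furniture\n"
new_snapshot = (
    "Open tenders\nRFP: office furniture\nRFP: network equipment\nHoliday notice\n"
)

added = changed_lines(old_snapshot, new_snapshot)
alerts = filter_by_keywords(added, ["rfp", "grant"])
print(alerts)
```

Only the new line that mentions a keyword ends up in the alert. The holiday notice is a change too, but it is filtered out.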

At iMorph, we have tools for Finding, Tracking and Mining Information. If you have interest in this space or have questions, feel free to leave a comment on this page. We will be happy to answer any questions.

ScienceLog: An Eye On The Universe

From AMON: An Eye on the Universe

AMON stands for Astrophysical Multimessenger Observatory Network. Its mission is to form a network of high-energy observatories across the globe that will search for previously unseen astrophysical signals and send alerts to more traditional telescopes in order to corroborate the possible celestial events.

Until the early 20th century, astronomers relied almost exclusively on visible light to view the sky. Their telescopes, though steadily increasing in power, were no different in this respect from the ones used by Galileo in 1610. Today we see much more of the universe by observing light from all across the electromagnetic spectrum. Gamma-ray, x-ray, infrared, and radio astronomy have revolutionized astronomical observation, as has the advent of space-based telescopes to complement those on the ground.

The past 50 years have seen tremendous progress in the sensitivity of instruments to detect cosmic rays: high-energy charged particles from outer space, such as protons and charged nuclei. Particle accelerators have enabled physicists to create, detect, and analyze other sub-atomic particles, such as neutrinos. These alternative messengers (particles that survive across vast distances in space) presented whole new avenues of exploration.

Computing infrastructure

The Research Computing and Cyberinfrastructure (RCC) unit of Information Technology Services enables scholars to do large-scale computations through linked services, including hardware, software, and personnel.

The High Performance Computing (HPC) system within RCC is a shared resource among dozens of researchers in a host of departmental and interdisciplinary units at Penn State that meets the dual data challenges presented by the AMON project. First, there is the need to continuously receive data from the triggering instruments. This requires computing systems with robust and consistently high “up-time.” The HPC has sub-systems rated at Tier III, with 99.999 percent up-time (a little over five minutes of downtime annually).

Probabilistic Databases

“We are experimenting with a ‘probabilistic’ database that can collect disparate data, say, on neutrinos and gamma rays, and quickly determine the probability that both have come from the same source. This is cutting edge database work.”


It is fascinating to read about how computers and new scientific software and databases help advanced research. This is part of my random reading. Some of the most interesting science research articles come from tracking the NSF (National Science Foundation) site.


One of the best ways to look at advances in Science and Technology is to track funded research.

After That, What?

Several years ago I was describing our product, InfoMinder, to a friend. I told him how cool it is to have something that tracks websites and alerts you when they change. He patiently listened to my pitch and asked a simple question:

After that, what?

I did not understand it at first, so I asked him to elaborate. He asked, “What do your customers do after they get these alerts?” Frankly, I had not thought of that. I knew what I would do, but I had no idea what our customers did. Fortunately, it was easy to fix that problem. We just asked a bunch of them.

The moral of the story is that you need to think about what people do after they consume your product. You may be missing some opportunities for follow-on products or for providing a better solution.

Let me paint a few sample scenarios for you.

  • You set up Google alerts. What do you do after you receive an alert email?
  • You search for a company or topic and get a bunch of results from Google. What do you do after that?
  • You locate an address using Google maps on your computer. What do you do next?
  • A friend texts you an address for a meeting. Now you have a smart phone with good online map support. What do you do next?

The answer, most of the time, is “it depends”.

I think there are some opportunities for some nifty tools to help with the ‘after that, what’ problem. As we switch more and more to mobile devices to help us cope with our lives, more such tools will be useful.

“After that, what?” is a good question to ask yourself if you are looking for mobile product ideas. If you don’t want to do that, you can always ask me. I have a bunch of those problems.

Freemium Model – Our Experience

Here are some notes from the Freemium session at Google I/O. I wanted to annotate them with our experience with InfoMinder.

Why would you offer something for free?

To give the user time to learn about the product

User can provide distribution benefit (inviting other users to the product)

Network benefit – adding value in some way

The scale of conversions is in a 1-10% range.

In our case, we sent an email message to about 20 friends and had a network of about 3000 users in 6 months. We provided the product free for a year. Then we started charging for a professional version but kept a basic version free for some time. Our conversion rate was initially slightly more than 10%. We had a much bigger payoff when two companies OEMed our product.

In the past no business app, except Intuit, has penetrated the 10 employee or less market.

The model is going after new segments and new opportunities with a scale that can make it work.

Direct vs. indirect revenue models.

Feedburner – free version and $5/month version with some extra features.

Our basic version is still less than $2.50/month.

Product segmentation of free versus paid side

Viral/growth oriented things should be on the free side

Things that engage users in deeper behavior should be on the paid side

Can be difficult to draw the line within a company of what should be free vs. paid

The pricing was a bit difficult for us. We initially priced it at $14.99 per year and gradually moved it up to $30/year. Our current products range from $30 to $5000 per year, based on the configuration. We have a lot more customers in the mid range (around $10/month) than in the others.

Start from the beginning with the model you hope to put in place – put your business model in beta at the same time you put your product into beta

This is easier said than done, for two reasons. One is that the product evolves. The other is that your knowledge of customers evolves too. It took us a couple of years to find the sweet spots.

If the free product is so good that there’s no reason to pay, then an alternative is to limit capacity. Where do you draw the line? On the selection of features or on the capacity? – find your fanatical users and have them help you segment where the paywall should be

This is exactly what we did. Our trial product is almost as good as our full product. We charge by capacity. Others like WordPress do the same.

Establishing the value of your product is probably more important than establishing a pricing model right away.

This is very true. We continue to get business and keep customers even though there are some free products that provide some of the functionality. One of our customers told us that after trying out a few competitive products, they decided to stay with us due to a combination of quality and support.

Your instinct about what your customer will/will not pay for is likely wrong. Be flexible early in your business to be able to listen to your user feedback. Have the right premise – if you need 100M people to use your product and it’s not viral, it probably won’t work, for example.

Question: Freemium seems to be when dealing with the direct consumer. What is the balance between different models?

In the mobile gaming space, collecting pennies per user over a lot of users can make a big difference. A mix of revenue types (direct/indirect) can work.

Conversion rates will probably be between 2% and 5%, realistically. Most people won’t pay you for features (you may think they’re more valuable than the users do). Make people feel like they’re getting more value than just additional stuff on top of what they already use.

Go ahead and read the wave. It has lots of information. Please feel free to ask us questions. You can email me or contact me directly (my co-ordinates are in the About link).

Visualizing User Categories

This is how I did it, in a few steps:

1. Extracted the titles from the list of registered users’ profiles

2. Edited the titles to remove certain keywords (for example Director, Manager, VP)

3. Copied the text and pasted it into the Wordle tool using the Create option

4. Randomized the display till I found one I was happy with

5. Used SnagIt to capture the tag cloud and save it as an image

It is kind of cool. I plan to use it in the new version of iMorph website.  Here are a few more things I would have loved to have:

1. I had counts of titles (a weight), but there is no way to pass that information to Wordle

2. I would have liked each word to be a hyperlink (to the list of titles).

I am exploring a few more tag cloud generation tools to see whether we can mash up some data with the clouds.
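
Both wishes (weights and hyperlinks) are easy to prototype outside Wordle. Here is a rough sketch in Python that counts the words left after removing rank keywords and renders an HTML tag cloud in which font size follows the count and every word is a link. The stop list, the pixel sizes, the sample titles, and the `/titles?word=` URL pattern are all assumptions made for the example.

```python
from collections import Counter
from html import escape
from urllib.parse import quote

# Rank words removed in step 2, plus a few fillers; this list is an assumption.
STOP_WORDS = {"director", "manager", "vp", "of", "and"}


def word_weights(titles):
    """Count how often each remaining word appears across all titles."""
    counts = Counter()
    for title in titles:
        for word in title.lower().split():
            word = word.strip(",.&/")
            if word and word not in STOP_WORDS:
                counts[word] += 1
    return counts


def tag_cloud_html(counts, min_px=12, max_px=36, base_url="/titles?word="):
    """Render each word as a link whose font size grows with its count."""
    if not counts:
        return ""
    low, high = min(counts.values()), max(counts.values())
    spread = max(high - low, 1)  # avoid dividing by zero when all counts match
    parts = []
    for word, n in sorted(counts.items()):
        size = min_px + (max_px - min_px) * (n - low) // spread
        parts.append(
            f'<a href="{base_url}{quote(word)}" '
            f'style="font-size:{size}px">{escape(word)}</a>'
        )
    return " ".join(parts)


# Hypothetical job titles standing in for the registered users' profiles.
titles = [
    "Director of Marketing",
    "Marketing Manager",
    "VP Research",
    "Research Librarian",
    "Marketing Analyst",
]
weights = word_weights(titles)
html_cloud = tag_cloud_html(weights)
print(html_cloud)
```

The linear scaling is the simplest choice. With a real user base, a logarithmic scale may look better because a few titles tend to dominate.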

Why I did this:

I was doing this as a marketing exercise to try and find “the ideal user”. When you build generic tools like InfoMinder, you tend to have a wide variety of users. But it is interesting to find these patterns in your user base, which provide a sense of direction for product enhancements as well as new products. Most of all, they provide clues about who your potential channel partners are. Typically they are the same ones who sell to your users.

Discovering Relevant Sources of Information

Discovering relevant sources of information is a recursive process. Let me explain.

Let us say that you want to track clean tech. The easiest way to find a list of sources is to type “cleantech” in your favorite search engine and look at the top 20 distinctly different sources.

But that is just the first step. When you look at these you will find several interesting patterns. You may find portals about cleantech. You may find a directory of resources. You may find some popular bloggers or authors. The list goes on. Based on what you see, you can spawn more searches like these:

  • cleantech directory
  • cleantech products
  • cleantech vendors
  • cleantech lists

You can also find several related terms (or even ontologies) and include them in searches. An example would be “clean tech” OR “green tech” OR “renewable energy” etc.
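
Spawning these searches by hand gets tedious, so here is a small sketch in Python that combines a group of related terms with each follow-up word. The function name and the exact query syntax are assumptions. Search engines differ in how they treat quotes, OR, and parentheses, so check what your engine supports.

```python
def expand_queries(related_terms, facets):
    """Combine an OR-group of related terms with each follow-up word."""
    or_group = " OR ".join(f'"{term}"' for term in related_terms)
    return [f"({or_group}) {facet}" for facet in facets]


related_terms = ["clean tech", "green tech", "renewable energy"]
facets = ["directory", "products", "vendors", "lists"]

queries = expand_queries(related_terms, facets)
for query in queries:
    print(query)
```

Each new pattern you notice in the results (a portal, a directory, a popular author) becomes another entry in `facets`, which is what makes the process recursive.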

The next step is to take each one of these sources and validate them. That is a bit more difficult. You may want to ask yourself the following questions:

  • How current are they?
  • How frequently do they update information?
  • Are they aggregators?
  • Do they support ads? (is there a correlation between their articles and company mentions with their ads)
  • Are they industry associations or industry publications?
  • Can you detect any biases?

In the end, you come up with a list of valuable sources. This provides a starting point. You can continuously monitor these sources using tools like InfoMinder and TopicMinder. In addition you can go a level deeper and find what their sources are and start tracking those sources as well.

You may want your own relevance ranking system. The search engines’ ranking may not really work for you. For example, if you are tracking an industry for early signals, a site’s high page rank may be completely irrelevant to your needs. My ranking criteria for a certain topic would be:

  • Some kind of source rank (which Google does well)
  • Currency (How current is the information?)
  • Authority (Do the authors/columnists have a large following? Are they retweeted, blogged, linked to? Do they have high social ranking like Klout scores/LinkedIn connections?)
  • Is this their area of research? A topic cloud created from their recent columns and posts can give you some indication.
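
These criteria can be folded into a single score. The sketch below, in Python, is one possible way to do it. The weights, the 30-day half-life for currency, the 0 to 1 scales, and the two sample sources are all assumptions chosen for illustration, and you would tune them for your own topic.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Source:
    name: str
    source_rank: float   # 0..1, e.g. a normalized search-engine rank
    last_updated: date
    authority: float     # 0..1, e.g. normalized follower or link counts
    topic_fit: float     # 0..1, share of recent posts that are on the topic


# Hypothetical weights; currency counts most when looking for early signals.
WEIGHTS = {"source_rank": 0.2, "currency": 0.4, "authority": 0.2, "topic_fit": 0.2}


def currency(last_updated, today, half_life_days=30):
    """Score that halves for every half_life_days since the last update."""
    age = max((today - last_updated).days, 0)
    return 0.5 ** (age / half_life_days)


def score(source, today):
    """Weighted sum of the four criteria."""
    return (
        WEIGHTS["source_rank"] * source.source_rank
        + WEIGHTS["currency"] * currency(source.last_updated, today)
        + WEIGHTS["authority"] * source.authority
        + WEIGHTS["topic_fit"] * source.topic_fit
    )


today = date(2012, 11, 12)
sources = [
    Source("big-portal", 0.9, date(2012, 5, 1), 0.6, 0.3),
    Source("niche-blog", 0.3, date(2012, 11, 10), 0.5, 0.9),
]
ranked = sorted(sources, key=lambda s: score(s, today), reverse=True)
print([s.name for s in ranked])
```

With these weights the recently updated niche blog outranks the stale portal, even though the portal has the higher source rank.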

Discovery is a recursive and continuous process. If the information is that important (something you may need to act upon), this additional investment in validation and customization may be worth the effort.


Updated on Nov 12, 2012

InfoTools Survey Results

Yesterday, I gave a talk on InfoTools: Beyond Search at TiE Chennai. The slides of the presentation are here. I think it went well, but if I had cut down the slides and talk and given more demos, it would have gone even better. Perhaps next time.

Before starting the talk, I requested people to give me (written) answers to three questions:

  1. What are your information needs?
  2. What are your problems with information?
  3. What tools do you use to manage information?

The questions were perhaps a bit vague. I realized that after going through the answers. They varied in their level of granularity (specific vs generic problems) and in the definition of information itself. But here they are (slightly modified to reduce redundancy).

Here is what I got from the survey:

What are your information needs?

  1. Potential customer info
  2. Right info at the right time (whenever I need it)
  3. To structure, unstructured web data
  4. Technology and Process to manage large newspaper portal
  5. Should be current, relevant (to my context). Should lead to (help?) actual decisions
  6. I need reference (information) of various consultants relating to start of business viz cost, web, management etc.
  7. Need for current information
  8. Need (to handle?) information from multiple sources and formats
  9. Collating information from multiple sources
  10. Information about competition
  11. About marketability and segments
  12. Company address information
  13. Company finance (annual reports)
  14. Executives within the company
  15. Trade details, products, services etc.
  16. Sales leads
  17. Knowledge enhancement
  18. Learning about old friends/acquaintances/family
  19. To learn to grow personally & business
  20. Needs to be local search for providers near to me (for ex: a photo copier shop near to my house)
  21. Technical solutions (day to day) for career and personal growth
  22. My business is providing information based services, package with recommendations. So need for information varies.
  23. Various technologies in market
  24. Information about market situation
  25. About stock/companies performance
  26. Details/support to solve issues
  27. Products available in the market for specifics(?)
  28. Focused News
  29. Similar business entity info
  30. Public info of competitor
  31. At a business level – market feelers about demand, ease of vendor options availability
  32. At an execution/implementation level – latent trends in tech
  33. Updated knowledge
  34. Price information about products etc.
  35. Information about technology
  36. Looking for acquiring an IT company. Need info on the industry they are in (macro) and more about that company (micro)
  37. Collect, compile for pattern understanding, plan for target customer
  38. Top IT temp staffing companies in India
  39. Total temp staff in IT in India
  40. How do I know the customer needs
  41. Scholarly articles on business entrepreneurship
  42. Product information, addresses from www
  43. Collecting/harvesting data from websites and collating, cleansing and delivering to clients
  44. Where is the resource for information?
  45. Where info is available, how to get data stream into our database
  46. How cost effective, credible, valuable is the data
  47. Accessibility
  48. About companies wanting to enter India -setup operations, joint ventures
  49. Companies in India wanting to enter other geographies
  50. Consultants from outside India needing partners in India
  51. Relevant, accurate data (specific to the task at hand)
  52. Info about prospective customers
  53. Info about vendors
  54. Info about current market
  55. Info about latest technology

Here is my list of information requirements (I took the survey along with the others):

  1. Leads
  2. Trends
  3. Best practices

What are your problems with information?

  1. Locating the right data at the right time
  2. At times info overload
  3. Unable to get the right (specific) information
  4. Sometimes get caught into loads of data, making it difficult to sift through
  5. Credibility, cost and accessibility
  6. Frequent website updates
  7. Different formats of information
  8. Getting data from complex templates and grouping into finite categories
  9. Precision, very difficult to get objective information
  10. Currency of data
  11. Comprehensiveness of data
  12. Need continuous monitoring
  13. Information overload and in such case, synthesizing & assimilating that information in a reasonable time frame is difficult
  14. Old data, not accurate
  15. Too much info
  16. Not easily accessible
  17. Irrelevant info
  18. Filter out the actual/real info from a large pool of junk data
  19. Do not have a scope to interact with peers in similar industries
  20. Direct actionable information takes several searches, navigation
  21. How to localize information (assume how to get local information) and get reliable info
  22. How to segregate info from the web
  23. Difficult to put together
  24. If put together, not sure whether it is the updated info
  25. If updated (up to date?) not sure about the integrity of the data source
  26. Availability (sources), Reliability (sources)
  27. Aggregation of data in a presentable manner
  28. Too much information
  29. Unable to identify precise locations quickly
  30. Quality of inputs not high (always)
  31. Too large varied and different
  32. Formats (word, pdf, excel etc. ), hard copies, books, magazines
  33. Difficult to authenticate, collate and organize based on requirement
  34. I like websearch engines but I strongly believe that these search engines are at a nascent stage. I just don’t need a site coming up in my search because it is in wikipedia or yahoo
  35. Inappropriate not timely
  36. Have to go through lots of notes/documents/pages to get a single piece of information
  37. Validating the information
  38. Storing and organizing information
  39. Time
  40. Where to see (sources?)
  41. Not a centralized reporting
  42. Assimilation requires a lot of pre-formatting
  43. Effective and speed search by everyone not followed
  44. Not sure what to look for, where to look for and how to get it
  45. Vast, use software to target timely, quick, on realtime
  46. Not able to source the information in the web
  47. We develop products based on blogs and emails. This is not enough.
  48. Too much info
  49. Info with noise

My List

  1. Signal vs noise
  2. Reliability
  3. Authenticity

What tools do you use?

  1. Blog, forums
  2. Google, web search
  3. Search engines
  4. Reliable third parties
  5. Friends
  6. Regular expressions
  7. Use bookmarking tools like delicious, share with team
  8. Knowledge repositories (Wikipedia)
  9. Books (online/printed)
  10. Inhouse tools to capture through automation
  11. Infosource – www, infoanalysis – spreadsheets
  12. Search engines to identify information
  13. Customized perl/php/ programs to manage
  14. Scrape information from the web and manage it
  15. Search engines
  16. Networking sites (LinkedIn etc)
  17. Forums
  18. Email
  19. My brain power, word/excel
  20. justdial and few others provide localized service over phone but it is not so accurate
  21. Justdial
  22. Hakia
  23. None
  24. Excel/Computer/Notebooks
  25. Peer discussions
  26. IE Favorites (browser bookmarks)
  27. Bing
  28. Primary Research
  29. Internet, newspapers, meeting – software modules
  30. spreadsheet, email
  31. Internet, libraries
  32. Getting logic from other tools and using our own tools or languages
  33. Perl, regex
  34. Paid portals
  35. LinkedIn
  36. Spoke
  37. Ecademy
  38. Xing
  39. My memory (sigh)

What I use:

  1. Social bookmarks (delicious, stumble upon)
  2. Twitter Search
  3. Facebook groups
  4. LinkedIn Groups and Answers
  5. Custom search
  6. Blog/Feed Search
  7. Twine
  8. Semantic Search engines
  9. InfoMinder
  10. InfoStreams (feed aggregator/search)
  11. InfoPortals (just started)
  12. Tag clouds (generated)
  13. Concept Mapping tools
  14. OpenCalais
  15. Zemanta
  16. Wikis

This is a small sample (about 40+ people who attended my talk). But you can see some patterns. I think we have a long way to go beyond search.

Implementing an Innovation Process

I came across this nice blog on Innovation Process Framework, by Jeffrey Philips (via Innovation Weblog)


The blog is a nice read and tries to outline a framework for Repeatable Innovation. Towards the end Jeffrey appeals to the readers to provide feedback.

If you care to, please comment or provide your feedback. I think if we practitioners, consultants and interested bystanders can create a consistent vision for the future of innovation and the tools and processes necessary for success, we can help our clients and business partners become more successful.

I have been experimenting with a few tools and some ad-hoc processes for innovation (in small product groups). So let me start out with a few tools and see how we can start putting together elements of this framework.

You can start with any simple content management system (Drupal, Plone, DotNetNuke or even a MediaWiki engine). It is also possible to use commercial portal products like SharePoint, BEA or IBM portal servers. Let us see how we can go about building a prototype of the tools required to bootstrap your Innovation Process based on the framework described by Jeffrey.

1. Trend Spotting

You can use several products that exist in the marketplace to track trends. The tools listed here give you the information you need to detect trends.

  • Google Alerts – A service to receive alerts based on certain keywords
  • InfoMinder – Our product to track specific web pages for changes (you can optionally specify filters) and receive notification. Unlike Google Alerts, InfoMinder is specific to the pages you want to track.
  • Digg, delicious, Techmeme, reddit or any of your favorite social bookmarking services (you can look for specific trends or retrieve information using tags)
  • Technorati or Google Blog Search tools
  • Tag Clouds (many of the services mentioned above provide tag clouds that tell you the more popular trends), or you can create your own tag clouds
  • Google Trends – A product from Google that allows you to see trends based on searches
  • A set of high-level Text Mining and Tech Mining tools (a subject that deserves almost a blog of its own)

A combination of these services and other custom services can be used to perform trend capture. You need to figure out a way to make sense of trends from these different pieces of information (Trend Spotting). Fortunately, many of these tools provide RSS streams or APIs. You can easily integrate them with several content management systems.
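
As a rough illustration of working with an RSS stream, here is a sketch in Python that reads item titles from a feed and counts how many of them mention each keyword. The feed is an inline, made-up sample so the example runs without a network connection. The keywords and function names are assumptions too.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# A hypothetical RSS 2.0 feed; a real one would be fetched from the service.
SAMPLE_RSS = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Hypothetical cleantech feed</title>
    <item><title>Solar startup raises funding</title><link>http://example.com/1</link></item>
    <item><title>New solar panel efficiency record</title><link>http://example.com/2</link></item>
    <item><title>Wind farm approved</title><link>http://example.com/3</link></item>
  </channel>
</rss>"""


def item_titles(rss_text):
    """Return the title of every item in the feed."""
    root = ET.fromstring(rss_text)
    return [item.findtext("title", default="") for item in root.iter("item")]


def keyword_counts(titles, keywords):
    """Count the number of titles that contain each keyword as a whole word."""
    counts = Counter()
    for title in titles:
        words = title.lower().split()
        for keyword in keywords:
            if keyword in words:
                counts[keyword] += 1
    return counts


feed_titles = item_titles(SAMPLE_RSS)
trend_counts = keyword_counts(feed_titles, ["solar", "wind", "biofuel"])
print(trend_counts.most_common())
```

Run against several feeds on a schedule, counts like these are raw material for spotting trends, and they could be stored as content items in whichever content management system you pick.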

2. Generate Ideas

You can set up a workflow where people with the role of Generators look at the captured trend information, combine it with other sources and generate ideas. These can be stored in any relational database, such as MySQL or PostgreSQL.

3. Capture additional Information

In the system, Ideas are just a specific type of document with certain metadata like creator, date of creation, source of idea, description etc. It would be nice to add the capability for anyone to tag ideas. Based on tags and other criteria, ideas can be routed to Evaluators.
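
Here is a minimal sketch in Python of what such an idea document and tag-based routing might look like. The field names follow the metadata listed above. The tag-to-evaluator mapping, the group names, and the `triage` fallback are hypothetical.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class Idea:
    creator: str
    created: date
    source: str
    description: str
    tags: set = field(default_factory=set)


# Hypothetical mapping from a tag to the evaluator group that handles it.
ROUTES = {"energy": "cleantech-team", "patent": "legal", "pricing": "marketing"}


def route(idea, routes=ROUTES, default="triage"):
    """Return the evaluator groups whose tags match the idea's tags."""
    matched = sorted({routes[tag] for tag in idea.tags if tag in routes})
    return matched or [default]


idea = Idea(
    creator="alice",
    created=date(2012, 11, 12),
    source="trend feed",
    description="Alert customers when a competitor files a solar patent",
)
idea.tags.update({"energy", "patent"})   # anyone can add tags
evaluators = route(idea)
print(evaluators)
```

An idea with no recognized tags falls back to a general triage group, so nothing gets lost.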

4. Evaluate Ideas

The evaluators can add comments, additional tags, classify the ideas to be further researched and send them back into the system. With each iteration, the circle widens. Ideas are further validated, combined with others or split into multiple ideas and put back into the system. Since Ideas trigger ideas, this process of combining and splitting will work well.

5. Develop and Launch

Stakeholders are found, prototypes built, ideas developed and launched as products/services. Your content management system can be used as a record keeper in this phase. In every step of the process from ideation to launch, it may be worth engaging small communities of users. Connecting to social tools like Twitter, Facebook, LinkedIn may be a good way to build and grow these communities.

6. Workflow/Process Automation

This is functionality built into several content management systems. Ideas can move from one stage to another (nascent, researched, validated etc.)

7. Idea Archetypes

One of the important aspects of the design of an Idea Archetype is the progressive addition of information. Some of its attributes are listed here:

  • State – specifies the current stage of the idea. As it goes through the system, the state of the idea keeps changing
  • Strength – an indicator of the strength of the idea. As ideas float through the system and gather support, the strength can be progressively increased. Support increases this value and opposition decreases this value.
  • Next Steps – For each idea there can be a sequence of steps which can be started by the creator of the idea and collaboratively edited by others. For example, the legal department may add a patent search as a next step
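
The three attributes above can be sketched as a small Python class. The stage names come from the workflow section. The allowed transitions, the one-point-per-vote strength rule, and the method names are assumptions made for the example.

```python
from dataclasses import dataclass, field

# Allowed stage changes; the graph itself is an assumption.
TRANSITIONS = {
    "nascent": {"researched"},
    "researched": {"validated", "nascent"},
    "validated": set(),
}


@dataclass
class IdeaArchetype:
    title: str
    state: str = "nascent"
    strength: int = 0
    next_steps: list = field(default_factory=list)

    def advance(self, new_state):
        """Move the idea to a new stage if the workflow allows it."""
        if new_state not in TRANSITIONS[self.state]:
            raise ValueError(f"cannot move from {self.state} to {new_state}")
        self.state = new_state

    def vote(self, support):
        """Support raises the strength by one; opposition lowers it by one."""
        self.strength += 1 if support else -1

    def add_step(self, step):
        """Anyone can append a next step to the shared list."""
        self.next_steps.append(step)


idea = IdeaArchetype("Keyword alerts on mobile")
idea.vote(True)
idea.vote(True)
idea.vote(False)
idea.add_step("patent search")   # e.g. added by the legal department
idea.advance("researched")
print(idea.state, idea.strength, idea.next_steps)
```

Keeping the transitions in a plain dictionary makes it easy to mirror whatever workflow states your content management system already defines.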

8. Process Maps

Argument maps, concept maps and other mapping tools can be loosely integrated (most of them export data in XML, JSON or CSV format).

9. IdeaLogs

Ideas can also be published in blogs (private if they are meant for small internal groups). Many portal products or content management systems come with their own blog software. You can also integrate some of the popular blogging software like WordPress.

10. Wikis as Collaborative Knowledge Bases

Wikis can be used as knowledge bases to share, collaboratively edit and archive ideas. Wikis are an alternative to the idea archetypes mentioned earlier. Many wikis now provide templates for creating structured pages.

Any portal framework that supports content management, custom content types, workflow, collaboration, authentication can be used to jump start the Innovation Process in an organization. It is easy to bootstrap an innovation process using this framework and existing tools in a few weeks.

The best approach is to start with something as simple as a portal, set up some simple workflows, use a single page with extensible metadata as a basis for collaboration.


Pretty much everything I described here can be done using many other portal frameworks as well. One of my recent favorites is Drupal, especially since it has started providing support for RDF (a core language for the semantic web). You can also custom build this framework using web frameworks like Rails (Ruby) or Django (Python).

Tech Mining

I have been reading a book called Tech Mining. I was planning to write a few blogs after finishing the book. But the whole purpose of my learn log is to (b)log as I learn. So here is some information from the first couple of chapters.

According to the authors, various types of Technology Analyses can be aided by tech mining.

1. Technology Monitoring (also known as technology watch or environmental scanning) – cataloguing, characterizing, and interpreting technology development activities.

2. Competitive Technology Intelligence (CTI) – finding out “Who is doing what?”

3. Technology Forecasting – anticipating possible future development paths for particular technologies

4. Technology Roadmapping – tracking evolutionary steps in related technologies and, sometimes, product families.

5. Technology Assessment – anticipating the possible, unintended, indirect, and delayed consequences of particular technology changes.

6. Technology Foresight – strategic planning (especially national) with emphasis on technology roles and priorities

7. Technology Process Management – getting people making decisions about technology

8. Science and Technology Indicators – time series that track advances in national (or other) technological capabilities.

We do a bit of the first activity with our product InfoMinder, but have a long way to go in providing the other capabilities mentioned above. We do plan to help customers set up Information Portals to store the tracked information and do some automatic linking.

Applied XML: A new Webservice API for Books

It is interesting to track new APIs and mashups at Programmable Web. I think it is one of the most useful resources on the web. Using these APIs you can build your own little applications called mashups.

Today I came across one that shows an interesting trend developing. It is a programming interface for accessing books, subjects, publishers and authors. The site lists some impressive statistics:

Statistics (03/01/2006):

Associated with the API are a few XML vocabularies.

An XML vocabulary is useful for exchanging data in a common format. When data is stored in XML format, it is more amenable to access and manipulation. As these formats spread in popularity, there is a likelihood that more people will start using them. These are the baby steps in the making of a data web.
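
To show why a shared vocabulary makes data easier to manipulate, here is a short Python sketch that reads book records from XML and regroups them by publisher. The element and attribute names (`BookList`, `Book`, `isbn`, and so on) and the sample records are invented for illustration. They are not the vocabulary of the API mentioned above.

```python
import xml.etree.ElementTree as ET

# Hypothetical vocabulary and data, made up for this example.
SAMPLE_XML = """<BookList>
  <Book isbn="0000000001">
    <Title>First Example Book</Title>
    <Author>A. Writer</Author>
    <Publisher>Example Press</Publisher>
    <Subject>Data</Subject>
  </Book>
  <Book isbn="0000000002">
    <Title>Second Example Book</Title>
    <Author>B. Writer</Author>
    <Publisher>Example Press</Publisher>
    <Subject>Web</Subject>
  </Book>
</BookList>"""


def parse_books(xml_text):
    """Turn each Book element into a plain dictionary."""
    root = ET.fromstring(xml_text)
    books = []
    for book in root.findall("Book"):
        books.append({
            "isbn": book.get("isbn"),
            "title": book.findtext("Title"),
            "author": book.findtext("Author"),
            "publisher": book.findtext("Publisher"),
            "subject": book.findtext("Subject"),
        })
    return books


books = parse_books(SAMPLE_XML)

# Once parsed, the records can be regrouped any way a mashup needs.
by_publisher = {}
for record in books:
    by_publisher.setdefault(record["publisher"], []).append(record["title"])
print(by_publisher)
```

Because every record uses the same element names, the same few lines work for any service that adopts the vocabulary, which is the point of a common format.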