Semantic Search à la Truevert
Interesting approach to semantic search. Information Week review of Truevert
Powerset has an interesting new search engine that claims to go beyond free-text search and understand the meaning of your queries. This claim has been made many times before (Autonomy, InXight), so I will wait to see how Powerset handles a larger, more unstructured dataset before I am convinced. For the time being they seem to be focusing on the public web. It will also be interesting to see whether they sell their technology to companies that want a search tool capable of searching internal private documents. In my opinion, this is where powerful contextual understanding engines are most sorely needed.
History may be Bunk, but here is some Bunk history. With the last name of Bunk, I've been asked many times if the name got shortened on the boat over to the US. Well, it turns out it did. My great-grandparents Josef and Agnieszka Bak came over to America, and at some point their name got changed to Bunk. Ellis Island has a great website where you can look up the records of passengers coming over. Below is the one for my great-grandma.
It is interesting to note on the manifest that she had not previously been in the US, so she was coming to join Josef at that time. His last name is also given in the list as Bak. I can find no record of his arrival; he probably came before Ellis Island was constructed. She came from Komorow, which I believe is located west of Krakow in what was then Silesia (a part of Germany, since Poland didn't exist at that time). My second cousin Jeff remembers that Mom (my great-grandma) said she remembered her father hating the Prussians (i.e. Germans?), which would make sense if they both came from somewhere in Silesia.
One of my favorite free visualization tools is Treemap.
If you were to put together a data portal to constantly evaluate the trends of data in the agency, this is the type of thing that gives a nice overall snapshot of the data and the direction it is going. It also serves as a nice starting point for performing a specific analysis on a set of data. Here is a great example of what Treemaps can do.
The one thing to know about this example is that they have taken the free concept and initial source code of Treemap and extended it beyond its original capabilities, gearing it specifically to financial analysis. They charge a license fee if you want to use their extended version.
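For anyone who has not seen a treemap before, the core idea is simple: each item is drawn as a rectangle whose area is proportional to its value. Here is a minimal Python sketch of the simplest single-level "slice" layout. It is only an illustration of the concept, not the source code of the Treemap tool, and the function name `slice_layout` and the sample numbers are made up for the example.

```python
def slice_layout(items, x, y, width, height):
    """Place (label, value) pairs side by side inside one rectangle.

    Every rectangle gets the full height, and a share of the width
    proportional to its value, so its area is proportional to the value.
    Returns a list of (label, x, y, width, height) tuples.
    """
    total = sum(value for _, value in items)
    rects = []
    offset = x
    for label, value in items:
        w = width * value / total  # this item's share of the width
        rects.append((label, offset, y, w, height))
        offset += w  # the next rectangle starts where this one ends
    return rects


# Hypothetical sample data: three categories and their sizes.
spending = [("salaries", 6), ("equipment", 3), ("travel", 1)]
for rect in slice_layout(spending, 0, 0, 100, 50):
    print(rect)
```

Real treemap tools apply this kind of split recursively, alternating direction at each level of a hierarchy, and usually color the rectangles by a second variable such as the change over time.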
Traditional statistical/analytic techniques are provided for free by the R project. The R project is modeled after the S project and contains most of the same functionality. R can perform linear and nonlinear modeling, classical statistical tests, time-series analysis, classification, and clustering, and can present the results graphically.
The next few free tools are as much source code APIs as they are tools. They are powerful but difficult to use, and using them effectively requires some grounding in text and data mining theory.
The first tool is Kea. It performs text key phrase extraction. Think of it as a way to figure out key concepts in unstructured text.
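To give a rough feel for what key phrase extraction means, here is a minimal Python sketch that ranks candidate phrases by how often they appear in a piece of text. This is not Kea's actual method or API. The stopword list, the function names `candidate_phrases` and `top_phrases`, and the sample sentence are all assumptions made up for illustration.

```python
import re
from collections import Counter

# A tiny, hypothetical stopword list; real tools use much longer ones.
STOPWORDS = {"the", "a", "an", "of", "and", "to", "in", "is", "for",
             "on", "with", "it", "as", "are", "that", "this"}


def candidate_phrases(text, max_len=2):
    """Return word n-grams (1 to max_len words) that neither start
    nor end with a stopword."""
    words = re.findall(r"[a-z]+", text.lower())
    phrases = []
    for n in range(1, max_len + 1):
        for i in range(len(words) - n + 1):
            gram = words[i:i + n]
            if gram[0] in STOPWORDS or gram[-1] in STOPWORDS:
                continue
            phrases.append(" ".join(gram))
    return phrases


def top_phrases(text, k=3):
    """Rank candidates by frequency; ties go to longer phrases,
    then to alphabetical order."""
    counts = Counter(candidate_phrases(text))
    ranked = sorted(counts, key=lambda p: (-counts[p], -len(p.split()), p))
    return ranked[:k]


sample = "Data mining finds patterns. Data mining of text is text mining."
print(top_phrases(sample))  # ['mining', 'data mining', 'data']
```

Raw counts like these favor common words, which is one reason a trained tool is worth the extra effort on real documents.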
The next tool is Weka. Weka is by far the best free (open source) software package I have seen for data mining. It contains the majority of the data mining methods discussed in the workshop (data pre-processing, classification, regression, clustering, association rules, and visualization).
Here is an interesting article about an example of someone using Weka and Kea to mine, organize and analyze an internet mailing list's archives.
**Note: it's been translated from German, so the wording is a bit off.
The first chapter of their results is available online.
One more worth mentioning in the Free Tool Category is JFreeChart, a free Java class library for graphing.
As you can see, not all DMTA has to be expensive.
From Battelle Corporation (the people who created the CD) comes a software product that, as they say on their website,
couples advanced information modeling and management functionality with a visualization-oriented user interface
Continue reading "Starlight Information Visualization System" »