Tag information storage in a database
Posted: Sat May 15, 2010 11:31 am
Hi,
How is the tag information from Ctags stored in the CodeLite database? More specifically:
- Does CodeLite allow fuzzy or sub-string tag queries, e.g. "list tags containing the word 'update' somewhere in their name"? A tag name column in a database is usually not a good fit for such queries: an ordinary index cannot be used when the pattern may match anywhere in the name, so the database must scan every record to execute the query.
- Some fields in the output of Ctags are complex. For example, the "inherits" extension is a comma-separated list of classes (e.g. in the case of multiple inheritance). Does CodeLite provide a "class hierarchy" view? Finding the hierarchy of a class means finding the class name anywhere inside the "inherits" extension, which requires the same kind of query as above (a sub-string query on a column). I implemented this feature for my editor, jEdit, and these queries take a very long time to execute.
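To make the two points above concrete, here is a minimal sketch using Python's built-in sqlite3 module. It assumes an SQLite-style store; the table and column names (`tags`, `inherits`, `child`, `parent`) are made up for illustration and are not CodeLite's or jEdit's actual schema. It asks the database for its query plan for a sub-string query versus an exact-match query, and shows one possible alternative for the hierarchy lookup: splitting the comma-separated "inherits" value into one row per parent so that an index can be used.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tags (id INTEGER PRIMARY KEY, name TEXT, inherits TEXT);
    CREATE INDEX tags_name ON tags(name);
    -- Hypothetical normalized form: one row per (class, parent) pair.
    CREATE TABLE inherits (child TEXT, parent TEXT);
    CREATE INDEX inherits_parent ON inherits(parent);
""")

# Sample data (invented): tag name plus its comma-separated "inherits" field.
rows = [
    ("Widget", ""),
    ("Button", "Widget"),
    ("ToggleButton", "Button,Observer"),
    ("updateView", ""),
]
conn.executemany("INSERT INTO tags (name, inherits) VALUES (?, ?)", rows)
for name, inherits_field in rows:
    for parent in filter(None, inherits_field.split(",")):
        conn.execute("INSERT INTO inherits VALUES (?, ?)", (name, parent))


def plan(sql, params=()):
    """Return the 'detail' column of EXPLAIN QUERY PLAN for a statement."""
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql, params)]


# Sub-string query: the plan is a SCAN over all rows, even though an index exists.
substring_plan = plan("SELECT name FROM tags WHERE name LIKE ?", ("%update%",))
# Exact-match query: the plan is an index SEARCH.
exact_plan = plan("SELECT name FROM tags WHERE name = ?", ("updateView",))

# Direct subclasses of Button via the normalized table (indexed lookup).
children = [
    r[0]
    for r in conn.execute(
        "SELECT child FROM inherits WHERE parent = ? ORDER BY child", ("Button",)
    )
]

print(substring_plan, exact_plan, children)
```

The exact wording of the plan text varies between SQLite versions, but the SCAN versus SEARCH distinction is the part that matters here.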
For this reason, I have started to consider switching from a relational database to a Lucene index for the tag information. Lucene is a full-text search engine, but it can be used for arbitrary textual information, not just text files. Lucene was originally a pure Java library, but there are ports in C and C++. The added value I expect from Lucene is very quick substring / fuzzy searches, which are very expensive with a relational database. In addition, the administration and maintenance associated with a Lucene index seem lighter than with a relational database.
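To illustrate why an inverted index can help with sub-string searches, here is a rough sketch in plain Python. It does not use Lucene or its API; the `TrigramIndex` class and the choice of 3-character n-grams are my own assumptions for illustration, and a real full-text engine may handle this quite differently. The idea is to index every 3-character piece of each tag name, intersect the posting sets for the pieces of the search term, and verify only the remaining candidates.

```python
from collections import defaultdict


def trigrams(text):
    """All 3-character substrings of text, lower-cased."""
    text = text.lower()
    return {text[i:i + 3] for i in range(len(text) - 2)}


class TrigramIndex:
    """Hypothetical inverted index: trigram -> ids of tag names containing it."""

    def __init__(self):
        self.postings = defaultdict(set)
        self.names = []

    def add(self, name):
        tag_id = len(self.names)
        self.names.append(name)
        for gram in trigrams(name):
            self.postings[gram].add(tag_id)

    def search(self, needle):
        grams = trigrams(needle)
        if not grams:
            # Needle shorter than 3 characters: fall back to checking everything.
            candidates = set(range(len(self.names)))
        else:
            # Only names containing every trigram of the needle can match.
            candidates = set.intersection(
                *(self.postings.get(g, set()) for g in grams)
            )
        needle = needle.lower()
        # Verify candidates, since sharing trigrams does not guarantee a match.
        return sorted(
            self.names[i] for i in candidates if needle in self.names[i].lower()
        )


index = TrigramIndex()
for name in ["updateView", "onUpdate", "render", "dateUtil"]:
    index.add(name)

matches = index.search("update")
print(matches)  # ['onUpdate', 'updateView']
```

The trade-off is a larger index in exchange for touching only the names that share trigrams with the search term, instead of every record.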
BTW, some time ago I wanted to write a Lucene plugin for CodeLite. I haven't abandoned that plan; I just haven't had time for it yet. The purpose was different from the above, though: it wasn't for storing the tag information, but for quickly finding the occurrences of each identifier in the source code (mostly to find references).
Thanks a lot!
Shlomy