Update: IntelliJ IDEA SimpleSyntax Highlighter Plugin

I’ve made the plugin available through the IDEA Plugin Manager. It’s still at an early alpha stage, basically a prototype, but I already find it quite useful. Apart from Ruby highlighting I use it to highlight its own configuration file syntax as well as property files and some other work-related file types.

Just before uploading the latest version a few minutes ago I spent some time improving performance, which is probably the plugin’s major problem right now. If you’ve followed my recent posts on the IDEA plugin topic you may recall that I explicitly decided not to use a system like JFlex. Well, by now I realize that this has certain implications, mainly performance-related.. :)

What I’m now thinking about is providing an alternative configuration syntax that can be used with JFlex but doesn’t use the complicated JFlex syntax. And I’m thinking about compiling it at runtime using Groovy. Well, it’s still just a thought.

More importantly: I’m still waiting for the project to be approved at Tigris. I tried SourceForge first, but no response after five days.. For now the plugin sources are available at my IntensiCode site.

If you’re interested, check out the included TODO.txt to get an idea of what I have planned and/or what I’m dealing with.

And if you want to see some really awful stuff, have a look at my ConfigurableLanguage hack (in the src/net/intensicode/idea/core folder). Who’s to blame? Me because I’m trying to abuse the API? Or the JetBrains guys because they are trying to protect their application? (I’m guessing here.. I don’t really know why they chose to use global registries all over.. To be honest I probably would never use something like a global registry.. Just because global itself has such a bad smell.. together with registry it stinks..)

tfdj

IntelliJ IDEA SimpleSyntax Highlighter Plugin

Alright, this pet project has become quite serious now and is consuming more and more evenings..

I’m now thinking about making it officially available. It’s become quite useful. It probably needs a small UI to make configuration easier. Apart from that it’s really nice to have proper highlighting for Ruby, etc.

Well, for now I simply attached the latest version of the plugin. If you should stumble upon this blog, please give this plugin a try and give me your feedback!

Extract the IDEA SimpleSyntax Plugin zip file into your IDEA config/plugins/ folder. After (re)starting IDEA, you should see two new entries in your Tools menu. Use the “Init SimpleSyntax” to install the default Ruby configuration in your IDEA settings. You can then use ‘Reload SimpleSyntax’ to activate Ruby syntax highlighting.

Update: Gizmo of kodierer.de fame pointed me to the syntax configuration of Adie. The functionality offered by the ‘configuration language’ is actually quite similar. Very interesting to see their more formal approach using a configuration grammar..

Anyway, the really interesting question is this: Should IDEA provide a simpler way of configuring syntax highlighting? The XML filetypes based approach doesn’t work properly. (Or does it? And I’m just too stupid.. :) Implementing a full IDEA plugin just to get highlighting for a small (and/or own) little language seems to be overkill..

Update: I’ve made the plugin available through the IDEA Plugin Manager. Of course I found quite a few annoying problems after uploading it.. :) I’m working on fixes right now.. The important part is this: You should now be able to download the latest version of the plugin from inside IDEA.

tfdj

Update 4: Ruby Plugin for IDEA

I just added initial support for specifying syntax rules with small Ruby snippets. A lot of cleanup is required now to make the source code presentable again.. :)

Anyway, from now on you can specify ‘ruby’ and ‘regex’ rules side by side, like this:

ruby  SPECIAL_QUOTED_STRING         => tags/SPECIAL_QUOTED_STRING.rb
regex SINGLE_QUOTED_STRING          => \'(?:[^\']|\\')*\'

The ruby rule takes a filename as ‘value’. In this file you simply ‘def’ a ‘find_in’ function like this:

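def find_in( input, start_offset, end_offset )
    # Look for the start of a %q / %Q literal, including its opening delimiter.
    start_index = input.index( /%[qQ]./, start_offset )
    return nil unless start_index

    # get_delimiter is not shown in this post; presumably it maps the opening
    # delimiter to its closing counterpart.
    delimiter = get_delimiter( input[ start_index + 2, 1 ] )
    # Empty literal such as %q(): the token is exactly four characters long.
    return start_index, start_index + 4 if input[ start_index + 3, 1 ] == delimiter

    # Find the first closing delimiter that is not preceded by a backslash.
    # Regexp.escape keeps delimiters like ')' or ']' from breaking the pattern.
    pattern = Regexp.new "[^\\\\]#{Regexp.escape( delimiter )}"
    end_index = input.index pattern, start_index + 3
    return nil unless end_index

    # The match starts one character before the delimiter; the end is exclusive.
    return start_index, end_index + 2
end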
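
To try a ‘find_in’ rule outside of IDEA, something has to stand in for ‘get_delimiter’. The sketch below is only an assumption about what such a helper could look like (bracket-style openers close with their counterpart, everything else closes with itself, as Ruby itself does for %q literals). It repeats a compact ‘find_in’ so that it runs on its own:

```ruby
# Hypothetical stand-in for the helper that the rule file calls.
CLOSING_DELIMITERS = { "(" => ")", "[" => "]", "{" => "}", "<" => ">" }.freeze

def get_delimiter(opening)
  CLOSING_DELIMITERS.fetch(opening, opening)
end

# Compact copy of the rule function, so this sketch is self-contained.
def find_in(input, start_offset, end_offset)
  start_index = input.index(/%[qQ]./, start_offset)
  return nil unless start_index

  delimiter = get_delimiter(input[start_index + 2, 1])
  return start_index, start_index + 4 if input[start_index + 3, 1] == delimiter

  pattern = Regexp.new("[^\\\\]#{Regexp.escape(delimiter)}")
  end_index = input.index(pattern, start_index + 3)
  return nil unless end_index

  return start_index, end_index + 2
end

source = "x = %q(abc) y"
first, last = find_in(source, 0, source.length)
puts source[first...last]
```

The returned pair is a start offset plus an exclusive end offset, so ‘source[first...last]’ yields the whole literal.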

I’m still waiting for SourceForge to approve this project. I’ll put everything online then asap..

tfdj

Update 3: Ruby Plugin for IDEA

Alright. Took me nearly three whole evenings to conquer IDEA. I’ve seen lots of little things I really dislike.. Globals and other BS.. Horrible..

Well. I got an initial configurable version of the “SimpleSyntax” highlighter plugin done. It’s available here as binary and source. These files will probably go away soon, once I’ve created an official project somewhere. But some heavy cleaning up has to happen first..

Anyway, some of the nice bits of this plugin. Have a look at a configuration as it is supported right now:

[SimpleSyntax:V1.0]

Name: Ruby
Icon: simplesyntax_ruby.png
Description: Ruby Script File
ExampleCode: simplesyntax_ruby.rb

# Braces Configuration
Braces.Pairs: (),[]
Braces.Structural: {}

# Commenter Configuration
Comment.Line: #
Comment.BlockPrefix:
Comment.BlockSuffix:

# FileType Configuration
FileType.Icon: simplesyntax_ruby.png
FileType.Extensions: rb, ruby
FileType.DefaultExtension: rb

# Syntax Configuration
regex BLOCK_COMMENT                 => (?m)^(?:\s*#.*$){2,}
regex LINE_COMMENT                  => (?m)#.*$
(...)
regex SPECIAL_QUOTED_STRING         => %[qQ](.).*\1
regex SINGLE_QUOTED_STRING          => \'(?:[^\']|\\')*\'
regex DOUBLE_QUOTED_STRING          => \"(?:[^\"]|\\")*\"
(...)

# Element Descriptions
descriptions[ BLOCK_COMMENT ] = Block comment
descriptions[ LINE_COMMENT ] = Line comment
descriptions[ DOC_COMMENT ] = Documentation
(...)

# Element Default Attributes
attributes[ BLOCK_COMMENT ] = #303030,BOLD,ITALIC
attributes[ LINE_COMMENT ] = #305030,BOLD,ITALIC
attributes[ DOC_COMMENT ] = #503030,BOLD,ITALIC
(...)
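
To make the ‘attributes’ lines a bit more concrete, here is a minimal Ruby sketch of how one such line could be split into a color and a list of style names. This is not the plugin’s parser (that one is written in Java); the names ‘Attributes’ and ‘parse_attribute_line’ are made up for illustration, and the accepted format is inferred only from the three example lines above:

```ruby
# Hypothetical value object: a "#rrggbb" color string plus style symbols.
Attributes = Struct.new(:color, :styles)

# Parses lines such as: attributes[ NAME ] = #303030,BOLD,ITALIC
# Returns [name, Attributes] or nil if the line has a different shape.
def parse_attribute_line(line)
  match = line.match(/\Aattributes\[\s*(\w+)\s*\]\s*=\s*(#\h{6})((?:,\w+)*)\s*\z/)
  return nil unless match

  # match[3] looks like ",BOLD,ITALIC"; drop the empty leading element.
  styles = match[3].split(",").reject(&:empty?).map(&:to_sym)
  [match[1], Attributes.new(match[2], styles)]
end

name, attributes = parse_attribute_line("attributes[ LINE_COMMENT ] = #305030,BOLD,ITALIC")
puts "#{name}: #{attributes.color} #{attributes.styles.inspect}"
```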

I removed the dirty/incomplete parts.. :) Support for easier keyword specification, etc. is still missing. For now only the “regex” rules are supported. Which is pretty clumsy for specifying keywords..

Another “nice” thing I want to add are “ruby” and “groovy” rules in which some script code is executed to recognize tokens – instead of RegEx matching. We’ll see..

That’s it for today.. BF2 is waiting.. :)

tfdj

Update 2: Ruby Plugin for IDEA

I’ve been pretty busy with work all day. Now I got back to the IDEA Ruby plugin and after only a few minutes of thinking about what I was going to do, I said to myself “Wait a minute.. that’s BS..”. There’s a much simpler solution..

Alright, here’s the context: Yesterday I spent a few hours trying to use a ‘search-replace’ approach for turning the JavaScript IDEA example plugin into a Ruby plugin. Didn’t work.. :) And after a few minutes playing around with JFlex.. well, it really was enough JFlexing for a lifetime.. I then started implementing the Lexer interface offered by IDEA again, keeping in mind that at some point I’d have JRuby doing the ‘tokenizing’.

But while doing that, I ‘stumbled’ across a simpler idea. How about a concept called ‘RecognizedToken’, offering a simple API like this:


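import com.intellij.psi.tree.IElementType;

interface RecognizedToken
{
    // Returns true if this token occurs between the two offsets of the given text.
    boolean isFoundIn( CharSequence aCharSequence, int aStartOffset, int aEndOffset );
    // Position and type of the most recently found token.
    int getTokenStart();
    int getTokenEnd();
    IElementType getTokenType();
}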

Then it should be possible to implement a very simple Lexer checking all registered tokens on every ‘advance’ call. This Lexer could read a simple config file consisting of lines such as:


LINE_COMMENT => "#.*"
DOUBLE_QUOTED_STRING => "\"(?:[^\"]|\\\")*\""
(...)
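
Reading such a file is not much work. The following Ruby sketch shows one way to turn lines of this form into name/pattern pairs. It is an illustration only: the plugin is Java, ‘Rule’ and ‘parse_rules’ are made-up names, and I’m assuming that the text between the outer quotes is handed to the regex engine as-is, without any further unescaping:

```ruby
# Hypothetical rule object: a token name plus its compiled pattern.
Rule = Struct.new(:name, :pattern)

# Parses lines of the form: NAME => "regex"
# Lines with a different shape, such as "(...)", are skipped.
def parse_rules(text)
  text.each_line.map do |line|
    match = line.match(/\A\s*(\w+)\s*=>\s*"(.*)"\s*\z/)
    next unless match

    # Assumption: the quoted text is used verbatim as the regex source.
    Rule.new(match[1], Regexp.new(match[2]))
  end.compact
end

config = <<'CONFIG'
LINE_COMMENT => "#.*"
DOUBLE_QUOTED_STRING => "\"(?:[^\"]|\\\")*\""
(...)
CONFIG

parse_rules(config).each do |rule|
  puts "#{rule.name}: #{rule.pattern.source}"
end
```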

Of course, the next problem is the complexity of the regular expressions. And performance could be an issue, if IDEA doesn’t properly cache the Lexer output. But judging from the ‘start’ method signatures, my guess is they will cache everything..

Anyway, the Lexer now becomes a simple “10”-liner. Well, apart from the 100 LOC used to implement the generic API..


private final void updateTokenType( final int aStartOffset )
{
    // Default: nothing recognized, the rest of the input is one untyped span.
    myTokenStart = aStartOffset;
    myTokenEnd = myEndOffset;
    myTokenType = null;

    // Ask the token finder for the closest registered token in the remaining range.
    final RecognizedToken recognizedToken = myTokenFinder.findClosest
        ( myCharSequence, aStartOffset, myEndOffset );
    if ( recognizedToken == null ) return;

    myTokenType = recognizedToken.getTokenType();
    myTokenStart = recognizedToken.getTokenStart();
    myTokenEnd = recognizedToken.getTokenEnd();
}
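
The ‘findClosest’ part is not shown above, so here is a rough sketch of the idea in Ruby instead of Java: try every rule at the current offset and keep the match that starts first. The names ‘Token’, ‘find_closest’ and ‘tokenize’ are made up, and emitting untyped tokens for the gaps between matches is my choice for this sketch, not necessarily what the plugin does:

```ruby
# Hypothetical token: type is nil for text that no rule recognized.
Token = Struct.new(:type, :start, :stop)

# Returns the token whose match starts closest to offset, or nil.
# On a tie the rule listed first wins.
def find_closest(rules, input, offset)
  best = nil
  rules.each do |type, pattern|
    match = pattern.match(input, offset)
    next unless match

    start, stop = match.begin(0), match.end(0)
    next if stop == start # skip empty matches so the lexer always advances
    best = Token.new(type, start, stop) if best.nil? || start < best.start
  end
  best
end

# Splits the whole input into recognized tokens and untyped gaps.
def tokenize(rules, input)
  tokens = []
  offset = 0
  while offset < input.length
    token = find_closest(rules, input, offset)
    if token.nil?
      tokens << Token.new(nil, offset, input.length)
      break
    end
    tokens << Token.new(nil, offset, token.start) if token.start > offset
    tokens << token
    offset = token.stop
  end
  tokens
end

rules = {
  LINE_COMMENT: /#.*/,
  DOUBLE_QUOTED_STRING: /"(?:[^"\\]|\\.)*"/
}
source = 'puts "hi" # done'
tokenize(rules, source).each do |token|
  puts "#{token.type.inspect}: #{source[token.start...token.stop].inspect}"
end
```

Every rule is tried at every step, so the cost grows with the number of rules. That is exactly where the performance question comes from.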

The basic Lexer is working now. I’ll add configuration file support and then try some Ruby coding in IDEA with this plugin’s syntax highlighting enabled.

tfdj

Update 1: Ruby Plugin for IDEA

It’s been a busy week. Not much time for my fun projects.. Only today I started looking back into the Ruby plugin for IDEA. I see two ways that I’m willing to follow:

1. Embed JRuby into the Plugin and offer a Ruby-based configuration file for the syntax definitions.

2. Clone the JavaScript example plugin and do a string-replace “JavaScript” to “Ruby”.

I actually like the first idea a lot more. And it’s much closer to my current solution, which is basically a hack using hardcoded lexer, keywords, etc.. I like the idea of having a simple config syntax to define the keywords and other parts of the languages that I want to highlight. Obviously there may be some performance issues..

The second idea, cloning the JavaScript thing, well.. it’s huge! And using JFlex for the lexer is nearly overkill.. imho.. :) I simply don’t like these ‘theoretically correct’ but not too pragmatic/usable solutions..

tfdj

BitStructEx

I’ve made available an early version of an alternative BitStruct “thing” on RubyForge:

http://rubyforge.org/projects/bit-struct-ex/

In contrast to the nice and more feature-complete implementation available here, I focused on solving my main problem: Non-byte-boundary-aligned-nested-structures :)

There’s really a lot to like about Ruby.. I enjoy not coding in Java.. a welcome change..

However, the meta-programming parts can still be.. well.. “mind-bending”.. I’ll probably blog about the meta part used in BitStructEx. You basically jump between instance, class and metaclass – even though the methods are right next to each other.. feels weird sometimes.. more on this soon.. (Have a look at the struct_base.rb file if you’re interested.)
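
To give an idea of what this jumping looks like, here is a tiny sketch. It is not the BitStructEx API and not taken from struct_base.rb; ‘TinyBitStruct’, ‘unsigned’ and ‘Header’ are made up for this example. A class-level macro records a field and defines an instance-level reader that extracts a bit range that is not byte-aligned:

```ruby
class TinyBitStruct
  # Class-level macro: runs while the body of a subclass is being evaluated.
  def self.unsigned(name, bits)
    offset = total_bits # bits already used by the fields declared before
    fields << [name, offset, bits]

    # define_method creates an *instance* method from within class context.
    define_method(name) do
      shift = self.class.total_bits - offset - bits
      (@value >> shift) & ((1 << bits) - 1)
    end
  end

  # Class-level instance variable: every subclass gets its own field list.
  def self.fields
    @fields ||= []
  end

  def self.total_bits
    fields.sum { |_, _, bits| bits }
  end

  # Instance level: wraps the raw integer holding all the bits.
  def initialize(value)
    @value = value
  end
end

class Header < TinyBitStruct
  unsigned :version, 3
  unsigned :flags, 5
  unsigned :length, 12
end

header = Header.new(0b101_00011_000000001010)
puts [header.version, header.flags, header.length].inspect
```

‘unsigned’ and ‘fields’ live on the class (its metaclass, to be precise), the readers end up on the instances, and all of it sits within a few lines of each other.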

tfdj

Thoughts on JRuby

Just stumbled upon a – slightly weird – mention of JRuby on Lambda. It’s weird because JRuby has been around for a long time..

Anyway, I’m just thinking, maybe this could be a nice answer to my recent ‘performance’ problems mentioned here. Using the Java NIO memory-mapped files stuff in (J)Ruby..

Overall a very cool project.. imho..

tfdj

Extending Ruby: GPC

For the project mentioned in my previous post, I needed a way to ‘clip’ polygons. Polygons had to be intersected with a ‘bounding box/shape’ and they had to be triangulated.

I was faced with the decision to either implement one of the known algorithms, find a Ruby polygon clipper, use an external implementation, or write a Ruby extension providing access to one of the available C/C++ implementations.

Performance is/was an issue. Geo data polygons can contain a large number of coordinates. And many polygons have to be processed.

I immediately dropped the idea of writing a pure Ruby implementation. Mainly because I’m new to Ruby and I don’t dig math at all.. :/

I found no Ruby implementation after a few hours of searching the web.

Then I started calling external implementations of ‘hgrd’, ‘gpc’ and some other polygon clippers via a primitive ‘exec’ call from Ruby. Obviously the performance was lousy. But I was able to deliver some first results quickly.

I also realized that there are huge differences in clipping quality. And I finally settled for the ‘GPC’ from Alan Murta: http://www.cs.man.ac.uk/~toby/alan/software/

After a few days I had to improve performance. So I started looking at ‘extending Ruby’.

I needed a few hours and a look at swig to finally ‘grok’ the way extending Ruby works. My first approaches – before swig – failed badly. I tried to map all ‘native’ objects to Ruby somehow, providing all functions somehow.. a total mess..

Looking at swig I was shocked by the ugly code. But at least I was able to derive some ideas. Interestingly enough, the swig code didn’t really work. It didn’t ‘understand’ the ‘pointers’ as ‘arrays’ used by the GPC. Or to be more precise: I didn’t understand how to tell swig what to do..

Anyway, after playing around with this code a few hours it finally clicked. It was then a matter of a few more hours to finish the extension using very simple – imho – code.

Here’s the code in case you’re interested:
http://www.berlinfactor.com/blogging/files/gpc_ruby.c

I should probably send this to Alan Murta. But for now it’s too ‘raw’. No index/error checking at all..

tfdj

Ruby and Reality

I was lucky enough to be able to choose Ruby for implementing a small server for ‘geo data’. Basically a service not unlike Google Maps. But much more simple and with vector output. The whole thing is used for providing data that can be used on a J2ME client.

Anyway, because of some weird problems inside the company, I had to read a 2 GB file containing the geo data in a stupid textual format.

I started with a trivial ‘each_line’ approach. A ‘data entry’ in the file consisted of about 10 lines. The first 9 lines contained some attributes. The last line contained the coordinates of the represented geographic feature. So within the each_line block, I collected the data and appended a resulting ‘GeoObject’ instance to an output array whenever a data entry had been read completely.
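
Roughly, that first version had the following shape. I can’t show the real file format, so everything concrete here is an assumption for the sake of the sketch: exactly 10 lines per entry, attributes kept as raw strings, and whitespace-separated integer coordinates on the last line. ‘read_geo_objects’ is a made-up name as well:

```ruby
require "stringio"

# Assumed shape of one entry: 9 attribute lines, then 1 coordinate line.
GeoObject = Struct.new(:attributes, :coordinates)
LINES_PER_ENTRY = 10

def read_geo_objects(io)
  objects = []
  pending = []
  io.each_line do |line|
    pending << line.chomp
    next if pending.size < LINES_PER_ENTRY

    # The entry is complete: the last line holds the coordinates.
    coordinates = pending.pop.split.map { |value| value.to_i }
    objects << GeoObject.new(pending, coordinates)
    pending = []
  end
  objects
end

entry = (0...9).map { |i| "attribute #{i}" }.join("\n") + "\n10 20 30 40\n"
objects = read_geo_objects(StringIO.new(entry * 2))
puts objects.size
puts objects.first.coordinates.inspect
```

Every line becomes its own String, and every coordinate goes through ‘split’ and ‘to_i’. For a 2 GB file that adds up to a lot of short-lived objects.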

This thing took about 4 hours to process the 2 GB.. wtf!? Alright, I didn’t think much when writing the code. And I had only worked with Ruby for a few days. But some things were obvious to me: For example extracting all the ‘values’ out of a ‘line’ by extracting a string and calling ‘to_i’ is pretty inefficient.

So I started implementing a few straightforward optimizations: Instead of extracting substrings from the line I directly added up the ‘bytes’ to determine the integer value. And instead of ‘each_line’ I read 16 MB chunks of data and worked with offset/index pairs on these chunks.

This improved performance by more than 50%. But it still took close to 2 hours.
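
Both optimizations can be sketched in a few lines. This is not the original code, which I can’t post as-is; ‘parse_int_at’ and ‘each_complete_chunk’ are made-up names, the numbers are assumed to be unsigned decimals, and carrying an incomplete last line over to the next chunk is one possible way to deal with chunk boundaries:

```ruby
require "stringio"

CHUNK_SIZE = 16 * 1024 * 1024 # 16 MB, as mentioned above

# Reads an unsigned decimal number directly from the bytes of buffer,
# starting at index. Returns [value, index_after_last_digit].
def parse_int_at(buffer, index)
  value = 0
  while index < buffer.bytesize
    byte = buffer.getbyte(index)
    break unless byte >= 48 && byte <= 57 # ASCII '0'..'9'
    value = value * 10 + (byte - 48)
    index += 1
  end
  [value, index]
end

# Yields big chunks that always end on a line boundary. The incomplete
# last line of a chunk is carried over to the next one.
def each_complete_chunk(io, chunk_size = CHUNK_SIZE)
  carry = "".b
  while (chunk = io.read(chunk_size))
    buffer = carry + chunk
    last_newline = buffer.rindex("\n")
    if last_newline.nil?
      carry = buffer
      next
    end
    yield buffer.byteslice(0, last_newline + 1)
    carry = buffer.byteslice(last_newline + 1, buffer.bytesize) || "".b
  end
  yield carry unless carry.empty?
end

each_complete_chunk(StringIO.new("12 34\n56\n7"), 4) do |chunk|
  value, next_index = parse_int_at(chunk, 0)
  puts "#{chunk.inspect} starts with #{value}, next index #{next_index}"
end
```

With a real file you would open it in binary mode (‘File.open(path, "rb")’) and pass the file object instead of the StringIO.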

Funny me, I fired up IntelliJ IDEA. This took about 2 minutes.. (I hate how bloated IDEA is by now.. I’d love to see an IDEA Light!) And I started hacking away at a Java solution. Using IDEA this took me less than 30 minutes for this problem. I used the NIO features. With ‘getChannel’ and some ‘map’ call to do memory-mapped IO, my first version of this app took about 5 minutes to process the 2 GB.

How’s that?

I can’t explain all of this huge performance difference.. A part might be the memory mapped IO. But look at this Java API. Scanning the ByteBuffer using these ‘get’ calls.. I assume the Java VM is a lot more powerful than what Ruby’s ‘foundation’ offers.

Anyway.. just a quick post on this topic. If I find the time I’ll post the code. Unfortunately I have to change some things to protect the innocent..

I’ll post another note on extending Ruby soon. I needed access to a polygon clipping library.. More on that soon..

tfdj
