Now all we need to do is translate the class file specification into a set of classes, which is our grammar:
public class ClassFile {
    int magic_1;
    short minorVersion_2;
    short majorVersion_3;
    // hack, special name -- see Parser.parse
    short constant_pool_count_4;
    CPEntry[] constantPool_5;
    short accessFlags_6;
    short thisClass_7;
    short superClass_8;
    short interfacesCount_9;
    short[] interfaces_10;
    short fieldCount_11;
    FieldInfo[] fields_12;
    short methodCount_13;
    MethodInfo[] methods_14;
    short attributesCount_15;
    AttributeInfo[] attributes_16;
}
This would all work beautifully if it weren't for a few non-uniform details in the class file specification. The constant pool contains one fewer entry than the count stored in the file, because index 0 is reserved (it stands for null) and is not written out. This could have been accommodated in a more uniform manner, but the designers chose not to do so. Similarly, long and double constants each occupy two slots in the constant pool, a choice that the virtual machine specification itself notes as a regret. Even so, with a couple of minor hacks we can deal with it.
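To make the two quirks concrete, here is a minimal sketch that walks a constant pool and records only the tag of each slot. It is not the Parser from this article: the class name ConstantPoolSketch and its methods are made up for illustration, and it assumes only the classic tag values (1 and 3 through 12), rejecting anything newer.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical helper, not part of the article's Parser.
class ConstantPoolSketch {

    // Consumes exactly n bytes of entry payload that this sketch does not interpret.
    static void skip(DataInputStream in, int n) throws IOException {
        in.readFully(new byte[n]);
    }

    // Returns one tag per constant pool slot. Slot 0 and the second slot
    // of every long/double entry are left as 0 because nothing is stored there.
    static int[] readTags(DataInputStream in) throws IOException {
        int count = in.readUnsignedShort();   // constant_pool_count
        int[] tags = new int[count];          // only count - 1 entries follow
        for (int i = 1; i < count; i++) {     // index 0 is never written to the file
            int tag = in.readUnsignedByte();
            tags[i] = tag;
            switch (tag) {
                case 1:                       // Utf8: u2 length, then the bytes
                    skip(in, in.readUnsignedShort());
                    break;
                case 3: case 4:               // Integer, Float
                    skip(in, 4);
                    break;
                case 5: case 6:               // Long, Double: payload is 8 bytes...
                    skip(in, 8);
                    i++;                      // ...and the entry uses up two slots
                    break;
                case 7: case 8:               // Class, String
                    skip(in, 2);
                    break;
                case 9: case 10: case 11: case 12: // Fieldref, Methodref, InterfaceMethodref, NameAndType
                    skip(in, 4);
                    break;
                default:                      // newer tags are deliberately not handled here
                    throw new IOException("unsupported constant pool tag " + tag);
            }
        }
        return tags;
    }

    // Convenience overload for bytes that start at constant_pool_count.
    static int[] readTags(byte[] poolBytes) {
        try {
            return readTags(new DataInputStream(new ByteArrayInputStream(poolBytes)));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The loop starts at 1 and bumps the index an extra time after a long or double, which are exactly the two places where the file layout departs from a plain "count followed by that many entries" array.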
Also, you will notice the trailing numbers in the field names. They are used to sort the fields, since reflection does not guarantee the order in which a class's fields are returned. Here's a main() method to get things going:
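// Needs java.io.ByteArrayInputStream, DataInputStream, File and FileInputStream
public static void main(String[] args) throws Exception {
    File file = new File("bin/com/madhu/picovm/Parser.class");
    byte[] data = new byte[(int) file.length()];
    // readFully() keeps reading until the buffer is full; a bare read() may return early
    DataInputStream dis = new DataInputStream(new FileInputStream(file));
    try {
        dis.readFully(data);
    } finally {
        dis.close();
    }
    Parser p = new Parser();
    Object o = null;
    // Parse the same bytes 100 times so the elapsed time is large enough to measure
    long start = System.currentTimeMillis();
    for (int i=0; i<100; i+=1) {
        o = p.parse(new ByteArrayInputStream(data), ClassFile.class);
    }
    long time = System.currentTimeMillis() - start;
    System.out.println(o);
    // This is the total for all 100 parses
    System.out.println("Time: " + time + "ms");
}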
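As a reminder of how the numbered field names drive the parse, here is a small self-contained sketch of the idea. It is an illustration under stated assumptions, not the Parser listing itself: ReaderSketch, Header, suffix(), orderedFields() and read() are hypothetical names, and only int and short fields are handled.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.lang.reflect.Constructor;
import java.lang.reflect.Field;
import java.util.Arrays;
import java.util.Comparator;

// Hypothetical illustration of a suffix-ordered, reflection-driven reader.
class ReaderSketch {

    // A tiny "grammar" class, declared out of order on purpose.
    static class Header {
        short majorVersion_3;
        int magic_1;
        short minorVersion_2;
    }

    // Extracts the number after the last underscore, e.g. 3 from "majorVersion_3".
    static int suffix(Field f) {
        String name = f.getName();
        return Integer.parseInt(name.substring(name.lastIndexOf('_') + 1));
    }

    // Returns the declared fields sorted by numeric suffix, ignoring compiler-generated ones.
    static Field[] orderedFields(Class<?> type) {
        return Arrays.stream(type.getDeclaredFields())
                .filter(f -> !f.isSynthetic())
                .sorted(Comparator.comparingInt(ReaderSketch::suffix))
                .toArray(Field[]::new);
    }

    // Creates an instance of type and fills its fields from big-endian data.
    static <T> T read(byte[] data, Class<T> type) {
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(data));
        try {
            Constructor<T> ctor = type.getDeclaredConstructor();
            ctor.setAccessible(true);
            T result = ctor.newInstance();
            for (Field f : orderedFields(type)) {
                f.setAccessible(true);
                if (f.getType() == int.class) {
                    f.setInt(result, in.readInt());
                } else if (f.getType() == short.class) {
                    f.setShort(result, in.readShort());
                } else {
                    throw new IllegalArgumentException("unsupported field type: " + f.getType());
                }
            }
            return result;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Because the order comes from the suffix rather than from the source layout, Header can declare its fields in any order and the bytes are still consumed as magic, minor version, major version.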
That's it! We have a complete parser for a small but powerful class of grammars in fewer than 100 lines of code! The grammar is easily translated from the spec, which is actually larger than the parser itself. As a comparison, a fully hand-coded parser with classes for each structure took me several days to complete and test. The parser above and the "grammar" classes took only a few hours. There are many more details in the class file, such as Code attributes, that are easily accommodated. More importantly, I can tackle other binary formats just by defining the grammar. The technique above can also be used in reverse to write a class file.
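The reverse direction might look something like the following sketch. It is an assumption about how such a writer could be built, not code from this article: WriterSketch, Header, suffix() and write() are hypothetical names, and only int and short fields are supported.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.lang.reflect.Field;
import java.util.Arrays;
import java.util.Comparator;

// Hypothetical illustration of writing a "grammar" object back out.
class WriterSketch {

    // Sample values only: 0xCAFEBABE is the class file magic number.
    static class Header {
        int magic_1 = 0xCAFEBABE;
        short minorVersion_2 = 0;
        short majorVersion_3 = 49;
    }

    // Extracts the number after the last underscore in the field name.
    static int suffix(Field f) {
        String name = f.getName();
        return Integer.parseInt(name.substring(name.lastIndexOf('_') + 1));
    }

    // Serializes the int and short fields of o in suffix order, big-endian.
    static byte[] write(Object o) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bytes);
        Field[] fields = Arrays.stream(o.getClass().getDeclaredFields())
                .filter(f -> !f.isSynthetic())
                .sorted(Comparator.comparingInt(WriterSketch::suffix))
                .toArray(Field[]::new);
        try {
            for (Field f : fields) {
                f.setAccessible(true);
                if (f.getType() == int.class) {
                    out.writeInt(f.getInt(o));
                } else if (f.getType() == short.class) {
                    out.writeShort(f.getShort(o));
                } else {
                    throw new IllegalArgumentException("unsupported field type: " + f.getType());
                }
            }
            out.flush();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (IllegalAccessException e) {
            throw new IllegalStateException(e);
        }
        return bytes.toByteArray();
    }
}
```

The same grammar class serves both directions; only the stream calls change from read to write. Arrays and nested structures would need the same count-handling hacks as on the parsing side.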
Some of you might wonder about performance. Historically, reflection has been slow, so I made some timing measurements, and it's not bad at all. On average, it takes about 16 ms to parse the 6 KB Parser class file itself on a 1.5 GHz Pentium. Your mileage will probably vary. You might be able to do better with a hand-coded parser, but given a choice, I'll let the computer do the hard work!
Madhu Siddalingaiah is a consultant focusing on
modern technologies such as wireless, embedded and enterprise systems. He helps
organizations reach new markets and reduce costs through strategic use of
information technology. Madhu has worked with a number of high-profile clients
in many industries including health care, energy, aerospace, defense, and high
energy physics.
Madhu has authored several books, the latest titled "XML and
Web Services Unleashed". He is a popular presenter at technology conferences all
over the world.