Lately, I stumbled upon a Java class that performs exactly the task I had in mind when I started writing my gem: it extracts text from a PDF while preserving the text layout.

I was a Java developer once, but I wanted my project to stay in Ruby. "Let's wrap it in a JRuby gem!" came to mind. I started googling and found excellent tutorials on the topic. However, each of them covered wrapping a jar package rather than a single class, so I dug deeper and pieced the answers together from different places on the web.

So first, let me introduce the Java class: PDFLayoutTextStripper. By Java-world standards, it is a very ordinary class. One important thing it's missing, though, is a package declaration. Java packages translate to modules in Ruby, and the tutorial I found assumed every Java class is namespaced by a package. To be honest, I didn't want to modify the class just to add one.

I mentioned a gem, right? But before we create one, we need to make sure we are using JRuby.

To create the gem, I went the standard way described in the Bundler guide:

```shell
bundle gem pdf-textstream # naming things is the second hardest thing in IT, right?
```

Sadly, because we will be using native Java code, our gem will only be JRuby-compatible. To make sure it targets only the JVM, modify the pdf-textstream.gemspec file and set the `platform` parameter.

The wrapper code lives in lib/pdf/textstream.rb. Let me walk you through it, line by line.

To use Java classes (including the Java standard library, and to reference our own Java code directly), we have to `require 'java'`.

The next thing is to require the Java jars, the Ruby way:

```ruby
require_relative "../../jars/commons-logging-1.2.jar"
# ...plus the remaining dependency jars, required the same way
```

Those are the dependencies of the introduced class. Of course, you have to download them, put them in the `jars` directory, and ship them together with your gem.

The next important line is the classpath definition:

```ruby
$CLASSPATH << "#{File.dirname(__FILE__)}/../../classes"
```

For those with a Java background, the classpath is pretty straightforward: it is where the JVM looks for classes and libraries to load. In fact, there is no `classes` directory in our project yet; the Java compiler creates it when it compiles the class. But we still don't have the compilation step in place. It's probably not the best practice, but I included a build file that runs the compiler. You have to execute it manually each time you modify the Java class or change its dependencies.

And finally, the magic bits. First, copy the Java class to the root directory of your gem. Then, using JRuby as a proxy, we can reference it:

```ruby
PDFLayoutTextStripper = JavaUtilities.get_proxy_class("PDFLayoutTextStripper")
```

Next, I shortened the namespaces of the classes I use. Each Java class can be referenced the Ruby way by walking down the `Java` module tree:

```ruby
PDFParser        = Java::OrgApachePdfboxPdfparser::PDFParser
RandomAccessFile = Java::OrgApachePdfboxIo::RandomAccessFile
PDDocument       = Java::OrgApachePdfboxPdmodel::PDDocument
PDFTextStripper  = Java::OrgApachePdfboxText::PDFTextStripper
```

To run the class on a file located at a given path, I created a static method:

```ruby
def self.file_path_to_text(path)
  # PDFParser needs a Java-world handle, not a Ruby File
  pdf_parser = PDFParser.new(RandomAccessFile.new(Java::JavaIo::File.new(path), "r"))
  pdf_parser.parse # must run before getDocument
  pd_document = PDDocument.new(pdf_parser.getDocument)
  PDFLayoutTextStripper.new.getText(pd_document)
ensure
  pd_document&.close # free the parsed document's resources
end
```

It opens a PDF reader, parses the PDF file, passes the document to our custom class, and returns the text it read. The trickiest part was that I initially tried to pass a Ruby file handle to PDFParser. Its constructor expects a handle from the Java world (here a PDFBox `RandomAccessFile` built from a `java.io.File`), which is why the path is wrapped in `Java::JavaIo::File` first.
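The gemspec change mentioned earlier isn't shown in the post. Here is a minimal sketch of the relevant part, assuming the standard `"java"` platform value RubyGems uses for JRuby gems. The summary text and the `files` glob (which ships the compiled `classes/` directory and the `jars/` dependencies) are hypothetical additions, not the author's actual gemspec.

```ruby
require "rubygems"

# Sketch of the relevant part of pdf-textstream.gemspec (other fields omitted).
SPEC = Gem::Specification.new do |spec|
  spec.name     = "pdf-textstream"
  spec.version  = "0.1.0"
  spec.summary  = "PDF text extraction that keeps the layout (JRuby only)"
  # "java" is the RubyGems platform for JRuby, so the gem targets JRuby only.
  spec.platform = "java"
  # Hypothetical: ship the compiled class and the dependency jars with the gem.
  spec.files    = Dir["lib/**/*.rb", "classes/**/*.class", "jars/*.jar"]
end

puts SPEC.platform # prints "java"
```

With the platform set, the built gem file carries a `-java` suffix in its name, which is how RubyGems tells it apart from a plain Ruby gem.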
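The `../../` in the `require_relative` and `$CLASSPATH` lines works because the wrapper lives two levels below the gem root, in lib/pdf/. The small plain-Ruby sketch below (the `classes_dir_for` helper is hypothetical) performs the same resolution with `File.expand_path`, which collapses the `..` segments into a clean absolute path.

```ruby
# Hypothetical helper: resolve the gem's classes/ directory from a file that
# lives in lib/pdf/ (two levels below the gem root).
def classes_dir_for(source_file)
  File.expand_path("../../classes", File.dirname(source_file))
end

puts classes_dir_for("/home/me/pdf-textstream/lib/pdf/textstream.rb")
# prints "/home/me/pdf-textstream/classes"
```

Inside the JRuby wrapper, the same idea would read `$CLASSPATH << classes_dir_for(__FILE__)`.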
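The exact command run by the build file isn't reproduced in the post. As a hedged sketch, assuming the build step is a plain `javac` call with the jars from `jars/` on the classpath and `classes/` as the output directory, a Ruby helper could assemble it as shown below. `javac_command` is a hypothetical name; `-cp` and `-d` are standard `javac` options.

```ruby
# Hypothetical build helper: assemble the javac call that compiles the
# standalone class into classes/ with the dependency jars on the classpath.
def javac_command(source:, jars:, out_dir: "classes")
  [
    "javac",
    "-cp", jars.join(File::PATH_SEPARATOR), # jars the class imports (PDFBox etc.)
    "-d", out_dir,                          # where the .class file is written
    source
  ]
end

cmd = javac_command(source: "PDFLayoutTextStripper.java", jars: Dir["jars/*.jar"])
puts cmd.join(" ")
# system(*cmd) would run the compiler; re-run it after editing the class or its jars
```

Building the command as an array (rather than one string) lets `system(*cmd)` pass each argument to `javac` without shell quoting issues.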
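The `Java::OrgApachePdfbox...` constants follow a simple naming rule: JRuby exposes a package such as `org.apache.pdfbox.text` as the module `OrgApachePdfboxText` under `Java`. The plain-Ruby helper below only illustrates that rule; it is not a JRuby API, and it assumes conventional lowercase package names. It deliberately rejects classes without a package, which is exactly the PDFLayoutTextStripper case that was handled with `JavaUtilities.get_proxy_class`.

```ruby
# Illustration only (not a JRuby API): map a fully qualified Java class name
# to the constant path JRuby exposes it under, e.g. java.io.File -> Java::JavaIo::File.
def jruby_constant_path(java_class_name)
  *package, klass = java_class_name.split(".")
  # This helper does not handle default-package classes such as PDFLayoutTextStripper.
  raise ArgumentError, "#{java_class_name} has no package" if package.empty?

  # Upcase the first letter of each package segment and glue them together.
  module_name = package.map { |segment| segment.sub(/\A./) { |c| c.upcase } }.join
  "Java::#{module_name}::#{klass}"
end

puts jruby_constant_path("org.apache.pdfbox.text.PDFTextStripper")
# prints "Java::OrgApachePdfboxText::PDFTextStripper"
```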