I'm using some example code from
http://www.codeproject.com/csharp/DesktopSearch1.asp?df=100&forumid=190772&exp=0&select=1153650
to parse PDF files for a DotLucene index. It uses the query.dll COM
object's LoadIFilter method to return the correct IFilter for the file
being parsed (the PDF IFilter must be downloaded from Adobe and
installed).
I want my index to be able to parse files from a database (via a Stream
object). There is a method for this as well, BindIFilterFromStream. I
am not an interop expert by any means - this is my first real attempt
at using it. So, I tried adding the method to my class like so:
[DllImport("query.dll", CharSet = CharSet.Unicode, SetLastError = true)]
private extern static int BindIFilterFromStream(ref UCOMIStream pStm,
    ref IUnknown pUnkOuter, [Out] out IFilter ppIUnk);
With the help of Mattias Sjongren I was able to create the UCOMIStream
and call the method:
private static IFilter loadIFilter(Stream stream)
{
    IUnknown iunk = null;
    IFilter filter = null;
    // copy stream to byte array
    byte[] b = new byte[stream.Length];
    stream.Read(b, 0, b.Length);
    // allocate space on the native heap
    IntPtr nativePtr = Marshal.AllocHGlobal(b.Length);
    // copy byte array to native heap
    Marshal.Copy(b, 0, nativePtr, b.Length);
    // Create a UCOMIStream from the allocated memory
    UCOMIStream comStream;
    CreateStreamOnHGlobal(nativePtr, true, out comStream);
    // Try to load the corresponding IFilter
    int resultLoad = BindIFilterFromStream(ref comStream, ref iunk, out filter);
    if (resultLoad == (int)IFilterReturnCodes.S_OK)
    {
        return filter;
    }
    else
    {
        throw new Exception("BindIFilterFromStream error: " +
            Marshal.GetLastWin32Error());
    }
}
This seems to work except that when the BindIFilterFromStream method is
called, it returns a result of -2147467259, and the
Marshal.GetLastWin32Error() returns 127 - "The specified procedure
could not be found." I am pretty sure that the method is being called,
because in prior testing I was able to get error messages about the
parameters I was trying to pass before getting help from Mattias. But
I don't know how to debug this further to see why I am not getting the
IFilter. I do know that if I serialize the file to the file system,
and use LoadIFilter, it works fine. So it's not the file, nor the COM
object.
Any ideas?? There is some more background on what I am doing on my
blog at www.refactory.net/blog
Robert Jordan - 15 Sep 2005 23:12 GMT
Hi Brian,
> [DllImport("query.dll", CharSet = CharSet.Unicode, SetLastError =
> true)]
> private extern static int BindIFilterFromStream (ref UCOMIStream pStm,
> ref IUnknown pUnkOuter, [Out] out IFilter ppIUnk);
Are you sure that the first 2 parameters are by ref?
The function I know about has this prototype:
private extern static
int BindIFilterFromStream(
UCOMIStream pStm,
IUnknown pUnkOuter,
[Out] out IFilter ppIUnk
);
> This seems to work except that when the BindIFilterFromStream method is
> called, it returns a result of -2147467259, and the
-214... is COM's E_FAIL (0x80004005). This is a generic error.
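For anyone decoding such return values: a minimal sketch of turning the signed integer into the familiar hexadecimal HRESULT. The class name HResultHelper and its methods are made up for illustration and are not part of the codeproject.com sample. Since BindIFilterFromStream reports its status through the return value, the number from Marshal.GetLastWin32Error() is probably not meaningful here.

```csharp
using System;
using System.Runtime.InteropServices;

// Hypothetical helper, not part of the original sample.
static class HResultHelper
{
    // Reinterpret the signed HRESULT as unsigned and format it as 8 hex digits.
    public static string ToHex(int hr)
    {
        return "0x" + unchecked((uint)hr).ToString("X8");
    }

    // COM failure codes have the high bit set, so they are negative as an int.
    public static bool Failed(int hr)
    {
        return hr < 0;
    }

    // Ask the runtime for the exception it would map this HRESULT to
    // (returns null for success codes).
    public static Exception ToException(int hr)
    {
        return Marshal.GetExceptionForHR(hr);
    }
}
```

With the value from the post, HResultHelper.ToHex(-2147467259) gives "0x80004005", which is E_FAIL.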
Rob
Brian - 16 Sep 2005 01:27 GMT
Rob - first of all, I really appreciate the reply. I have been really
stuck on this, and not been able to find anyone that can help.
I tried your suggestion (I think I had actually already tried it and
just put "ref" on those variables as a last-ditch attempt). I still
get the same error. Is there a better way to debug and/or get more
information about the error? I am very new to interop, but since I got
LoadIFilter working (granted, with someone else's code!) I thought I
should be able to get this working.
Robert Jordan - 16 Sep 2005 16:14 GMT
Hi Brian,
> Rob - first of all, I really appreciate the reply. I have been really
> stuck on this, and not been able to find anyone that can help.
Try to save the content of the PDF stream into a temp file and use
LoadIFilter from the codeproject.com sample.
If that works, I'd stick with it.
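A rough sketch of the temp-file step, assuming the database content arrives as a readable Stream. The names TempFileHelper and SaveToTempFile are hypothetical. The extension is passed in on the assumption that the file-based lookup relies on it to pick the right IFilter.

```csharp
using System;
using System.IO;

// Hypothetical helper, not part of the codeproject.com sample.
static class TempFileHelper
{
    // Copies the stream to a uniquely named file in the temp directory
    // and returns the full path. The caller deletes the file when done.
    public static string SaveToTempFile(Stream source, string extension)
    {
        string path = Path.Combine(
            Path.GetTempPath(),
            Guid.NewGuid().ToString("N") + extension);

        using (FileStream target = File.Create(path))
        {
            byte[] buffer = new byte[81920];
            int read;
            // Read may return fewer bytes than requested, so loop until it returns 0.
            while ((read = source.Read(buffer, 0, buffer.Length)) > 0)
            {
                target.Write(buffer, 0, read);
            }
        }
        return path;
    }
}
```

The caller would pass the returned path to the existing file-based LoadIFilter wrapper and remove the file in a finally block with File.Delete(path) once the text has been extracted.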
Rob
Brian - 17 Sep 2005 02:29 GMT
Ugh. I'm really trying to avoid having to do that. It's certainly my
fall back, I just don't understand why this isn't working. It's
becoming a personal vendetta! This system could end up indexing
several hundred PDF files, and it would be a bit of a performance
concern to have to pull all of them down to the file system to index
them. It just seems like if one method on that object can work, why
can't this one? But I have no interop experience so I don't know how
to better debug what's going on.
Robert Jordan - 17 Sep 2005 12:41 GMT
> [...] It just seems like if one method on that object can work, why
> can't this one? But I have no interop experience so I don't know how
> to better debug what's going on.
You didn't tell me whether LoadIFilter was working for you.
From the interop side I don't see any problems with
BindIFilterFromStream. The IStream conversion is correct.
So if LoadIFilter works but BindIFilterFromStream doesn't,
I'd say this is a problem with Adobe's Filter. It
probably doesn't support BindIFilterFromStream.
Rob
Brian - 17 Sep 2005 15:03 GMT
> So if LoadIFilter works but BindIFilterFromStream doesn't,
> I'd say this is a problem with Adobe's Filter. It
> probably doesn't support BindIFilterFromStream.
Oh, I'm sorry, I asked this question a few places, and I assumed I
mentioned LoadIFilter was working. I guess I could try it on word docs
and see if it can work from there. That would really stink, if Adobe's
IFilter doesn't support it, but it doesn't seem like a lot of people
are using that technique.
I just assumed though, that under the covers the Indexing Service COM
object was finding and returning the correct IFilter the same way
whether it was reading from a file or a stream.
Thanks for all your help!!
Brian - 19 Sep 2005 20:09 GMT
Update - I tested this on Microsoft Word documents too, and I get the
same error when trying to BindIFilterFromStream on a Word doc. So,
unless the Word IFilter also doesn't support this, I'm still leaning
toward thinking it's a problem with the COM object or, more likely, with
Interop and/or my use of it...
Brian - 26 Sep 2005 21:10 GMT
I thought I'd follow up on this. Robert Jordan was correct. I got
some great help from Carlo McWhirter at Microsoft, and confirmed that
the Adobe IFilter object does not support the IPersistStream interface,
therefore making it impossible to extract text as a stream. Looks like
I am stuck with writing the streams to disk or using another component
to extract text.