PDF to byte array with TDD

At work, I was tasked with reviewing and prototyping a Windows Service that monitors a directory and invokes a web service, which uses FileNet to archive the reports that are FTPed to that directory.

 

The team writing the web service had not completed it. However, they did provide documentation from a previous service. As I started reviewing the available documentation, with its example code in VB.Net, I quickly noted how verbose, in my humble opinion, VB.Net is.

 

I really want to be language-agnostic, but did I say that I really think VB is too verbose, i.e. too many words? Anyway, here is the sample from a previous web service:

 

'Read the data from the file to be processed into a byte array

Dim fileToConvert As IO.FileInfo = New IO.FileInfo("C:\Temp\atiff.tiff")

Dim streamRead As IO.FileStream = fileToConvert.OpenRead

Dim b(0)() As Byte

'Size the array to hold the file

ReDim b(0)(CType(streamRead.Length - 1, Integer))

'Read the data into the array

streamRead.Read(b(0), 0, CType(streamRead.Length, Integer))

streamRead.Close()

 

'************* Commit the TIFFs to FileNet *************

'Committal requires a logged-on user -- get a user object from the security service

Dim secUser As Bank.Something.SecurityClient.User

secUser = New Bank.Something.SecurityClient.User("ID", "image", "CLON", Nothing, "http://IPHERE:82/SecurityService.asmx")

 

'Make sure we were authenticated

If secUser.AuthenticationState = Bank.Something.SecurityClient.AuthState.Authenticated Then

    'Call the committal webservice

    Dim commit As New ImageCommittal.ImageCommittal

    commit.CommitMultipageTIFFs(b, "CLON-IMAGE-DEV", "Banker", New String() {"CustomerNbr", "DocType"}, New String() {"458745", "18"}, secUser.Token)

Else

    MsgBox("Not authenticated.")

End If

 

My first thought, based on the sample, was to get a file stream into a byte array. So in the spirit of keeping it simple, let’s get a file into a FileStream. But first, a failing test:

 

[Test]

public void GetPDFIntoFileStream()

{

      Assert.Fail("It's too early!!!");

}

 

 

Did I mention that I received the e-mail with the attached sample code first thing in the morning? That explains the message passed to Assert.Fail. OK, now that we have a failing test, let’s get implementing:

[Test]

public void GetPDFIntoFileStream()

{

//    Assert.Fail("It's too early!!!");//was 7:10 AM

      System.IO.FileInfo fileToConvert = new System.IO.FileInfo("");

      System.IO.FileStream streamRead = fileToConvert.OpenRead();

 

      Assert.IsNotNull(fileToConvert);

      Assert.IsNotNull(streamRead);

}

 

Which of course gave me DeveloperTests.GeneralTestFixture.GetPDFIntoFileStream : System.ArgumentException : The path is not of a legal form.

 

I then found a PDF, copied and renamed it and then modified the test code:

System.IO.FileInfo fileToConvert = new System.IO.FileInfo("c:\temp\testDoc.pdf");

 

Yet, because I forgot to escape the backslashes (each \t in the path is read as a tab character), I got:

DeveloperTests.GeneralTestFixture.GetPDFIntoFileStream : System.IO.IOException : The filename, directory name, or volume label syntax is incorrect.

 

I prefixed the path string passed to the FileInfo constructor with “@”, making it a verbatim string literal, and all is peachy:

[Test]

public void GetPDFIntoFileStream()

{

System.IO.FileInfo fileToConvert = new System.IO.FileInfo(@"c:\temp\testDoc.pdf");

System.IO.FileStream streamRead = fileToConvert.OpenRead();

Assert.IsNotNull(fileToConvert);

Assert.IsNotNull(streamRead);

}
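
To see why the unescaped path failed, here is a small sketch comparing the three ways of writing the literal. The PathLiterals class and its member names are made up for illustration; only the string-literal behaviour comes from the C# language itself.

```csharp
using System;

// Hypothetical helper class, for illustration only.
public static class PathLiterals
{
    // Regular literal: each \t is read as a tab character,
    // so this string is not the path it appears to be.
    public const string Unescaped = "c:\temp\testDoc.pdf";

    // Regular literal with every backslash escaped.
    public const string Escaped = "c:\\temp\\testDoc.pdf";

    // Verbatim literal: backslashes are taken exactly as written.
    public const string Verbatim = @"c:\temp\testDoc.pdf";

    // Returns true when the string contains a tab character.
    public static bool ContainsTab(string path)
    {
        return path.IndexOf('\t') >= 0;
    }
}
```

Escaped and Verbatim are the same string, while ContainsTab(PathLiterals.Unescaped) returns true, which is what tripped up the FileInfo call.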

 

 

So we now have “something” in the FileInfo and FileStream variables. Next, as I understand it, we have to get the stream into a byte array for the web service. So, I now assert that I want a jagged byte array: an outer array whose first element holds the inner byte array with the file’s contents.

 

Assert.IsNotNull(fileArray[0]);

 

And I know it will not compile as I do not have the fileArray object:

I:\DotNetApps\tester\DeveloperTests\GeneralTestFixture.cs(78): The name 'fileArray[0]' does not exist in the class or namespace 'DeveloperTests.GeneralTestFixture'

 

But I now know what I have to do to pass: get a jagged byte array from the FileStream.
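
One way to get there, as a minimal sketch rather than the exact code from my test: the FileBytes class and ReadIntoJaggedArray method are hypothetical names, and the byte[][] shape is assumed from the VB.Net sample. Note that VB.Net's ReDim takes the upper bound of the array, whereas C# takes the number of elements.

```csharp
using System;
using System.IO;

// Hypothetical helper class, for illustration only.
public static class FileBytes
{
    // Reads the whole file into slot 0 of a one-element jagged array,
    // mirroring the byte[][] shape that the VB.Net sample passes along.
    public static byte[][] ReadIntoJaggedArray(string path)
    {
        byte[][] fileArray = new byte[1][];

        using (FileStream stream = File.OpenRead(path))
        {
            // C# sizes an array by element count, so this is Length,
            // not the Length - 1 upper bound that VB.Net's ReDim expects.
            fileArray[0] = new byte[stream.Length];

            // Read may return fewer bytes than requested, so loop until full.
            int offset = 0;
            while (offset < fileArray[0].Length)
            {
                int read = stream.Read(fileArray[0], offset, fileArray[0].Length - offset);
                if (read == 0)
                {
                    throw new EndOfStreamException("File ended before the buffer was filled.");
                }
                offset += read;
            }
        }

        return fileArray;
    }
}
```

The using block closes the stream even if the read throws, so the file is not left locked for later steps.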

 

After figuring out how to initialize a jagged byte array, the test passed. Now the question becomes, does the byte array contain the file? So, how do I test for that? How about I test that the FileInfo object I get out of the byte array (which I will call fileBack) is the same as the initial FileInfo object, fileToConvert?

 

Assert.AreEqual(fileBack,fileToConvert);

 

And of course, it fails because the fileBack object does not exist. But after I declare the fileBack object, I see that the FileInfo constructor does not take a FileStream type as an input parameter. More research is needed.

 

OK, I thought, let’s look at the IO namespace and see what types we could use. Voilà, the MemoryStream looks good. I can use it to at least read the byte array and place it into a “stream.”

 

System.IO.MemoryStream streamBack = new System.IO.MemoryStream(fileArray[0]);

And though the MemoryStream and FileStream are two different types, they both expose the Length property. Let’s assert that in the test:

 

Assert.AreEqual(streamBack.Length,streamRead.Length - 1);

 

And we get the green bar. Let’s review all of what we have thus far:

 

[Test]

public void GetPDFIntoFileStream()

{

System.IO.FileInfo fileToConvert = new System.IO.FileInfo(@"c:\temp\testDoc.pdf");

System.IO.FileStream streamRead = fileToConvert.OpenRead();

 

Assert.IsNotNull(fileToConvert);

Assert.IsNotNull(streamRead);

 

//initialize the array

byte[][] fileArray = new byte[1][];

 

//size the arrays of the array

fileArray[0] = new byte[streamRead.Length - 1];

 

//read the data into the array

streamRead.Read(fileArray[0],0,(int)streamRead.Length - 1);

//streamRead.Close();

 

Assert.IsNotNull(fileArray[0]);

 

System.IO.MemoryStream streamBack = new System.IO.MemoryStream(fileArray[0]);

 

Assert.AreEqual(streamBack.Length,streamRead.Length - 1);

}

 

Upon further research I see that the FileStream object can create the file in a directory on the hard drive, and then write a byte array to the file. So I do not need the MemoryStream type after all. However, it was this brief learning exercise or “spike” as the agile camp likes to call it, that enabled me to find this out and think through the steps, avoiding unnecessary, lost time.

 

Okay then, how do we test our desired end result? How about creating the new file from the byte array of the old and testing that they are the same? Let’s assert something like:

 

//Assert that the files are the same

      Assert.AreEqual(fileStreamOld,fileStreamNew);

 

And as expected the test failed. Why do that when you know it will fail? Because I now know what it is that I need to do to pass the test.

 

Let’s start in small steps. First let’s assert that the new file gets created correctly:

 

Assert.IsTrue(

new System.IO.FileInfo(@"c:\temp\newTestDoc.pdf").Exists);

 

 

From here, we need to get this assert to pass. Let’s get the byte array back into a new FileStream object and then use the new FileStream object to create a copy of the old file, simulating the creation of the PDF file on the FileNet web service end.

 

//create new filestream object, creating a new file on the

//hard drive

System.IO.FileStream fileStreamBack = new

System.IO.FileStream(@"c:\temp\newTestDoc.pdf",

System.IO.FileMode.Create,                     

System.IO.FileAccess.Write);

 

//write the byte array into the FileStream object

fileStreamBack.Write(fileArray[0],0,fileArray.Length);

 

When the previous assert is run, the tests pass. Of note within the TDD methodology is that, as I move forward making incremental changes, I still know all the old tests are passing. I then go to delete the new file from the directory in Windows Explorer and note that the new file size is only 1KB compared to 59KB for the original file. Then I see that the count input parameter, the third one in the FileStream.Write() method above, is incorrect. It should be fileArray[0].Length and not fileArray.Length, which is the number of inner arrays (1) rather than the number of bytes. Once that is corrected I can quickly run the test and get immediate feedback by looking at the new file size. And as expected, all looks good.
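
The corrected write step might look something like the sketch below. FileWriter and WriteFirstArray are hypothetical names, and the using block is my own addition rather than something in the listing above.

```csharp
using System.IO;

// Hypothetical helper class, for illustration only.
public static class FileWriter
{
    // Writes the inner array in slot 0 of the jagged array to a new file.
    public static void WriteFirstArray(byte[][] fileArray, string path)
    {
        using (FileStream stream = new FileStream(path, FileMode.Create, FileAccess.Write))
        {
            // fileArray.Length is the number of inner arrays (1 here);
            // fileArray[0].Length is the number of bytes to write.
            stream.Write(fileArray[0], 0, fileArray[0].Length);
        }
        // Leaving the using block flushes and closes the file,
        // so it can be reopened for reading afterwards.
    }
}
```

With a count of fileArray.Length, only a single byte is written, which Explorer rounds up to the 1KB I saw.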

 

Finally, we will complete the implementation to get both the old file and the new file FileStream objects and assert that they are equal.

 

//now let's get the FileStreams of both files for testing

//get the old file FileStream

System.IO.FileInfo oldFile = new System.IO.FileInfo(@"c:\temp\testDoc.pdf");

System.IO.FileStream fileStreamOld = oldFile.OpenRead();

 

//get the new file FileStream

System.IO.FileInfo newFile = new System.IO.FileInfo(@"c:\temp\newTestDoc.pdf");

System.IO.FileStream fileStreamNew = newFile.OpenRead();

 

//Assert that the files are the same

Assert.AreEqual(fileStreamOld,fileStreamNew);

 

I run the test and get an unexpected result: DeveloperTests.GeneralTestFixture.GetPDFIntoFileStream :

            expected:<System.IO.FileStream>

             but was:<System.IO.FileStream>

 

Of course, I overlooked the fact that although both are of the FileStream type, they are two distinct objects, and some of their property values, such as Name, differ. However, the Length (Gets the length in bytes of the stream) and Position (Gets or sets the current position of this stream) properties should be the same.

 

Let’s assert that:

//assert that the files are the same

Assert.AreEqual(fileStreamOld.Length,fileStreamNew.Length);

Assert.AreEqual(fileStreamOld.Position,fileStreamNew.Position);

//assert that the Name property is different

Assert.IsFalse(fileStreamOld.Name == fileStreamNew.Name);

 

And I learned something else new:

DeveloperTests.GeneralTestFixture.GetPDFIntoFileStream :

            expected:<59894>

             but was:<59893>

 

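The old FileStream has one more byte than the new one. Upon investigation, I see that I subtracted one from the FileStream.Length property value of the initial FileStream when sizing and populating the byte array, carrying over the Length - 1 from the VB.Net sample. In VB.Net, ReDim takes the upper bound of the array, so Length - 1 yields Length elements; in C#, new byte[n] takes the element count, so Length - 1 leaves out the file’s last byte. We then used the byte array to write into the new FileStream object, resulting in a new file 1 byte shorter than the old one. The proper fix is to size the array with streamRead.Length and read that many bytes, after which the lengths match exactly. To keep the test aligned with the code as written, though, the assert should be:

Assert.AreEqual(fileStreamOld.Length - 1,fileStreamNew.Length);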

 

When that is done, the green bar of life is displayed and all is in harmony and balance in the universe. Well, at least the tests pass and I have a better understanding of how to place a FileStream into a byte array.
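
Matching Length and Position values only show that the two files are the same size, not that they hold the same bytes. A stricter check, sketched here with a hypothetical FileComparer class of my own naming, would compare the contents byte by byte:

```csharp
using System.IO;

// Hypothetical helper class, for illustration only.
public static class FileComparer
{
    // Returns true only when both files have the same length and identical bytes.
    public static bool HaveSameContent(string firstPath, string secondPath)
    {
        byte[] first = File.ReadAllBytes(firstPath);
        byte[] second = File.ReadAllBytes(secondPath);

        if (first.Length != second.Length)
        {
            return false;
        }

        for (int i = 0; i < first.Length; i++)
        {
            if (first[i] != second[i])
            {
                return false;
            }
        }

        return true;
    }
}
```

In the test this could be used as Assert.IsTrue(FileComparer.HaveSameContent(@"c:\temp\testDoc.pdf", @"c:\temp\newTestDoc.pdf")); it would fail for any copy that is a byte short of the original, which is exactly the kind of feedback I want from a test.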