Tuesday, April 30, 2013

Quote of the Day: JFK on Religion and Public Office

"But let me stress again that these are my views. For contrary to common newspaper usage, I am not the Catholic candidate for president. I am the Democratic Party's candidate for president, who happens also to be a Catholic. I do not speak for my church on public matters, and the church does not speak for me.

Whatever issue may come before me as president — on birth control, divorce, censorship, gambling or any other subject — I will make my decision in accordance with these views, in accordance with what my conscience tells me to be the national interest, and without regard to outside religious pressures or dictates. And no power or threat of punishment could cause me to decide otherwise.

But if the time should ever come — and I do not concede any conflict to be even remotely possible — when my office would require me to either violate my conscience or violate the national interest, then I would resign the office; and I hope any conscientious public servant would do the same.

But I do not intend to apologize for these views to my critics of either Catholic or Protestant faith, nor do I intend to disavow either my views or my church in order to win this election.

If I should lose on the real issues, I shall return to my seat in the Senate, satisfied that I had tried my best and was fairly judged. But if this election is decided on the basis that 40 million Americans lost their chance of being president on the day they were baptized, then it is the whole nation that will be the loser — in the eyes of Catholics and non-Catholics around the world, in the eyes of history, and in the eyes of our own people."

— John F. Kennedy, in a speech to the Greater Houston Ministerial Association on September 12, 1960

Thursday, April 25, 2013

Post-Redirect-Get Pattern in PHP

As an alternative to writing a final paper for my Information Storage and Retrieval class (too easy!), I've been working on my first database-driven web site. I will link to the site when it's finished, but today I'm presenting the web programming tutorial I wish I'd had a few days ago.

Optimal audience: PHP programmers who want to accept input from web users without risking duplicate input when users refresh their browsers or click 'Back' arrows.

Gettin' and Postin'

Web browsers send more than just the URL when they transmit requests to web servers; a verb is bundled in as well. Usually, this verb is GET. It means: Hey, web server, show me what you have at http://www.timecube.com/

or...

http://en.wikipedia.org/wiki/Cat

or...

https://maps.google.com/maps?q=denver,+co&hl=en&ll=39.751545,-104.985352&spn=0.459278,0.883026&sll=37.0625,-95.677068&sspn=30.323858,56.513672&hnear=Denver,+Colorado&t=m&z=10

Whether simple or complex, GET is about retrieving content from a web server. Best practice is for GET requests to be free of side effects. In particular, GET should not be used to update a web site's database, because programs like Googlebot try to GET everything they can; this can lead to nightmare scenarios for databases affected by GET requests.

POST is another verb (or request method), except this one is about sending content to a web server. If you use any online forums, think of GET as what you use to read posts and POST as what you use to post posts.
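As a rough sketch of how a PHP script can tell the two verbs apart: PHP exposes the verb as $_SERVER['REQUEST_METHOD']. The helper name describeRequest() is made up for illustration, and the snippet assumes PHP 7 or newer.

```php
<?php
// Hypothetical helper: says what a request with the given verb is asking for.
function describeRequest(string $method): string
{
    switch (strtoupper($method)) {
        case 'GET':
            return 'retrieve content';   // should be free of side effects
        case 'POST':
            return 'send content';       // may change data on the server
        default:
            return 'something else';     // HEAD, PUT, DELETE, ...
    }
}

// On a web server, PHP fills in the verb. On the command line the key is
// absent, so this demonstration falls back to GET.
$method = $_SERVER['REQUEST_METHOD'] ?? 'GET';
echo describeRequest($method), "\n";
```

A forum page built this way would only touch its database in the POST branch.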

The Trouble With Posting

If a user requests the same URL two times or ten times using GET, the only downside is some extra bandwidth usage. Requesting the same URL multiple times with POST can mean sending duplicate information to the web server. This is why some websites warn against extra clicking, or refreshing, or navigating with Back and Forward buttons. It can lead to duplicate forum posts, duplicate user registrations, or duplicate purchases. Not good!

Diagram courtesy of Quilokos.

The fix is to start with a POST to send information to the web server, but end up on a GET: a safe, no-surprises GET. So instead of immediately receiving a confirmation page in response to a POST, the web client receives a redirect response which, in turn, causes the web client to issue a GET to see the confirmation page. (Yes, this is a bit convoluted.)

Diagram courtesy of Quilokos.

This general fix is called the "Post Redirect Get" pattern (or PRG pattern). What tripped me up was how to implement the PRG pattern in PHP. I found parts of a solution here and there, but not a (relatively) simple example all in one place.

A (Relatively) Simple Example All In One Place

Create a file named "echochamber.php" and paste in the following contents (minus the line numbers):

 1  <?php
 2      session_start();
 3
 4      $echoedShout = "";
 5
 6      if(count($_POST) > 0) {                      // a form was just submitted
 7          $_SESSION['shout'] = $_POST['shout'];
 8
 9          header("HTTP/1.1 303 See Other");        // redirect so the browser ends on a GET
10          header("Location: http://$_SERVER[HTTP_HOST]/echochamber.php");
11          die();
12      }
13      else if (isset($_SESSION['shout'])){         // arriving via the redirect
14          $echoedShout = $_SESSION['shout'];
15
16          /*
17              Put database-affecting code here.
18          */
19
20          session_unset();                         // clear the session so a refresh starts fresh
21          session_destroy();
22      }
23  ?>
24
25  <!DOCTYPE html>
26  <html>
27  <head><title>PRG Pattern Demonstration</title></head>
28
29  <body>
30      <?php echo "<p>" . htmlspecialchars($echoedShout) . "</p>"; ?>
31      <form action="echochamber.php" method="POST">
32          <input type="text" name="shout" value="" />
33      </form>
34  </body>
35  </html>

Note: Unlike a pure HTML/CSS/JavaScript test site, this file can't just be saved on your local computer and opened in a web browser. It needs to go on a web server configured for PHP parsing. You can either go through the hassle of configuring your local computer, or use a hosting service (I like nearlyfreespeech.net).

Oh, You Want An Explanation?

PRG can be done with three separate files: the form to fill out, a file that processes filled-out forms and gives a redirect response, and a final result page that is the target of redirection. But it's often convenient for the fill-in page and the final-result page to be the same or very similar. Why not stuff everything into one file? At any rate, I'm taking the all-in-one approach in this example. It's less intuitive, but not too bad.
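For comparison, here is a hedged sketch of what the middle file in a three-file setup might look like. The file names form.html, process.php, and result.php are hypothetical, as is the helper buildRedirectHeaders(). It assumes PHP 7 or newer.

```php
<?php
// process.php (hypothetical name): the middle file of a three-file PRG setup.
// Assumes form.html posts here and result.php displays the outcome.

// Builds the redirect headers instead of sending them directly, so the
// logic can be inspected without a web server.
function buildRedirectHeaders(string $host, string $path): array
{
    return [
        'HTTP/1.1 303 See Other',
        "Location: http://$host$path",
    ];
}

// Only act on a POST. Run from the command line, this branch is skipped.
if (($_SERVER['REQUEST_METHOD'] ?? '') === 'POST') {
    session_start();
    $_SESSION['shout'] = $_POST['shout'] ?? '';   // hold the data for result.php
    foreach (buildRedirectHeaders($_SERVER['HTTP_HOST'], '/result.php') as $line) {
        header($line);
    }
    exit;
}
```

The result page would then read $_SESSION['shout'] on a plain GET, much as the all-in-one file does.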

First Load

Suppose a user navigates to http://www.prg-in-php-example.gov/echochamber.php (or whichever domain you're using). The two 'if' statements on lines 6 and 13 will fail. In fact, the big PHP section does nothing significant besides initializing the $echoedShout variable to an empty string. The HTML section is rendered as a simple text input box:


Data Entry

This hypothetical user is a Poe fan, so she types in "Lenore" and hits Enter. Lines 31 and 32 take this input and construct a POST request that includes a variable called "shout" with the contents "Lenore". This POST request is sent back to the web server's echochamber.php file (which happens to be the same file in this case). Execution starts again from the top.

On this second time around, the $_POST superglobal tested on line 6 has some content, i.e. the "shout" variable and its associated content "Lenore". Ignore line 7 for just a moment. Lines 9 and 10 respond to the POST request with redirect headers. The user's web browser will receive the redirect headers and start a new GET request for echochamber.php.

Problem! How will this GET request differ from the original GET request? After all, it's not redirecting users to "/echochamber.php?shout=Lenore" or anything that obvious.

The secret sauce is the $_SESSION superglobal. It provides a temporary holding place for this user's data. Line 7 puts the contents of "shout" that came in $_POST into "shout" in $_SESSION so that "Lenore" can survive a trip through a fresh GET. The same principle can work for ten, twenty, or more variables.
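To illustrate the "ten, twenty, or more variables" point, here is a minimal sketch that copies a list of expected fields in one loop. The helper stashFields() and the extra field names are invented for this example, and plain arrays stand in for $_POST and $_SESSION so it can run outside a web server (PHP 7.1 or newer assumed).

```php
<?php
// Hypothetical helper: copies only the expected form fields from the POST
// data into the session array, ignoring anything unexpected.
function stashFields(array $post, array &$session, array $fields): void
{
    foreach ($fields as $field) {
        if (isset($post[$field])) {
            $session[$field] = $post[$field];
        }
    }
}

// Demonstration with plain arrays standing in for $_POST and $_SESSION.
$fakePost    = ['shout' => 'Lenore', 'whisper' => 'Nevermore', 'junk' => 'x'];
$fakeSession = [];
stashFields($fakePost, $fakeSession, ['shout', 'whisper']);
print_r($fakeSession);   // contains 'shout' and 'whisper', but not 'junk'
```

In the real page, the call would be stashFields($_POST, $_SESSION, [...]) in place of line 7.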

Data Display, Finally

Third time around. Second GET. $_SESSION is loaded up.

$echoedShout is once again initialized to an empty string, but won't stay that way for long. This is a GET, so the 'if' statement on line 6 will fail. Line 13's 'if' will succeed because $_SESSION is holding a value for "shout". That value is copied to $echoedShout and then the HTML renders:


Two Ways to Go Wrong

Is all of this complexity really necessary? For instance, why bother with the session_unset() and session_destroy() calls on lines 20 and 21? The difference is what happens when a user refreshes a page showing "Lenore" over the blank field.
  • With session-killer functions: "Lenore" vanishes, and the user sees the original page with a blank field alone and no hidden state in $_POST or $_SESSION.
  • Without these functions: "Lenore" remains. Any code between lines 13 and 19 will run again with the same $_SESSION values. This can cause duplicate database entries on account of $_SESSION, even if the PRG pattern is preventing duplicate entries on account of POST.
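If the session also holds unrelated data (a login, say), wiping the whole thing with session_unset() and session_destroy() may be more than you want. One possible alternative, sketched here under that assumption, is to remove only the key that was just used. The helper takeOnce() is hypothetical, and a plain array stands in for $_SESSION (PHP 7 or newer assumed).

```php
<?php
// Hypothetical helper: returns the stored value (or a default) and removes
// it from the session array so that a refresh cannot reuse it.
function takeOnce(array &$session, string $key, string $default = ''): string
{
    $value = $session[$key] ?? $default;
    unset($session[$key]);
    return $value;
}

// Demonstration with a plain array standing in for $_SESSION.
$fakeSession = ['shout' => 'Lenore', 'user_id' => '42'];
$first  = takeOnce($fakeSession, 'shout');   // "Lenore"
$second = takeOnce($fakeSession, 'shout');   // "" because it was already consumed
```

The second call mimics a page refresh: the shout is gone, but the unrelated 'user_id' entry survives.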
What happens if we really simplify and leave out the PRG pattern entirely? In other words, what if "echochamber.php" were only:

<!DOCTYPE html>
<html>
<head><title>PRG Pattern Demonstration</title></head>

<body>
    <?php echo htmlspecialchars(isset($_POST['shout']) ? $_POST['shout'] : ''); ?>
    <form action="echochamber.php" method="POST">
        <input type="text" name="shout" value="" />
    </form>
</body>
</html>

At first, it might seem like everything is hunky-dory, but hit Refresh and you'll see a warning like this:


In other words, refreshing will send a POST request with the same information used earlier (even without using $_SESSION), followed by the loss of color in everything good in the world, culminating in the user being arrested for sharing that one MP3 back in college. Web developers shouldn't subject their users (or databases) to such risks.

Sunday, April 14, 2013

Big Data and the Future of Collections Management

[A presentation I gave for Collections Management class. You can follow the clicks in this embedded Prezi to simulate the experience.]



"Big data" is a big buzzword in business and technology circles. If someone had asked me a year ago to define big data, I would have talked about credit card and credit score companies. I would have talked about Google harvesting email content. I would have talked about social networking graphs.

[click]

But this sharp increase in data collection is just the first step. The soul of big data is in its use. [click] And the magic of big data is that its use is [click] not predetermined.

To see what I mean, let's take a minute to think about the scientific method. [click] Remember this from grade school?
  • Form a hypothesis.
  • Design an experiment.
  • Then: Collect data.
  • Analyze data.
  • Draw a conclusion.
Big data allows a different approach. [click] Data collection happens first. And here's the key: this data is more or less complete. There's no need to design an experiment to generate some data that's relevant to a specific hypothesis; we just work from all the data! This is a data-driven approach to finding patterns, including patterns we never would have hypothesized.

For example [click], when Walmart's analysts searched their sales history for interesting patterns, they found a connection between [click] looming hurricanes and the sale of [click] flashlights! OK, that's not too surprising. They also found a strong correlation between hurricanes and [click] Pop-Tarts! Who knew? Even individual Pop-Tart purchasers may not have perceived that they're part of a pattern; a wider perspective was required. So Walmart did the obvious thing: they waited for a hurricane and shipped truckloads of extra Pop-Tarts to select stores. They sold like hotcakes! Or should I say, like Pop-Tarts before a hurricane? [click]

The authors of the book on which this talk is based wrote:
"Big data refers to things one can do at a large scale that cannot be done at a smaller one." p. 6
Let's see what another retail giant has accomplished with large scale data. [click]

Target sometimes advertises by sending 'targeted' coupons to individual customers. It's like Amazon.com's personalized recommendations. Of course, the better the match between coupons and customer needs, the higher the chance that people will get in their cars, drive to Target, and buy things!

Here's the creepy part. Target's analysts wanted to know if they could identify pregnant customers. So they started with customers who had registered for baby showers and searched for patterns in their purchase histories. It turns out that customers who purchase cotton balls and unscented lotion are more likely to be pregnant, especially if this is followed up by certain vitamins and minerals or over twenty other pregnancy-correlated items. In fact, this progression of purchases can even produce a projected due date! There's even a story about a father who came into Target upset because his teenage daughter had received coupons for baby cribs. Target knew before he did!

If big data is sounding powerful and a little scary, you've got the right idea. [click]

Now we're ready to talk about big data in the context of public library collections management. [click]

What's the difference between a library and a book store? One difference is that book stores are ultimately about making money, while libraries are ultimately about serving their patrons. As we saw with Target, big data can be used to trade away privacy for profit. It seems inevitable that retail stores will use big data in more and more invasive ways [click]. If this happens, all libraries need to do is maintain their reputation for privacy and their value will grow. It would even make sense for collection development policies to mention a preference for materials of confidential interest. [click]

On the other hand, libraries are very well situated to take advantage of big data techniques. Unlike Walmart or Kmart transactions, every checkout is tied to a loyalty card... I mean a library card. I can't tell you what patterns a team of big data analysts would reveal in library data. But when we find our equivalent to Pop-Tarts or unscented lotion, we might order more or fewer of certain materials, rearrange items, or set up displays at more effective times. [click] [click]

Obviously, there's some tension between maintaining privacy and using library data to its fullest. We could add a line to due date phone calls: "This is Lincoln Public Libraries. We are calling to inform you that you have an item due on Thursday... and you might also enjoy Surprise Child: Finding Hope in Unexpected Pregnancy!" Yes, that might scare people away. Thankfully, libraries don't need to rely on their own data to take a big data approach. [click]

We can use public data. Even without big data analysis, individual collection managers can (and should!) follow best seller lists, social networking trends, and top news stories. Big data analysis goes deeper. It might be possible to predict the next big things before they make their way to the top. Libraries could be ready to meet demands for the next 50 Shades, not lag weeks behind retail stores. If a historically-themed movie is coming out, it would make sense to review materials on that subject, but only if public interest really is picking up; big data might be able to tell the difference. [click]

In summary, big data is powerful and a little scary. It's not something for the average librarian to use directly, but I believe it is everyone's responsibility to steer the profession between the extremes of neglecting and overusing this technology. We need to adapt to big data, but we also need to adapt big data to our professional ethics.

Thank you. [click]

Wednesday, April 3, 2013

On "Filtering and the First Amendment"

Since Deborah Caldwell-Stone's American Libraries article "Filtering and the First Amendment" covers similar ground to my earlier essay "Public Forum Doctrine in U.S. v. American Library Association," I'd like to do some friendly nit-picking.

Quick Background

In the United States, public and school libraries are bribed (rather than coerced) into filtering Internet access for minors. This is done through CIPA, the Children's Internet Protection Act. In 2003, the constitutionality of CIPA was challenged but upheld in U.S. v. American Library Association.

Clarity

Caldwell-Stone's article is helpful because misconceptions about the requirements of CIPA are indeed widespread:
"Often, it is because the institutions and individuals responsible for implementing these policies misunderstand or misinterpret CIPA and the Supreme Court decision upholding the law. Among these misunderstandings is a belief that an institution will lose all federal funding if it does not block all potentially inappropriate sites to the fullest extent practicable, or that the Supreme Court decision authorized mandatory filtering for adults and youths alike. Another mistaken belief is that it does not violate the First Amendment to impose restrictive filtering policies that deny adults full access to constitutionally protected materials online." (Caldwell-Stone, 2013)
I appreciate the way she raises awareness that CIPA policies aren't legal requirements and that no library's filtering has been judged too lax to qualify. If a library doesn't want to filter, they don't have to filter! If a library wants to filter lightly, they can still collect CIPA funds.

Not So Clear

My nit-picking concerns the last sentence of the quote above. Caldwell-Stone is correct that US v. ALA did not authorize mandatory filtering for adults, but the Supreme Court didn't forbid it either. Legally, it's an open question. Caldwell-Stone evidently feels strongly that such filtering violates the First Amendment (a very respectable position to have!), but it's easy for readers to be misled when legal facts and legal hopes are presented in parallel phrases.

This bit is also problematic:
"Does CIPA itself, or the 2003 Supreme Court opinion, actually authorize a library to limit an adult’s access to constitutionally protected speech? A close reading of the district court’s opinion reveals that it fails to address the Supreme Court’s directive: Libraries subject to CIPA should disable filters for adult users to assure their First Amendment rights." (Caldwell-Stone, 2013)
The Supreme Court gave no such "directive." There was no majority opinion (at all), and no such directive can be found in the plurality opinion. In fact, none of the six judges concurring in judgment said so. The Court's language is along these lines:
"Assuming that such erroneous blocking presents constitutional difficulties, any such concerns are dispelled by the ease with which patrons may have the filtering software disabled." (US v. ALA, plurality opinion)
Note the qualifier "assuming." The Court isn't taking a position on whether or not "such erroneous blocking presents constitutional difficulties." Suppose it were a problem for libraries to block constitutionally protected speech: easy disabling would be an antidote. Suppose it weren't a problem to block such speech: now it's an unnecessary antidote. Since this specific case didn't hinge on the constitutionality of "such erroneous blocking," the judges didn't (and couldn't) rule on the issue.

Another concurring judge wrote:
"If some libraries do not have the capacity to unblock specific Web sites or to disable the filter or if it is shown that an adult user’s election to view constitutionally protected Internet material is burdened in some other substantial way, that would be the subject for an as-applied challenge, not the facial challenge made in this case." (US v. ALA, Kennedy's concurrence)
It's entirely reasonable to conclude that a library with mandatory filtering for adults might be judged as violating First Amendment rights, just as a state denying same-sex marriage licenses might be judged (very soon, one hopes) to be violating equal protection rights. Then again, either of these situations might be judged to be constitutional.

One last concurring judge:
"Perhaps local library rules or practices could further restrict the ability of patrons to obtain 'overblocked' Internet material. [...] But we are not now considering any such local practices. We here consider only a facial challenge to the Act itself." (US v. ALA, Breyer's concurrence)
Hopefully it's clear at this point that mandatory Internet filtering for adults is not clearly unconstitutional or constitutional. I applaud Caldwell-Stone for her explanations and her advocacy; I just wish she would separate the two a little more explicitly.


References

Caldwell-Stone, D. (2013, April 2). Filtering and the First Amendment. American Libraries. Retrieved from http://americanlibrariesmagazine.org/features/04022013/filtering-and-first-amendment

United States v. American Library Association, 539 U.S. 194 (2003).

Monday, April 1, 2013

Monthly Picks

I've been posting "Monthly Picks" on the first of each month since November 2011. For quite a while, the description at top read:
On the first day of each month, I will be posting about new papers I've found interesting in Philosophy or Library & Information Science. I'll try to make sure at least one is accessible to everyone.
In the last couple of months, I broke format and started including things that aren't journal articles. Now I'm completing the evolution: this space will be used primarily for web links, not journal articles. My blog as a whole has become more stereotypically blog style anyway. Heck, I used to do the whole superscript-number-referring-to-footnotes thing!

I hope readers enjoy the shift.

How to Teach Without Your Students Secretly Hating You. Things we all wish instructors were conscious about, but often are not. (LiveJournal still exists?!)

The Curious Case of Detached Value. My favorite post on Peter's blog, since it happens to sum up my views on moral tradition. Where is the Psych 101 students' version of this experiment?

Regrets for my Old Dressing Gown, or A Warning to Those Who Have More Taste Than Fortune. A clever morality tale by the rogue encyclopedist of the French Enlightenment.

An Essential Skill for All Librarians. It's not HTML. If you're intrigued and want to get started, I recommend this list of underhanded techniques.

And an open access article that discusses what stands out to library hiring committees and what might make them roll their eyes:
Hodge, M., & Spoor, N. (2012). Congratulations! You've landed an interview: What do hiring committees really want? New Library World, 113(3/4), 139-161.