Sunday, November 26, 2017

Inter Partes Review is Improving the Patent System

Today (Monday, November 27), the Supreme Court is hearing a case, Oil States Energy Services, LLC v. Greene’s Energy Group, LLC, that seeks to end a newish procedure called inter partes review (IPR). The arguments in Oil States will likely focus on arcane constitutional principles and crusty precedents from the Privy Council of England; go read the SCOTUSblog overview if that sort of thing interests you. Whatever the arguments, if the Court decides against IPR proceedings, it will be a big win for patent trolls, so it's worth understanding what these proceedings are and how they are changing the patent system. I've testified as an expert witness in some IPR proceedings, so I've had a front row seat for this battle for technology and innovation.

A bit of background: the inter partes review was introduced by the "America Invents Act" of 2011, which was the first major update of the US patent system since the dawn of the internet. To understand how it works, you first have to understand some of the existing patent system's perverse incentives.

When an inventor brings an idea to a patent attorney, the attorney will draft a set of "claims" describing the invention. The claims are worded as broadly as possible, often using incomprehensible language. If the invention was a clever shelving system for color-coded magazines, the invention might be titled "System and apparatus for optical wavelength keyed information retrieval". This makes it difficult for the patent examiner to find "prior art" that would render the idea unpatentable. The broad language is designed to prevent a copycat from evading the core patent claims via trivial modifications.

The examination proceeds like this: The patent examiner typically rejects the broadest claims, citing some prior art. The inventor's attorney then narrows the patent claims to exclude prior art cited by the examiner, and the process repeats itself until the patent office runs out of objections. The inventor ends up with a patent, the attorney runs up the billable hours, and the examiner has whittled the patent down to something reasonable.

As technology has become more complicated and the number of patents has increased, this examination process has broken down. Patents with very broad claims slip through, often because the addition of the internet means that prior art was either un-patented or unrecognized because of obsolete terminology. These bad patents are bought up by "non-practicing entities" or "patent trolls" who extort royalty payments from companies unwilling or unable to challenge the patents. The old system for challenging patents didn't allow the challengers to participate in the reexamination. So the patent system needed a better way to correct the inevitable mistakes in patent issuance.

In an inter partes review, the challenger participates in the challenge. The first step in drafting a petition is proposing a "claim construction". For example, if the patent claims "an alphanumeric database key allowing the retrieval of information-package subject indications", the challenger might "construct" the claim as "a call number in a library catalog", and point out that call numbers in library catalogs predated the patent by several decades. The patent owner might respond that the patent was never meant to cover call numbers in library catalogs. (Ironically, in an infringement suit, the same patent owner might have pointed to the broad language of the claim, asserting that of course the patent applies to call numbers in library catalogs!) The administrative judge would then have the option of accepting the challenger's construction and opening the claim to invalidation, or accepting the patent owner's construction and letting the patent stand (but with the patent owner having agreed to a narrow claim construction!).
Disposition of IPR Petitions in the first 5 years. From USPTO.

In the 5 years that IPR proceedings have been available, 1,153 patents have been completely invalidated and 287 others have had some claims cancelled. 331 patents that have been challenged have been found to be completely valid. (See this statistical summary.) This is a tiny percentage of patents; it's likely that only the worst patents have been challenged; in the same period, about one and a half million patents have been granted.
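
The proportions behind these counts can be checked with a few lines of Python. This is only a back-of-the-envelope sketch: the three outcome counts are the ones quoted above, while the figure of 1,500,000 granted patents is a rounding of "about one and a half million", and the variable names are made up for illustration.

```python
# Outcome counts quoted above for the first 5 years of IPR proceedings.
all_claims_invalidated = 1153
some_claims_cancelled = 287
found_completely_valid = 331

# Assumption: "about one and a half million" patents, rounded to 1,500,000.
patents_granted = 1_500_000

patents_with_outcome = (
    all_claims_invalidated + some_claims_cancelled + found_completely_valid
)

# Share of decided challenges that cancelled all or part of a patent.
cancelled_share = (
    all_claims_invalidated + some_claims_cancelled
) / patents_with_outcome

# Patents with one of these outcomes, relative to patents granted in the period.
challenged_share = patents_with_outcome / patents_granted

print(f"patents with an outcome: {patents_with_outcome}")
print(f"all or some claims cancelled: {cancelled_share:.1%}")
print(f"relative to patents granted: {challenged_share:.2%}")
```

Under these assumptions, about four out of five decided challenges cancel at least some claims, while the decided challenges amount to roughly a tenth of a percent of the patents granted over the same five years. The two groups are not the same set of patents, so the second number is a rough sense of scale, not an exact rate.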

It was hoped that the IPR process would be more efficient and less costly than the old process; I don't know if this has been true, but patent litigation is still very costly. At least the cases I worked on had correct outcomes.

Some companies in the technology space have been using the IPR process to oppose the patent trolls. One notable effort has been Cloudflare's Project Jengo. Full disclosure: They sent me a T-shirt!


Update (November 28): Read Adam Liptak's news story about the argument in the New York Times.
  • Apparently Justices Gorsuch and Roberts were worried about patent property being taken away by administrative proceedings. This seems odd to me, since in the case of bad patents, the initial grant of a patent amounts to a taking of property away from the public, including companies who rely on prior art to assure their right to use public property.
  • Some news stories are characterizing the IPR process as lopsided against patent owners. (Reuters: "In about 1,800 final decisions up to October, the agency’s patent board canceled all or part of a patent around 80 percent of the time.") Apparently the news media has difficulty with sampling bias - given the expense of an IPR filing, of course only the worst of the worst patents are being challenged; more than 99.9% of patents are untouched by challenges!


Sunday, October 29, 2017

Turning the page on ereader pagination

Why bother paginating an ebook? Modern websites encourage you to "keep on swiping" but if you talk to people who read ebooks, they rather like pages. I'll classify their reasons into "backward looking" and "practical".

Backward looking reasons that readers like pagination
  • pages evoke the experience of print books
  • a tap to turn a page is easier than swiping
Practical reasons that readers like pagination
  • pages divide reading into easier to deal with chunks
  • turning the page gives you a feeling of achievement
  • the thickness of the turned pages helps the reader measure progress
Reasons that pagination sucks
  • sentences are chopped in half
  • paragraphs are chopped in half
  • figures and such are sundered from their context
  • footnotes are ... OMG footnotes!
How would you design a long-form reading experience for computer screens if you weren't tied to pagination? Despite the entrenchment of Amazon and iPhones, people haven't stopped taking fresh looks at the reading experience.

Taeyoon Choi and his collaborators at the School for Poetic Computation recently unveiled their "artistic intervention" into the experience of reading. (Choi and a partner founded the Manhattan-based school in 2013 to help artists learn and apply technology.) You can try it out at http://poeticcomputation.info/



On viewing the first chapter, you immediately see two visual cues that some artistry is afoot. On the right side, you see something that looks like a stack of pages. On the left is some conventional-looking text, and to its right is some shrunken text. Click on the shrunken text to expand references for the now shrunken main text. This conception of long-form text as existing in two streams seems much more elegant than the usual pop-up presentation of references and footnotes in ebook readers. Illustrations appear in both streams, and when you swipe one stream up or down, the other stream moves with it.

The experience of the Poetic Computation Reader on a smartphone adapts to the smaller screen. One or other of the two streams is always off-screen, and little arrows, rather than shrunken images, indicate the other's existence.

 * * *

On larger screens, something very odd happens when you swipe down a bit. You get to the end of the "page". And then it starts moving the WRONG way, sideways instead of up and down. Keep swiping, and you've advanced the page! The first time this happened, I found it really annoying. But then, it started to make sense. "Pages" in the Poetic Computation Reader are intentional, not random breaks imposed by the size of the reader's screen and the selected typeface. The reader gets a sense of achievement, along with an indication of progress.

In retrospect, this is a completely obvious thing to do. In fact, authors have been inserting intentional breaks into books since forever. Typesetters call these breaks "asterisms" after the asterisks that are used to denote them. They look rather stupid in conventional ebooks. Turning asterisms into text-breaking animations is a really good idea. Go forth and implement them, ye ebook-folx!

On a smartphone, the Poetic Computation Reader ignores the "page breaks" and omits the page edges. Perhaps a zoom animation and a thickened border would work.

Also, check out the super-slider on the right edge. Try to resist sliding it up and down a couple of times. You can't!

 * * *

Another interesting take on the reading experience is provided by Slate, the documentation software written by Robert Lord. On a desktop browser, Slate also presents text in parallel streams. The center stream can be thought of as the main text. On the left is the hierarchical outline (i.e. a table of contents), on the right is example code. I like the way you can scroll either the outline or the text stream and the other stream follows. The outline expands and contracts accordion-style as you scroll, resulting in effortless navigation. But Slate uses a responsive design framework, so on a smartphone, the side streams reconfigure into inline figures or slide-aways.

"Clojure by Example", generated by Slate.

There are no "pages" in Slate. Instead, the animated outline is always aware of where you are and indicates your progress. The outline is a small improvement on the static outline produced by documentation generators like Sphinx, but the difference in navigability and usability is huge.

As standardization and corporate hegemony seem to be ossifying digital reading experiences elsewhere, independent experiments and projects like these give me hope that a next generation of ebooks will put some new wind in the sails of our digital reading journey.

Notes:
  1. The collaborators on the Poetic Computation Reader include Molly Kleiman, Shannon Mattern, Taeyoon Choi and HAWRAF. Also, these footnotes are awkward.


Monday, September 11, 2017

Prepare Now for Topical Storm Chrome 62

Sometime in October, probably the week of October 17th, version 62 of Google's Chrome web browser will be declared "stable". When that happens, users of Chrome will get their software updated to version 62 when they restart.

One of the small but important changes that will occur is that many websites that have not implemented HTTPS to secure their communications will be marked in a subtle way as "Not Secure". When such a website presents a web form, typing into the form will change the appearance of the website URL. Here's what it will look like:

Unfortunately, many libraries, and the vendors and publishers that serve them, have not yet implemented HTTPS, so many library users that type into search boxes will start seeing the words "Not Secure" and may be alarmed.

What's going to happen? Here's what I HOPE happens:
  • Libraries, Vendors, and Publishers that have been working on switching their websites for the past two years (because usually it's a lot more work than just pushing a button) are motivated to fix the last few problems, turn on their secure connections, and redirect all their web traffic through their secure servers before October 17.
          So instead of this:

           ... users will see this:

  • Library management and staff will be prepared to answer questions about the few remaining problems that occur. The internet is not a secure place, and Chrome's subtle indicator is just a reminder not to type in sensitive information, like passwords, personal names and identifiers, into "not secure" websites.
  • The "Not Secure" animation will be noticed by many users of libraries, vendors, and publishers that haven't devoted resources to securing their websites. The users will file helpful bug reports and the website providers will acknowledge their prior misjudgments and start to work carefully to do what needs to be done to protect their users.
  • Libraries, vendors, and publishers will work together to address many interactions and dependencies in their internet systems.


Here's what I FEAR might happen:
  • The words "Not Secure" will cause people in charge to think their organizations' websites "have been hacked". 
  • Publishing executives seeing the "Not Secure" label will order their IT staff to "DO SOMETHING" without the time or resources to do a proper job.
  • Library directors will demand that Chrome be replaced by Firefox on all library computers because of a "BUG in CHROME". (creating an even worse problem when Firefox follows suit in a few months!) 
  • Library staff will put up signs instructing patrons to "ignore security warnings" on the internet. Patrons will believe them.
Back here in the real world, libraries are under-resourced and struggling to keep things working. The industry in general has been well behind the curve of HTTPS adoption, needlessly putting many library users at risk. The complicated technical environment, including proxy servers, authentication systems, federated search, and link servers has made the job of switching to secure connections more difficult.

So here's my forecast of what WILL happen:
  • Many libraries, publishers and vendors, motivated by Chrome 62, will finish their switch-over projects before October 17. Users of library web services will have better security and privacy. (For example, I expect OCLC's WorldCat, shown above in secure and not secure versions, will be in this category.)
  • Many switch-over projects will be rushed, and staff throughout the industry, both technical and user-facing, will need to scramble and cooperate to report and fix many minor issues.
  • A few not-so-thoughtful voices will complain that this whole security and privacy fuss is overblown, and blame it on an evil Google conspiracy.

Here are some notes to help you prepare:
  1. I've been asked whether libraries need to update links in their catalog to use the secure version of resource links. Yes, but there's no need to rush. Website providers should be using HTTP redirects to force users into the secure connections, and should use HSTS headers to make sure that their future connections are secure from the start.
  2. Libraries using proxy servers MUST update their software to reasonably current versions, and update proxy settings to account for secure versions of provider services. In many cases this will require acquisition of a wildcard certificate for the proxy server.
  3. I've had publishers and vendors complain to me that library customers have asked them to retain the option of insecure connections ... because reasons. Recently, I've seen reports on listservs that vendors are being asked to retain insecure server settings because the library "can't" update their obsolete and insecure proxy software. These libraries should be ashamed of themselves - their negligence is holding back progress for everyone and endangering library users.
  4. Chrome 62 is expected to reach beta next week. You'll then be able to install it from the beta channel. (Currently, it's in the dev channel.) Even then, you may need to set the #mark-non-secure-as flag to see the new behavior. Once Chrome 62 is stable, you may still be able to disable the feature using this flag.
  5. A screen capture using Chrome 62 now might help convince your manager, your IT department, or a vendor that a website really needs to be switched to HTTPS.
  6. Mixed content warnings are the result of embedding not-secure images, fonts, or scripts in a secure web page. A malicious actor can insert content or code in these elements, endangering the user. Much of the work in switching a large site from HTTP to HTTPS consists of finding and addressing mixed content issues.
  7. Google's Emily Schechter gives an excellent presentation on the transition to HTTPS, and how the Chrome UI is gradually changing to more accurately communicate to users that non-HTTPS sites may present risks: https://www.youtube.com/watch?v=GoXgl9r0Kjk&feature=youtu.be (discussion of Chrome 62 changes starts around 32:00)
  8. (added 9/15/2017) As an example of a company that's been working for a while on switching, Elsevier has informed its ScienceDirect customers that ScienceDirect will be switching to HTTPS in October. They have posted instructions for testing proxy configurations.
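
To make note 6 a bit more concrete, here is a minimal sketch in Python of a first-pass scan for mixed content in a single page. It uses only the standard library's `html.parser`. The tag-to-attribute table, the `MixedContentFinder` class, the `find_mixed_content` helper, and the `example.org` URLs are all hypothetical names chosen for illustration, and the scan is far from complete (it ignores CSS `url(...)` references, `srcset`, and anything loaded by scripts).

```python
from html.parser import HTMLParser

# Illustrative table of tags and the attribute that loads a subresource.
# Assumption: this is a first pass only. Some <link> elements (for example
# rel="canonical") are not subresources and would need filtering out.
RESOURCE_ATTRS = {
    "img": "src",
    "script": "src",
    "iframe": "src",
    "audio": "src",
    "video": "src",
    "source": "src",
    "link": "href",
}


class MixedContentFinder(HTMLParser):
    """Collect http:// subresources referenced by an HTML page."""

    def __init__(self):
        super().__init__()
        self.insecure = []

    def handle_starttag(self, tag, attrs):
        wanted = RESOURCE_ATTRS.get(tag)
        if wanted is None:
            return
        for name, value in attrs:
            if name == wanted and value and value.lower().startswith("http://"):
                self.insecure.append((tag, value))


def find_mixed_content(html_text):
    """Return (tag, url) pairs for subresources fetched over plain HTTP."""
    finder = MixedContentFinder()
    finder.feed(html_text)
    finder.close()
    return finder.insecure


# Hypothetical page: two insecure subresources, one secure script, and an
# ordinary link, which is navigation and not mixed content.
sample_page = """
<html><head>
<link rel="stylesheet" href="http://static.example.org/site.css">
<script src="https://cdn.example.org/app.js"></script>
</head><body>
<img src="http://images.example.org/cover.jpg">
<a href="http://example.org/about">About</a>
</body></html>
"""

for tag, url in find_mixed_content(sample_page):
    print(tag, url)
```

On the sample page this reports the stylesheet and the image, but not the secure script or the ordinary hyperlink. A real migration would run something like this across every template and stored page, which is why the cleanup tends to dominate the work.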






Monday, August 14, 2017

PubMed Lets Google Track User Searches

CT scan of a Mesothelioma patient.
CC BY-SA by Frank Gaillard
If you search on Google for "Best Mesothelioma Lawyer" and then click on one of the ads, Google can earn as much as a thousand dollars for your click. In general, Google can make a lot of money if it knows you're the type of user who's interested in rare types of cancer. So you might be surprised that Google gets to know everything you search for when you use PubMed, the search engine offered by the National Center for Biotechnology Information (NCBI), a service of the National Library of Medicine (NLM) at the National Institutes of Health (NIH). Our tax dollars work really hard and return a lot of value at NCBI, but I was surprised to discover Google's advertising business is getting first crack at that value!

You may find this hard to believe, but you shouldn't take my word for it. Go and read the NLM Privacy Policy, in particular the section on "Demographic and Interest Data":
On some portions of our website we have enabled Google Analytics and other third-party software (listed below), to provide aggregate demographic and interest data of our visitors. This information cannot be used to identify you as an individual. While these tools are used by some websites to serve advertisements, NLM only uses them to measure demographic data. NLM has no control over advertisements served on other websites.
DoubleClick: NLM uses DoubleClick to understand the characteristics and demographics of the people who visit NLM sites. Only NLM staff conducts analyses on the aggregated data from DoubleClick. No personally identifiable information is collected by DoubleClick from NLM websites. The DoubleClick Privacy Policy is available at https://www.google.com/intl/en/policies/privacy/
You can opt-out of receiving DoubleClick advertising at https://support.google.com/ads/answer/2662922?hl=en.
I will try to explain what this means and correct some of the misinformation it contains.

DoubleClick is Google's display advertising business. DoubleClick tracks users across websites using "cookies" to collect "demographic and interest information" about users. DoubleClick uses this information to improve its ad targeting. So for example, if a user's web browsing behavior suggests an interest in rare types of cancer, DoubleClick might show the user an ad about mesothelioma. All of this activity is fully disclosed in the DoubleClick Privacy Policy, which approximately 0% of PubMed's users have actually read. Despite what the NLM Privacy Policy says, you can't opt out of receiving DoubleClick advertising; you can only opt out of DoubleClick ad targeting. So instead of mesothelioma ads, you'd probably be offered deals at Jet.com.

It's interesting to note that before February 21 of this year, there was no mention of DoubleClick in the privacy policy (see the previous policy). Despite the date, there's no reason to think that the new privacy policy is related to the change in administrations, as NIH Director Francis Collins was retained in his position by President Trump. More likely it's related to new leadership at NLM. In August of 2016, Dr. Patricia Flatley Brennan became NLM director. Dr. Brennan, a registered nurse and an engineer, has emphasized the role of data in the Library's mission. In an interview with the Washington Post, Brennan noted:
In the 21st century we’re moving into data as the basis. Instead of an experiment simply answering a question, it also generates a data set. We don’t have to repeat experiments to get more out of the data. This idea of moving from experiments to data has a lot of implications for the library of the future. Which is why I am not a librarian.
The "demographic and interest data" used by NLM is based on individual click data collected by Google Analytics. As I've previously written, Google Analytics only tracks users across websites if the site-per-site tracker IDs can be connected to a global tracker ID like the ones used by DoubleClick. What NLM is allowing Google to do is to connect the Google Analytics user data to the DoubleClick user data. So Google's advertising business gets to use all the Google Analytics data, and the Analytics data provided to NLM can include all the DoubleClick "demographic and interest" data.

What information does Google receive when you do a search on PubMed?
For every click or search, Google's servers receive:
  • your search term and result page URL
  • your DoubleClick user tracking ID
  • your referring page URL
  • your IP address
  • your browser software and operating system
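
To see why the result page URL alone is revealing, here is a small sketch in Python. It assumes a PubMed result URL of the form `https://www.ncbi.nlm.nih.gov/pubmed/?term=...`; the example URL and the helper name `search_term_from_url` are hypothetical, and nothing here reproduces the actual request that a tracker sends.

```python
from urllib.parse import parse_qs, urlsplit


def search_term_from_url(page_url):
    """Return the value of the 'term' query parameter, or None if absent."""
    query = urlsplit(page_url).query
    values = parse_qs(query).get("term")
    return values[0] if values else None


# Hypothetical result page URL for a search on "best mesothelioma treatment".
result_page = (
    "https://www.ncbi.nlm.nih.gov/pubmed/?term=best+mesothelioma+treatment"
)

print(search_term_from_url(result_page))
```

Any party that receives the page URL, whether as the reported page or as a referrer, can recover the search term in the same way.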
While "only NLM staff conducts analyses on the aggregated data from DoubleClick", the DoubleClick tracking platform analyzes the unaggregated data from PubMed. And while it's true that "the demographic and interest data" of PubMed visitors cannot be used to identify them as individuals, the data collected by the Google trackers can trivially be used to identify as individuals any PubMed users who have Google accounts. Last year, Google changed its privacy policy to allow it to associate users' personal information with activity on sites like PubMed.
"Depending on your account settings, your activity on other sites and apps may be associated with your personal information in order to improve Google’s services and the ads delivered by Google."
So the bottom line is that Google's stated policies allow Google to associate a user's activity on PubMed with their personal information. We don't know if Google makes use of PubMed activity or if the data is saved at all, but NLM's privacy policy is misleading at best on this fact.

Does this matter? I have written that commercial medical journals deploy intense advertising trackers on their websites, far in excess of what NLM is doing. "Everybody" does it. And we know that agencies of the US government spend billions of dollars sifting through web browsing data looking for terrorists, so why should NLM be any different? So what if Google gets a peek at PubMed user activity - they see such a huge amount of user data that PubMed is probably not even noticeable.

Google has done some interesting things with search data. For example, the "Google Flu Trends" and "Google Dengue Trends" projects studied patterns of searches for illness-related terms. Google could use the PubMed searches for similar investigations into health provider searches.

The puzzling thing about NLM's data surrender is the paltry benefit it returns. While Google gets un-aggregated, personally identifiable data, all NLM gets is some demographic and interest data about their users. Does NLM really want to better know the age, gender, and education level of PubMed users??? Turning on the privacy features of Google Analytics (i.e. NOT turning on DoubleClick) has a minimal impact on the usefulness of the usage data it provides.

Lines need to be drawn somewhere. If Google gets to use PubMed click data for its advertising, what comes next? Will researchers be examined as terror suspects if they read about nerve toxins or anthrax? Or perhaps inquiries into abortifacients or gender-related hormone therapies will become politically suspect. Perhaps someone will want a list of people looking for literature on genetically modified crops, or gun deaths, or vaccines? Libraries should not be going there.

So let's draw the line at advertising trackers in PubMed. PubMed is not something owned by a publishing company; PubMed belongs to all of us. PubMed has been a technology leader worthy of emulation by libraries around the world. They should be setting an example. If you agree with me that NLM should stop letting Google track PubMed users, let Dr. Brennan know (politely, of course).

Notes:
  1. You may wonder if the US government has a policy about using third party services like Google Analytics and DoubleClick. Yes, there is a policy, and NLM appears to be pretty much in compliance with that policy.
  2. You might also wonder if Google has a special agreement for use of its services on US government websites. It does, but that agreement doesn't amend privacy policies. And yes, the person signing that policy for Google subsequently became the third CTO of the United States.
  3. I recently presented a webinar which covered the basics of advertising in digital libraries in the National Network of Libraries of Medicine [NNLM] "Kernel of Knowledge" series.
  4. (8/16) Yes, this blog is served by Google. So if you start getting ads for privacy plug-ins...
  5. (8/16) urlscan.io is a tool you can use to see what goes on under the cover when you search on PubMed. Tip from Gary Price.

Monday, July 10, 2017

Creative Works *Ascend* into the Public Domain


It's a Wonderful Life, the movie, became a public domain work in 1975 when its copyright registration was not renewed. It had been a disappointment at the box office, but became a perennial favorite in the 80s as television stations began to play it (and play it again, and again) at Christmas time, partly because it was inexpensive content. Alas, copyright for the story it was based on, The Greatest Gift by Philip Van Doren Stern, HAD been renewed, and the movie was thus a derivative work on which royalties could be collected. In 1993, the owners of the story began to cash in on the film's popularity by enforcing their copyright on the story.

I learned about the resurrection of Wonderful Life from a talk by Krista Cox, Director of Public Policy Initiatives for ARL (Association of Research Libraries) during June's ALA Annual Conference. But I was struck by the way she described the movie's entry into the public domain. She said that it "fell into the public domain". I'd heard that phrase used before, and maybe used it myself. But why "fall"? Is the public domain somehow lower than the purgatory of being forgotten but locked into the service of a copyright owner? I don't think so. I think that when a work enters the public domain, it's fitting to say that it "ascends" into the public domain.

If you're still fighting this image in your head, consider this example: what happens when a copyright owner releases a poem from the chains of intellectual property? Does the poem drop to the floor, like a jug of milk? Or does it float into the sky, seen by everyone far and wide, and so hard to recapture?

It is a sad quirk of the current copyright regime that the life cycle of a creative work is yoked to the death of its creator. That seems wrong to me. Wouldn't it be better to use the creator's birth date? We could then celebrate an author's birthday by giving their books the wings of an angel. Wouldn't that be a wonderful life?