US20120042236A1 - Integrated document viewer - Google Patents

Integrated document viewer

Info

Publication number
US20120042236A1
Authority
US
United States
Prior art keywords
document
text
html
web page
font
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Abandoned
Application number
US13/278,176
Inventor
John Adler, III
Jared Friedman
Matthias Kramm
Michael Lewis
Matthew Riley
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Scribd Inc
Original Assignee
Scribd Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Scribd Inc
Priority to US13/278,176 (US20120042236A1)
Priority to US13/343,695 (US8707164B2)
Publication of US20120042236A1
Current legal status: Abandoned

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/10 Text processing
    • G06F40/12 Use of codes for handling textual entities
    • G06F40/126 Character encoding
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/10 Text processing
    • G06F40/103 Formatting, i.e. changing of presentation of documents
    • G06F40/109 Font handling; Temporal or kinetic typography
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/10 Text processing
    • G06F40/12 Use of codes for handling textual entities
    • G06F40/151 Transformation

Definitions

  • This application relates generally to the integration of documents into web pages, and in particular to systems and techniques for preserving a document's original nature and appearance when displaying the document within the pages of a website, and automatically sharing users' reading-related activities on that website across their external social networks.
  • documents themselves have evolved well beyond traditional text, to include various different static and interactive media and page layout attributes, and to appear in many different forms, ranging from short emails or blog posts to book previews, news articles and creative writing samples, to long novels or reference books, and almost anything in between.
  • a document typically created via a program known as a “web browser” or as a more traditional page-oriented document (i.e., a document that is inherently divided into pages corresponding to static “printable” pages)
  • the author intends for the document to be printed or displayed on a computer monitor with a particular desired appearance.
  • a document's appearance includes a variety of presentation and page layout characteristics, such as the position, size and orientation of various component text, graphic and other static and interactive objects on each page of the document. It should be noted that the nature or functionality of these object types also is generally intended to be preserved, particularly when displayed on a computer monitor.
  • Maintaining a document's appearance as it is distributed among different computers and platforms has long been a problem addressed by various software technologies. For example, if a document is created with a particular word processing program and transferred to another computer which does not have access to that program, then the document may not even be accessible on the destination computer, or may only be accessible via another program that displays the document with a modified appearance (e.g., with different fonts or other formatting attributes).
  • One of the leading solutions to this problem, even pre-dating the Web, is the “portable document format” (PDF) created by Adobe Systems, Inc.
  • PDF is designed to preserve fonts, as well as page layout and other object and document formatting characteristics, so that documents retain a virtually identical appearance when distributed across computers and platforms, displayed on a computer monitor or printed onto a physical medium, such as paper. For this reason, the PDF has become a widely adopted standard document format for printing and distributing documents across computers and platforms, regardless of which program the document's author used to create the document.
  • HTML web page
  • Both can contain various media types, from static text and graphics to animation, video and other interactive objects and functionality, such as hyperlinks, buttons and other controls.
  • both can be printed as static pages on physical paper, even though HTML documents are not generally divided into distinct pages unless and until they are printed.
  • both can be converted into PDF documents so as to retain their intended appearance when printed or distributed among different computers and platforms.
  • PDF documents have been difficult to integrate into web pages, while preserving their intended appearance, due to historical formatting limitations of the HTML format, which traditionally has allowed for the display of only a limited number of fonts.
  • Adobe and others have created programs that display existing PDF documents within a web browser's window. Yet, these programs cause the document to occupy the entire web browser window (along with the controls typically associated with Adobe's “Acrobat” program for displaying PDF documents).
  • Although the PDF document may appear within a web browser's window, it is not truly integrated into another web page; instead, it becomes a distinct “web page” of its own.
  • the author of a web page cannot easily integrate an existing PDF document as part of a web page that includes other web elements or objects, such as text, images, advertisements, etc.
  • the “zoom” level and controls of the PDF document are distinct from those of the web page, often forcing the user to zoom the PDF document to a desired level for reading, but switch to a “global” zoom level to read the other components of the web page (text, images, ads, etc), and then reset the zoom level of the PDF document to continue reading (often while repeatedly readjusting the scrolling positions of the PDF document and the overall web page).
  • the PDF document becomes a separately controllable object that is subservient to the primary web browser controls for the overall web page window, resulting in significant inconvenience to the user.
  • PDF-to-HTML converters that enable the integration of the PDF document into a web page containing other component elements, but do so by sacrificing the original appearance of the document. For example, they convert the fonts embedded within the PDF document into the limited number of fonts typically made available to a computer's web browser. This approach defeats the primary objective of preserving the author's intended appearance of the PDF document.
  • Google has adopted a variation of this approach with its “Google PDF viewer,” which is integrated into its “Gmail,” “Google Docs” and other programs. While each page of a PDF document is still converted into an “image” under this approach, users can search for individual words within the document by virtue of Google's “thin client” approach, which relies upon frequent interaction between the user's web browser and a remote web server.
  • the user's web browser upon detecting that the user has attempted to select a word by clicking on the portion of the image containing that word, invokes the remote web server, which must parse the page of the PDF document to identify the “text” version of that word (e.g., the individual ASCII characters of the word), which can then be sent to the user's web browser, for example, to highlight the word or permit it to be copied and pasted elsewhere.
  • a user can search for words within the document by typing them into the user's web browser, which again must invoke the remote web server to conduct the search on the “text” within the PDF document, and then return the results to the user's web browser.
  • this “thin client” approach suffers from a number of disadvantages that result from converting the PDF document into an “image” rather than directly into text (along with the fonts that determine the appearance of that text).
  • the “image” of each page of the document is significantly larger than the corresponding text on that page (even apart from other non-text elements on the page), resulting in an additional delay before each page of the document can be delivered to and displayed by the user's web browser.
  • the frequent server interaction imposes further delays whenever the user interacts with the document, e.g., by scrolling to a new page or selecting or searching for words within the document.
  • the words of the document become distorted when zoomed (as would any bitmapped image of text), causing Google to include a custom “zoom” control to avoid this distortion, but at the expense of further delay due to additional server interaction.
  • users may also desire to share their reading-related activities (e.g., viewing, annotating, rating, uploading and downloading documents) with friends or other members of their social networks. Yet, actively choosing to share an activity or behavior is burdensome. For this reason, “passive sharing” is more desirable (i.e., setting predefined sharing preferences, with future behavior resulting in the automatic sharing of such behavior in accordance with those preferences).
  • Blippy a service offered via the website, www.blippy.com
  • “purchasing behavior” i.e., purchases made anywhere via a credit card, registered at the “Blippy” website
  • Blippy is designed with sharing as an integral component. Users already purchase items with their credit cards, and they already share their activities and behavior on their social networks with other members. Blippy simply connects the two, enabling the passive sharing of this existing external behavior (shopping) with users' existing social networks (e.g., Facebook friends).
  • Various embodiments of the current invention are disclosed herein, including techniques, apparatus, and systems for preserving a document's original nature and appearance when displaying the document within the pages of a website, and automatically sharing users' reading-related activities on that website across their external social networks.
  • While various iterations of the HTML format have included over time a feature allowing for the downloading of custom fonts (“web fonts”) that can be embedded into web pages, web fonts have been employed to enhance the authoring capabilities of HTML documents, rather than to facilitate the integration of PDF and other documents into web pages.
  • For example, the “@font-face” tag has been a component of the “Cascading Style Sheets” (CSS) specification for a number of years.
  • the @font-face tag is employed in connection with the conversion of a PDF document into HTML to ensure the preservation of the original fonts embedded within that document. These fonts are downloaded and employed to generate the resulting HTML 5 document, which can then be integrated into any desired web page, as well as embedded into other web pages (e.g., by using the standard HTML “iframe” tag).
  • the original appearance of the source document (PDF, in this embodiment)
  • the text is preserved as searchable text
  • the document is integrated into a web page that can be searched, zoomed, scrolled, printed, etc., utilizing standard web browser controls.
  • Because the PDF document is now an integral component of the resulting HTML 5 web page, a significantly increased “ad inventory” is enabled. Advertisements can be integrated between the individual pages (or even within a page) of the document. Even in the context of a relatively short 20-page document, there is at least a 20-fold increase in ad inventory over what would be present if the document were confined to a separately scrolled window within the web browser's window.
  • the resulting document (independent of its format) can be passively shared with desired members of a reader's external social networks (as well as any social network within the host website), along with other reading-related activities and behavior performed by the reader on the website hosting the document.
  • a user sets predefined sharing preferences identifying particular social networks (e.g., Twitter, Facebook, MySpace, and the host website's social network) as well as specific activities and behavior on the website to be shared on those social networks (e.g., in this embodiment, which documents have been viewed, downloaded or uploaded, or even how many pages have been viewed, as well as annotations, ratings and various other behavior or extracted analytics).
  • any activities and behaviors within a website can be passively shared with a user's external social networks.
  • a user's reading-related activities within a host website are automatically shared with desired members of a user's social networks in accordance with the user's predefined sharing preferences. The user simply accesses the host website with the desire to read documents and perform other reading-related activities, with the result that such activities are automatically “passively shared” without any further action by the user.
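The passive sharing described above can be sketched as a small data structure plus a lookup. This is only an illustrative sketch: the class `SharingPreferences`, the function `passively_share`, and the message format are hypothetical names chosen here, not part of the described system.

```python
from dataclasses import dataclass, field


@dataclass
class SharingPreferences:
    # Hypothetical container for a user's predefined sharing preferences.
    networks: set = field(default_factory=set)    # e.g. {"Twitter", "Facebook"}
    activities: set = field(default_factory=set)  # e.g. {"viewed", "downloaded"}


def passively_share(prefs, activity, document_title):
    """Return (network, message) pairs for an activity the user opted to share.

    Nothing is returned for activities the user did not select in advance,
    so no further action by the user is needed at reading time.
    """
    if activity not in prefs.activities:
        return []
    message = f"{activity}: {document_title}"
    return [(network, message) for network in sorted(prefs.networks)]
```

In this sketch the preferences are set once; every later reading-related action is simply checked against them and fanned out to the selected networks.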
  • FIG. 1 is a block diagram of one embodiment of the platform and key system components employed by the present invention, including user devices, host websites and key architectural components.
  • FIG. 2 a is a screenshot of a document converted into an HTML 5 document and integrated into a web page in one embodiment of the present invention, illustrating the preservation of fonts from the original document, as well as the integration of the document with other elements on the web page.
  • FIG. 2 b is a screenshot of a document converted into an HTML 5 document and integrated into a web page in one embodiment of the present invention, illustrating the preservation not only of fonts from the original document, but also the page layout of the original document across multiple pages.
  • FIG. 3 is a screenshot of a document converted into an HTML 5 document and integrated into a web page in one embodiment of the present invention, illustrating the preservation not only of fonts from the original document, but also searchable text displayed with its original fonts.
  • FIG. 4 a is a screenshot of a document converted into an HTML 5 document and integrated into a web page in one embodiment of the present invention, illustrating the insertion of advertisements between pages of the original document.
  • FIG. 4 b is a screenshot of a document converted into an HTML 5 document and integrated into a web page in one embodiment of the present invention, illustrating the insertion of advertisements in the “open space” within a page of the original document.
  • FIG. 5 is a flowchart illustrating a process of converting and integrating a document (e.g., a PDF document) into an existing HTML 5 web page in accordance with one embodiment of the present invention.
  • FIG. 6 is a screenshot of an initial “ReadCast” dialog box appearing next to a document displayed on a web page in one embodiment of the present invention, illustrating the initiation of the process of setting a user's “passive sharing” preferences.
  • FIG. 7 a is a screenshot illustrating a user's “ReadCast” settings for a set of “passive sharing” preference controls displayed on a web page in one embodiment of the present invention.
  • FIG. 7 b is a screenshot illustrating alternate “ReadCast” settings (to those illustrated in FIG. 7 a ) for a set of “passive sharing” preference controls displayed on a web page in one embodiment of the present invention.
  • FIG. 8 is a screenshot illustrating a Twitter dialog box invoked when a user selects the “ReadCast” setting (in one embodiment of the present invention) to “passively share” selected activities via the user's Twitter account.
  • FIG. 9 is a screenshot of a “ReadCast” dialog box displayed on a web page in one embodiment of the present invention, illustrating the conclusion of the process of setting a user's “passive sharing” preferences.
  • FIG. 10 is a flowchart illustrating a passive sharing process in accordance with one embodiment of the present invention, including the setting of a user's ReadCasting preferences and the automatic sharing (in accordance with those preferences) of the user's actions on a host website with the user's external social networks.
  • the Internet 110 is the platform on which a set of documents (e.g., PDF documents, not shown) is shared between a host server 120 , one or more client computers 130 and various members of social networks 140 , some of whom are users of client computers 130 .
  • Host server 120 converts the original documents into HTML (in accordance with the HTML 5.0 and CSS 3 specifications), employing the @font-face tag to download the original web fonts embedded in the documents, and integrates the document into the desired layout of a web page.
  • each document within the web page is preserved (as in the original document), including fonts and other page layout attributes.
  • the text remains searchable and the document can be viewed and controlled via standard web browser controls (without the need for any document-specific controls for printing, scrolling, zooming, etc).
  • the remainder of the web page may contain other web elements, including text, images, advertisements, animation, and video, as well as hyperlinks, buttons and various other static and interactive objects and functionality.
  • When a user of one of client computers 130 accesses (via Internet 110 ) one of these documents integrated within a web page of a website hosted on host server 120 , the user can perform various reading-related actions on that host website with respect to that document, such as reading, annotating, rating or downloading the document (as well as uploading other documents).
  • the user can also set “ReadCasting” preferences which will automatically share such documents and metadata relating to such activities with desired members of the user's external social networks 140 (including the host website's own social network, if any).
  • FIG. 2 a illustrates a web page 200 in which one of such documents 210 is integrated, in accordance with one embodiment of the present invention.
  • custom fonts 220 from the original document have been preserved, and the document is integrated into the web page, with additional static and interactive elements 230 included above and alongside the document (or, in other embodiments, within the document itself).
  • FIG. 2 b illustrates a web page 250 containing a similar document 260 that not only preserves the fonts 270 from the original document, but also the page layout 280 of the original document across multiple pages. Thus, the appearance of the original document has been preserved, and it can be scrolled along with any remaining elements (not shown) on the web page via standard web browser scroll bars 290 .
  • FIG. 3 illustrates a web page 300 containing a similar document 310 with preservation of the appearance of the original document, including custom web fonts and various page layout attributes, and further illustrates that the text remains searchable (as opposed to mere images of the text fonts), as is evidenced by the highlighted portions 320 of the text.
  • Not only can users search this text, which is particularly useful for longer documents, but other programs can also search it, and the results can then be used for various purposes, such as providing targeted advertisements relating to particular portions of text (e.g., at the level of a document, an individual page or even specific words).
  • advertisements can be integrated not only on portions of the web page alongside the document (e.g., outside of the area in which the document is displayed), but also within the document itself. Because a long document is not confined to a separate fixed scrollable window within a web page, but rather extends the web page itself to the full length of the document, the entire length of the document is available for associated advertisements.
  • FIG. 4 a illustrates advertisements 420 inserted in between pages of a document 410 .
  • advertisements could also be located alongside the document outside of the document's frame. In either case, the advertisements would remain next to the relevant portions of the document as the entire web page is scrolled up and down.
  • FIG. 4 b illustrates advertisements 470 inserted into the “open space” within a page of the document 460 .
  • One embodiment of the process of converting and integrating a document (e.g., a PDF document) into an existing HTML 5 web page is illustrated in FIG. 5.
  • Unlike the traditional PDF-to-HTML conversion process, this process 500 not only preserves the original fonts embedded within the document (in one embodiment, using the @font-face tag), but does so in a manner that enables the document to be integrated into an existing web page, as well as embedded into other web pages (e.g., by using the standard HTML “iframe” tag).
  • the text is preserved as searchable text
  • the document is integrated into a web page that can be searched, zoomed, scrolled, printed, etc., utilizing standard web browser controls (thereby providing a significantly increased “ad inventory”).
  • performance is enhanced for long documents by loading dynamically only a few pages before and after the current page being displayed. This decreases substantially the time required to load a document initially, and to scroll from page to page.
  • One tradeoff, however, is that current web browsers may not print a document correctly if all pages are not loaded. In that case, however, users may save a PDF version of the document which can then be printed.
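The dynamic loading described above amounts to keeping a small window of pages around the one currently displayed. A minimal sketch follows, assuming 1-based page numbers and a hypothetical `window` parameter; the source does not state how many pages are kept.

```python
def pages_to_load(current_page, total_pages, window=2):
    """Return the page numbers to keep loaded around the current page.

    `window` is an assumed tuning parameter: the number of pages kept
    before and after the page being displayed.
    """
    first = max(1, current_page - window)
    last = min(total_pages, current_page + window)
    return list(range(first, last + 1))
```

For a 20-page document with the default window, viewing page 10 would keep pages 8 through 12 loaded, which also shows why printing from the browser may omit pages that were never loaded.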
  • Conversion process 500 begins with the input of a document (a PDF document in this embodiment) in step 510 which is to be converted and integrated into an existing HTML 5 web page and rendered on a client user's web browser.
  • the document is parsed in two passes, the first of which (step 520 ) identifies various document statistics and layering information for use in the second pass (step 530 ).
  • In first pass 520 , the document is parsed sequentially for distinct document “assets” (e.g., text, fonts and images in this embodiment) until each such asset has been processed.
  • the identified asset is processed in step 527 (the manner depending upon the type of asset).
  • For “font” assets, various statistics are collected, such as the specific characters of that font actually used in the document (to save space and network bandwidth by ignoring unused characters), as well as the size, color, orientation and number of occurrences of such characters. Of course, various different collections of statistics could be extracted in other embodiments.
  • the conversion process 500 uses the @font-face tag to generate a “custom” font that can be used by a web browser as if it were one of the browser's “built-in” fonts. This aspect of process 500 occurs during second pass 530 (explained in greater detail below), utilizing the statistics collected during this first pass 520 .
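The per-font statistics gathered in the first pass can be pictured as a count of which characters each font actually uses. The sketch below assumes the parser yields `(font_name, text)` pairs; that input shape and the function name are hypothetical, and size, color and orientation are omitted for brevity.

```python
from collections import Counter, defaultdict


def collect_font_usage(text_runs):
    """Count, per font, how often each character occurs in the document.

    text_runs: iterable of (font_name, text) pairs (an assumed input shape).
    Only characters that appear in the result need to be kept in the
    custom font generated later.
    """
    usage = defaultdict(Counter)
    for font_name, text in text_runs:
        usage[font_name].update(text)
    return usage
```

The keys of each counter give the subset of characters to extract as glyphs; the counts correspond to the "number of occurrences" statistic mentioned above.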
  • step 527 identifies and stores the page of the document on which such assets occur, as well as the location of such assets on that page. This information also will be utilized during second pass 530 .
  • In step 529 , multi-layer objects are detected, and layering and clipping information is identified and stored for use during second pass 530 .
  • Many document formats, including the PDF format, support rich document structures that include multiple layers of objects, such as blocks of text layered on top of vector graphics, which may be layered on top of other text objects that are layered on top of bitmaps, etc.
  • support for vector fills, gradient patterns, semitransparent bitmaps, clip polygons (that mask portions of layers below) and other structural document formatting features results in a complex multi-layer object hierarchy that (to conform to HTML5 standards) must be converted into a background image with some text on top.
  • This aspect of process 500 occurs during second pass 530 (explained in greater detail below), utilizing the layering and clipping information collected during this first pass 520 .
  • conversion process 500 proceeds from step 525 to second pass 530 .
  • each asset (text, font and image assets in this embodiment) is parsed sequentially until no such assets remain, as determined in step 535 , at which point the web page elements will be stored on the host server at step 580 for subsequent delivery to and rendering on the client's web browser, as discussed in greater detail below.
  • each asset is identified in step 545 as a text, font or image asset.
  • word and character spacing information is extracted in step 550 (utilizing the asset statistics generated during first pass 520 ) to determine the positions of each character and word of the text asset. Words are identified, for example, by detecting additional horizontal “space” between characters.
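Word detection from horizontal spacing can be sketched as follows. The input shape (characters on one line with start and end x coordinates, sorted left to right) and the `gap_threshold` parameter are assumptions made for illustration; the source does not specify how large a gap counts as a word break.

```python
def group_into_words(chars, gap_threshold):
    """Split one line of positioned characters into words.

    chars: list of (character, x_start, x_end), sorted left to right.
    A new word starts whenever the horizontal gap to the previous
    character exceeds gap_threshold (an assumed tuning value).
    """
    words = []
    current = ""
    previous_end = None
    for character, x_start, x_end in chars:
        if previous_end is not None and x_start - previous_end > gap_threshold:
            words.append(current)
            current = ""
        current += character
        previous_end = x_end
    if current:
        words.append(current)
    return words
```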
  • One embodiment of a paragraphization algorithm is employed, in step 552 , to extract “high-level” information regarding text assets, such as lines and paragraphs.
  • the location/position information extracted in first pass 520 , including character and word spacing information (from step 550 ), is utilized to determine where lines and paragraphs begin and end.
  • Various algorithms can be employed to resolve this basic problem—i.e., identifying lines and paragraphs given “absolute location” information (e.g., spatial coordinates of characters and words employed by document formats such as PDF), and generating “relative location” information via line break, paragraph and other tags employed by the HTML 5 format.
  • paragraph delimiters are identified to distinguish distinct paragraphs from one another.
  • a typical paragraph “pattern” might consist of an indented first line. By detecting “lines” having similar “x coordinates,” a consistently higher “x coordinate” indicates an indented line. Similarly, an occasional doubled “y coordinate” differential indicates another common paragraph “pattern” with a blank line delimiting paragraphs.
  • paragraph “justifications” are also identified in step 552 .
  • consistent “x coordinates” at the beginning (but not the end) of each line of a paragraph indicate a “left-justified” paragraph.
  • a “right-justified” paragraph exhibits consistent “x coordinates” at the end (but not the beginning) of each line of the paragraph.
  • a consistent “x coordinate” differential between the beginning and end of each line of the paragraph indicates a “center-justified” paragraph.
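The justification rules above can be sketched as a comparison of line start and end coordinates. This is one possible reading, not the source's algorithm: the `tolerance` parameter is hypothetical, and the center case is interpreted here as lines sharing a consistent midpoint.

```python
def classify_justification(lines, tolerance=1.0):
    """Classify a paragraph from the x extents of its lines.

    lines: non-empty list of (x_start, x_end), one entry per line.
    tolerance: assumed maximum spread for coordinates to count as consistent.
    """
    def consistent(values):
        return max(values) - min(values) <= tolerance

    starts = [start for start, _ in lines]
    ends = [end for _, end in lines]
    midpoints = [(start + end) / 2 for start, end in lines]

    starts_ok = consistent(starts)
    ends_ok = consistent(ends)
    if starts_ok and not ends_ok:
        return "left"
    if ends_ok and not starts_ok:
        return "right"
    if not starts_ok and not ends_ok and consistent(midpoints):
        return "center"
    # Cases the text above does not cover (e.g. both edges consistent).
    return "unknown"
```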
  • The line spacing within (as well as between) paragraphs is discerned from “y coordinate” information, which is converted into appropriate HTML tags in step 554 to generate the appropriate line spacing.
  • Lines and paragraphs detected in step 552 are also converted into HTML 5 (and CSS 3) in step 554 using respective line break (“<br>”) and paragraph (“<p>”) tags, among other text and layout-related attributes (such as the text-indent CSS property).
  • additional line and paragraph attributes can be detected, and additional HTML tags can be employed.
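The conversion of detected lines and paragraphs into markup can be illustrated with a short sketch. The paragraph dictionaries (with `lines` and an optional `indent_px`) are a hypothetical intermediate form; only the `<p>` and `<br>` tags and the `text-indent` property come from the description above.

```python
from html import escape


def paragraphs_to_html(paragraphs):
    """Render detected paragraphs as HTML.

    paragraphs: list of dicts with 'lines' (list of str) and an optional
    'indent_px' (first-line indent in pixels); an assumed input shape.
    """
    parts = []
    for paragraph in paragraphs:
        # Lines within a paragraph are separated by line break tags.
        body = "<br>".join(escape(line) for line in paragraph["lines"])
        indent = paragraph.get("indent_px", 0)
        style = f' style="text-indent: {indent}px"' if indent else ""
        parts.append(f"<p{style}>{body}</p>")
    return "\n".join(parts)
```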
  • control is returned to step 535 to determine whether any assets remain to be processed. If not, the converted document elements (along with existing non-document elements on the web page) are stored on the host server in step 580 , awaiting access during runtime.
  • the glyphs (i.e., “images” of the characters of the font) are extracted in step 560.
  • only those glyphs that actually appear in the document are extracted (to save resources, such as memory and network bandwidth).
  • In step 562 , various geometric transforms are computed, if necessary, for specially formatted text.
  • each of the characters used in the document is converted, in one embodiment, to a “rotated glyph” (using a simple geometric transform) and stored in a font file as a character of the custom font, mapped to its corresponding unicode representation.
  • the vertical positions of each character are also stored in the font file (mapped to the rotated glyphs and their unicode representations), reflecting the increasing or decreasing slope of successive characters.
  • information relating to the slope of the diagonal can be maintained independently of the individual characters themselves.
  • Diagonal text can be detected directly from within a PDF document by virtue of PDF support for rotated text.
  • the presence of diagonal text may also be inferred from the absolute position data (e.g., periodically increasing or decreasing vertical coordinates of adjacent text characters) discerned from the document.
  • analogous adjustments are employed (in one embodiment, on a character-by-character basis).
  • related attributes can be encoded natively in the HTML 5 web page, such as character spacing, line-height, paragraphs, justification, etc.
  • the characters can, in one embodiment, optionally be encrypted, in step 563 (as a form of HTML 5-compliant “digital rights management” or DRM), to prevent users from copying and pasting the “protected” text into other environments.
  • this solution leverages the @font-face mechanism built into HTML 5 to map individual characters to alternative characters (e.g., a “tilde”) that can be displayed in their place when a user attempts a copy and paste operation. In other words, rather than attempting to inhibit the copy and paste operation, it is allowed to proceed, but with substituted “encrypted” versions of the actual characters.
  • Each glyph will still appear in the user's web browser as intended. But, it will also be mapped (on the host server, in one embodiment) to an alternative “gibberish” character (e.g., a tilde), that in turn will be mapped to the actual unicode character itself (e.g., the letter “a”).
  • the actual unicode character will remain available, for example, if the user desires to conduct a text search. But, if the user attempts to copy and paste a block of text, the alternative characters will be substituted and, upon being pasted, will show up as “gibberish” characters (thus preventing the unauthorized transfer of such text to other environments).
  • mapping of the characters is confined to the host server, which can be invoked to generate the alternative characters when the user attempts to copy and paste the “encrypted” text (e.g., using a simple Javascript call in the source web page).
  • the mapping information can be contained within the files delivered to the user's web browser (avoiding the need to invoke the host for this purpose), though potentially compromising security in the event a third party is able to discern or disable this mapping process.
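  • The character substitution described above can be illustrated with a short Python sketch. It is not the disclosed implementation: the use of Unicode Private Use Area code points as the alternative characters (the text itself mentions a tilde as an example), the seeded shuffle, and the function names are all assumptions made for illustration. The font-side step, in which the converted web font draws the original glyph for each alternative code point, is not shown.

```python
import random
from typing import Dict, Tuple

PUA_START = 0xE000  # first code point of the Unicode Private Use Area (assumed choice)


def build_substitution(text: str, seed: int = 0) -> Tuple[str, Dict[str, str]]:
    """Replace each distinct non-space character with an alternative code point.

    Returns the substituted text (what a copy-and-paste would yield) and the
    reverse map, which in this sketch stands in for the mapping kept on the host.
    """
    distinct = sorted(set(text) - {" ", "\n"})
    rng = random.Random(seed)
    rng.shuffle(distinct)  # avoid a predictable alphabetical assignment
    forward = {ch: chr(PUA_START + i) for i, ch in enumerate(distinct)}
    substituted = "".join(forward.get(ch, ch) for ch in text)
    reverse = {alt: ch for ch, alt in forward.items()}
    return substituted, reverse


def recover(substituted: str, reverse: Dict[str, str]) -> str:
    """Map alternative characters back to the actual characters, e.g. for text search."""
    return "".join(reverse.get(ch, ch) for ch in substituted)
```

  • Holding the reverse map only on the host corresponds to the more secure variant described above; shipping it to the browser would avoid a server call at the cost of exposing the mapping.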
  • PDF documents (among others) store fonts in a myriad of different formats (e.g., Type1, Type3, OpenType, etc.), which, to be usable as a web font, must be converted (e.g., into “eot,” “ttf” and “svg” formats, accommodating different positions, encodings, transforms, etc.).
  • “.eot” formats are utilized for Internet Explorer, “.svg” formats for embedded devices and “.ttf” formats for Firefox, Safari, Chrome, etc.
  • the @font-face CSS declaration for the “Zapfino” typeface might look like the following:
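  • As a hedged illustration of such a declaration, the Python helper below assembles an @font-face rule that lists the three converted formats named above. The file paths and the helper name are hypothetical; the format hints are the standard CSS values for these file types.

```python
from typing import Iterable

# CSS format() hints for the three converted file types named in the text.
FORMAT_HINTS = {"eot": "embedded-opentype", "ttf": "truetype", "svg": "svg"}


def font_face_rule(family: str, base_url: str,
                   extensions: Iterable[str] = ("eot", "ttf", "svg")) -> str:
    """Build one @font-face rule pointing at <base_url>.<ext> for each format."""
    sources = []
    for ext in extensions:
        hint = FORMAT_HINTS[ext]
        sources.append(f'url("{base_url}.{ext}") format("{hint}")')
    src = ",\n       ".join(sources)
    return f'@font-face {{\n  font-family: "{family}";\n  src: {src};\n}}'


if __name__ == "__main__":
    # "fonts/zapfino" is a made-up location for the converted font files.
    print(font_face_rule("Zapfino", "fonts/zapfino"))
```

  • Listing several sources in one rule lets each browser pick the first format it supports, which matches the per-browser split of “.eot,” “.ttf” and “.svg” files described above.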
  • the glyphs and the corresponding unicode characters to which they are mapped are then converted, in step 564 , into the various web-readable font file formats (“eot,” “ttf” and “svg”), after which control is returned to step 535 to determine whether any assets remain to be processed. If not, the converted document elements (along with existing non-document elements on the web page) are stored on the host server in step 580 , awaiting access during runtime.
  • the conversion of fonts into the various web-readable formats in step 564 can be performed at the end of second pass 530 after all text, font and image assets have been parsed (as opposed to converting each font asset as it is parsed).
  • If an “image” asset is identified, and the image is a “vector graphic” image, then it is rasterized (i.e., converted into a “bitmap” image) in step 570 .
  • vector graphics can be supported directly in HTML.
  • graphic layers are merged.
  • the “z order” of multi-layer objects (e.g., bitmaps on text on vector graphics, along with vector fills, gradient patterns, clip polygons, etc.) is reduced to a simpler HTML-friendly structure (e.g., text on background image).
  • a boolean bitmap is maintained to facilitate the determination of whether particular page assets (bitmaps, text, vector graphics, etc.) share display space (in which case, for example, clipping is necessary to generate a merged bitmapped image).
  • the boolean bitmap identifies the regions of a page that have currently been “drawn” (processed), and thus which pixels need to be checked for overlap against the current asset being processed.
  • two boolean bitmaps are maintained—one for tracking the area currently occupied by the next bitmap (or rasterized vector graphic) being added to the display stack, and the other for tracking the area occupied by text objects. Until there exists overlap between these two boolean bitmaps, the order in which they are drawn makes no difference.
  • the two boolean bitmaps are refined in step 572 as each asset is processed, until a “final” background image is generated (taking into account any previously overlapping text) on top of which the “final” text layer is placed.
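  • The two-bitmap bookkeeping described above can be sketched as follows. This is an illustrative Python model, not the disclosed implementation: assets are reduced to axis-aligned rectangles, each bitmap row is stored as an integer bitset, and the class and function names are assumptions.

```python
from typing import Iterable, List, Tuple

Rect = Tuple[int, int, int, int]  # (left, top, width, height) in pixels


class BooleanBitmap:
    """One bit per pixel; each row is a Python int used as a bitset."""

    def __init__(self, width: int, height: int) -> None:
        self.width = width
        self.height = height
        self.rows: List[int] = [0] * height

    def _span(self, rect: Rect) -> Tuple[int, range]:
        """Clip rect to the page and return (column mask, affected rows)."""
        left, top, w, h = rect
        x0, x1 = max(left, 0), min(left + w, self.width)
        y0, y1 = max(top, 0), min(top + h, self.height)
        mask = ((1 << (x1 - x0)) - 1) << x0 if x1 > x0 else 0
        return mask, range(y0, y1)

    def mark(self, rect: Rect) -> None:
        """Record that the pixels inside rect have been drawn."""
        mask, ys = self._span(rect)
        for y in ys:
            self.rows[y] |= mask

    def covers_any(self, rect: Rect) -> bool:
        """Return True if any pixel inside rect has already been drawn."""
        mask, ys = self._span(rect)
        return any(self.rows[y] & mask for y in ys)


def needs_merge(assets: Iterable[Tuple[str, Rect]], width: int, height: int) -> bool:
    """Return True once an image asset and a text asset share display space."""
    image_mask = BooleanBitmap(width, height)
    text_mask = BooleanBitmap(width, height)
    for kind, rect in assets:
        own, other = (text_mask, image_mask) if kind == "text" else (image_mask, text_mask)
        if other.covers_any(rect):
            return True  # overlap found: drawing order now matters
        own.mark(rect)
    return False
```

  • Until `needs_merge` reports an overlap, the image and text layers can be emitted in either order, which is the property the two bitmaps are meant to track.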
  • the image is split into separate files in step 574 .
  • in step 576 , the image may need to be scaled, converted or otherwise reformatted, depending upon its original format and the size and position information previously extracted.
  • steps 574 and 576 can (like step 564 ) be performed at the end of second pass 530 after all text, font and image assets have been parsed (as opposed to splitting files and reformatting each image asset as it is parsed).
  • control is returned to step 535 to determine whether any assets remain to be processed. Once all text, font and image assets have been processed, the converted document elements (along with existing non-document elements on the web page) are stored on the host server in step 580 , awaiting access during runtime.
  • the document and non-document elements are loaded on the host server in step 585 and delivered to the client web browser, where they are integrated and rendered, in step 590 , on the client computer.
  • users of a website engage in various reading-related activities with respect to documents hosted on the website, regardless of whether such documents have been converted so as to retain the appearance of the original document (as discussed in Section A above).
  • These reading-related activities include reading, annotating, rating or downloading (as well as uploading) documents.
  • various other activities could be included and shared, such as the particular page or portion within a document that a user is reading, the number of pages read or even the time spent reading a particular document.
  • activities beyond those that are reading-related could be shared with external social networks in a similar fashion to that described herein.
  • FIG. 6 illustrates an embodiment of an initial “ReadCast” dialog box 610 next to a document 620 displayed on a web page 600 .
  • this dialog box 610 presents the user with an opportunity to set certain “passive sharing” preferences (not shown) that will result in the automatic sharing of the user's future reading-related activities with desired members of the user's external social networks. For example, after setting these preferences, the user might select a particular document, causing the system to automatically notify the user's Facebook friends (in accordance with the user's specified preferences) that the user has elected to read that particular document.
  • a list of all users who have read the document is displayed next to the document.
  • FIG. 7 a includes various preference controls 700 covering activities such as “Reading” 702 a document, “Downloading” 704 the document (or sending it to the user's mobile phone via the “Send to Mobile” activity 706 ), “Rating” 708 the document and “Scribbling” 710 (i.e., annotating the document).
  • the user specifies, with respect to various social networks 715 (e.g., Facebook 716 , Twitter 717 and the Scribd website's own “internal” social network 718 ), whether each of the activities is shared (by specifying, for each activity, “always” share, “never” share, or “ask” the user at the time of engaging in the activity whether to share such action with the specified social networks).
  • the user has enabled all activities 700 and selected the “ask” radio button for each of them (with the exception of the Scribd social network, for which Rating and Scribbling can only be set to always be shared).
  • the system will automatically ask the user whether to share such information with the user's specified social network (e.g., Facebook friends or Twitter followers).
  • FIG. 7 b illustrates alternative ReadCast settings.
  • the “Send to Mobile” 706 and “Scribbling” 710 activities have been disabled by the user, and the “Reading” 702 activity is set to “always” be shared on Scribd 718 and “never” be shared on Facebook 716 , while the Rating activity is set to “always” be shared on Facebook 716 and Twitter 717 .
  • FIGS. 7 a and 7 b also include a “Link to Account” button 720 to enable the user to designate and access their particular Facebook or Twitter account.
  • FIG. 8 illustrates a Twitter dialog box 810 that is invoked when the user selects the “Link to Account” button under the Twitter column. This dialog box 810 provides the user with the opportunity (i.e., an additional layer of security provided by the social networking site) to allow or deny the host website access to the user's Twitter account (e.g., to share the user's designated activities on the Scribd website with the user's Twitter account).
  • This dialog box 910 summarizes the user's selected preferences (e.g., indicating the social network(s) on which the user's activities are shared). In other embodiments, the specific activities that are enabled can be displayed.
  • Once the ReadCast “passive sharing” preference settings have been saved, whenever the user performs one of the designated activities on the host website, a notification indicating that the user has performed that activity will be shared on the user's designated social networks (e.g., Facebook or Twitter, as well as the host Scribd network) without requiring any further action by the user.
  • a list of a user's “friends” or other contacts on external social networks is identified and maintained, and ReadCast notifications to anyone on that list are forwarded to the user's Scribd friends, thereby further extending such notifications to a social “network of networks” or a “social Internet.”
  • This is accomplished by using the APIs provided by external social networks (e.g., “Facebook Connect”) to copy and retain a portion of the user's “social graph” or a list of friends.
  • A more detailed description of one embodiment of the passive sharing process is illustrated by the flowchart in FIG. 10 .
  • a user initially encounters on the host website (e.g., via dialog box 610 shown in FIG. 6 ) an opportunity to set initial ReadCast settings, represented by step 1010 in FIG. 10 .
  • the system 1000 displays, in step 1012 , the user's default ReadCast settings.
  • the user sets desired preferences in step 1014 , by associating particular activities with specified social networks, as explained above with respect to FIGS. 7 a and 7 b .
  • system 1000 enables, in step 1020 , the ReadCast passive sharing behaviors.
  • system 1000 detects, in step 1050 , a user's performance of one of the predefined actions, and checks, in step 1055 , to determine whether that user's ReadCast settings are enabled. If that user's ReadCast settings are not enabled, system 1000 simply permits the user to continue performing the desired reading-related activity (step 1090 ).
  • system 1000 identifies, in step 1060 , the particular activity being performed by the user and accesses, in step 1062 , the user's ReadCast preferences to determine, in step 1065 , whether the user's ReadCast settings are enabled for that particular activity. If not, system 1000 (as above) permits the user to continue performing the desired reading-related activity (step 1090 ).
  • system 1000 identifies, in step 1067 , the conditions under which the activity will be “passively shared” with the user's specified social networks. For example, as noted above with respect to FIGS. 7 a and 7 b , the user may have enabled that activity to always be shared with certain social networks and never be shared with others (and perhaps to be asked at the time whether to share the activity with certain other social networks). Of course, in other embodiments, additional options and conditions could be specified.
  • system 1000 proceeds, in step 1069 , to initiate the “passive sharing” of that activity—e.g., to notify one or more of the user's designated social networks that the user has engaged in that particular activity.
  • System 1000 (as above) then permits the user to continue performing the desired reading-related activity (step 1090 ).
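  • The decision flow of steps 1050 through 1090 can be summarized in a short Python sketch. The data layout (a dictionary from activity to per-network rule), the function name and the `confirm` callback that stands in for the “ask” prompt are assumptions for illustration; the example preferences follow the FIG. 7 b settings described above.

```python
from typing import Callable, Dict, List

# activity -> social network -> "always" | "never" | "ask"
Preferences = Dict[str, Dict[str, str]]


def networks_to_notify(readcast_enabled: bool,
                       preferences: Preferences,
                       activity: str,
                       confirm: Callable[[str, str], bool]) -> List[str]:
    """Return the networks that should receive a notification for an activity.

    An empty list means the user simply continues the activity (step 1090).
    """
    if not readcast_enabled:             # step 1055: ReadCast switched off
        return []
    rules = preferences.get(activity)    # steps 1060-1065: is this activity enabled?
    if not rules:
        return []
    chosen = []
    for network, rule in rules.items():  # step 1067: per-network conditions
        if rule == "always" or (rule == "ask" and confirm(activity, network)):
            chosen.append(network)
    return chosen                        # step 1069 would send the notifications


# Settings modeled on FIG. 7b: disabled activities are simply absent.
FIG_7B_PREFERENCES: Preferences = {
    "Reading": {"Scribd": "always", "Facebook": "never"},
    "Rating": {"Facebook": "always", "Twitter": "always"},
}
```

  • Returning a list rather than sending notifications directly keeps the preference check separate from any particular social network API, which this sketch does not attempt to model.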

Abstract

In various embodiments of the present invention, documents (e.g., PDFs) are converted into HTML 5 (and CSS 3) formats and integrated into existing HTML 5 web pages to preserve the original embedded fonts. The converted documents can also be integrated or embedded (e.g., via the standard HTML “iframe” tag) into other web pages. The original appearance of the source document is maintained, the text is preserved as searchable text, and the document is integrated into a web page that can be searched, zoomed, scrolled, and printed utilizing standard web browser controls. A significantly increased “ad inventory” is thereby enabled, wherein advertisements can be integrated between pages, or even within a page. Moreover, the resulting document can be passively shared with members of a user's external social networks (including those within the host website), along with other activities and behaviors performed by the user on the hosting website.

Description

    CROSS REFERENCE TO RELATED APPLICATIONS
  • This application is a Divisional of U.S. patent application Ser. No. 13/189,372, filed Jul. 22, 2011, which is a Divisional of U.S. patent application Ser. No. 12/912,625, filed Oct. 26, 2010, both titled “Integrated Document Viewer With Automatic Sharing of Reading-Related Activities Across External Social Networks,” which claims the benefit (pursuant to 35 U.S.C. §119(e)) of (i) U.S. Provisional Patent Application No. 61/326,166, filed Apr. 20, 2010, entitled “Integrated Document Viewer with Automatic Sharing of Reading-Related Activities Across External Social Networks,” and (ii) U.S. Provisional Patent Application No. 61/330,161, filed Apr. 30, 2010, entitled “Integrated Document Viewer with Automatic Sharing of Reading-Related Activities Across External Social Networks with Additions.” The entire disclosures of all of these applications are expressly incorporated herein by reference.
  • I. BACKGROUND
  • A. Field of Art
  • This application relates generally to the integration of documents into web pages, and in particular to systems and techniques for preserving a document's original nature and appearance when displaying the document within the pages of a website, and automatically sharing users' reading-related activities on that website across their external social networks.
  • B. Description of Related Art
  • Well before the advent of the Internet and the World Wide Web, software developers struggled to display documents on a computer monitor in the form intended by the authors of such documents. Initially, documents displayed on a computer screen were limited to text, with little or no choice of fonts, much less page layout and formatting of any kind. As word processors and other presentation programs evolved, fonts were integrated and other media were added (such as images, animation and even video), along with page layout features for presenting the various components of a document with a particular appearance desired by the document's author. Moreover, documents themselves have evolved well beyond traditional text, to include various different static and interactive media and page layout attributes, and to appear in many different forms, ranging from short emails or blog posts to book previews, news articles and creative writing samples, to long novels or reference books, and almost anything in between.
  • As the Web gained traction in the early to mid 1990s, an entirely new medium for presenting and distributing documents evolved, and a new type of document was created—namely, the “web page” within a “website” containing a collection of related (and often linked) web pages. This new type of document, employing a document format known as “Hypertext Markup Language” (HTML), also went through a similar evolution to that of traditional documents, initially being limited to text, and soon adding other media, including images, animation, and video, as well as hyperlinks, buttons and various other interactive objects and functionality.
  • Whether an author initially creates a document as a web page (typically displayed via a program known as a “web browser”) or as a more traditional page-oriented document (i.e., a document that is inherently divided into pages corresponding to static “printable” pages), the author intends for the document to be printed or displayed on a computer monitor with a particular desired appearance. A document's appearance includes a variety of presentation and page layout characteristics, such as the position, size and orientation of various component text, graphic and other static and interactive objects on each page of the document. It should be noted that the nature or functionality of these object types also is generally intended to be preserved, particularly when displayed on a computer monitor.
  • Of particular importance, however, are the various fonts associated with specific text, which themselves have various attributes, including font type, size, style, etc. Given that most documents consist primarily of text, it is not surprising that the particular fonts employed within a document play a significant role in the document's overall appearance.
  • Maintaining a document's appearance as it is distributed among different computers and platforms (including its appearance when printed or displayed within a web page) has long been a problem addressed by various software technologies. For example, if a document is created with a particular word processing program and transferred to another computer which does not have access to that program, then the document may not even be accessible on the destination computer, or may only be accessible via another program that displays the document with a modified appearance (e.g., with different fonts or other formatting attributes).
  • One of the leading solutions to this problem, even pre-dating the Web, is the “portable document format” (PDF) created by Adobe Systems, Inc. The PDF is designed to preserve fonts, as well as page layout and other object and document formatting characteristics, so that documents retain a virtually identical appearance when distributed across computers and platforms, displayed on a computer monitor or printed onto a physical medium, such as paper. For this reason, the PDF has become a widely adopted standard document format for printing and distributing documents across computers and platforms, regardless of which program the document's author used to create the document.
  • At this point, it is virtually impossible to distinguish the appearance of a document created as a web page (HTML) from that of one created as a more traditional page-oriented document via a word processing, presentation or page layout program. Both can contain various media types, from static text and graphics to animation, video and other interactive objects and functionality, such as hyperlinks, buttons and other controls. Moreover, both can be printed as static pages on physical paper, even though HTML documents are not generally divided into distinct pages unless and until they are printed. Finally, both can be converted into PDF documents so as to retain their intended appearance when printed or distributed among different computers and platforms.
  • Even PDF documents, however, have been difficult to integrate into web pages, while preserving their intended appearance, due to historical formatting limitations of the HTML format, which traditionally has allowed for the display of only a limited number of fonts. For example, Adobe and others have created programs that display existing PDF documents within a web browser's window. Yet, these programs cause the document to occupy the entire web browser window (along with the controls typically associated with Adobe's “Acrobat” program for displaying PDF documents). In other words, although the PDF document may appear within a web browser's window, it is not truly integrated into another web page; instead it becomes a distinct “web page” of its own. Thus, the author of a web page cannot easily integrate an existing PDF document as part of a web page that includes other web elements or objects, such as text, images, advertisements, etc.
  • Other approaches to this problem include programs that use Adobe “Flash” (or other programming languages/platforms) to display a PDF document in a distinct window within a web page, preserving the appearance of the PDF document while still allowing for other components of the web page to be displayed within the same web browser window. This approach has a number of disadvantages, however, in that the PDF document is not truly integrated into the web page; instead it remains in a separately controlled window within that web page. For example, a user must scroll through the PDF document separately from the rest of the web page, resulting in the significant inconvenience of having to switch between scrolling through the PDF document and scrolling through the web page. Moreover, the “zoom” level and controls of the PDF document are distinct from those of the web page, often forcing the user to zoom the PDF document to a desired level for reading, but switch to a “global” zoom level to read the other components of the web page (text, images, ads, etc.), and then reset the zoom level of the PDF document to continue reading (often while repeatedly readjusting the scrolling positions of the PDF document and the overall web page). In short, the PDF document becomes a separately controllable object that is subservient to the primary web browser controls for the overall web page window, resulting in significant inconvenience to the user.
  • Other approaches include PDF-to-HTML converters that enable the integration of the PDF document into a web page containing other component elements, but do so by sacrificing the original appearance of the document. For example, they convert the fonts embedded within the PDF document into the limited number of fonts typically made available to a computer's web browser. This approach defeats the primary objective of preserving the author's intended appearance of the PDF document.
  • Yet another approach involves converting the PDF document into an “image” which preserves its intended appearance while allowing for other components of the web page to be displayed within the same web browser window. To the extent this approach employs a separately scrollable window, it suffers from the same disadvantages as noted above. Even if the image of the entire document is truly integrated into a discrete area of the web page (as opposed to a separate scrollable “sub-window”), this approach, while preserving the appearance of text, does not preserve the nature of the text itself. In other words, the ability to search and recognize the text is sacrificed, which results in a significant loss of functionality. Not only are users unable to search through the PDF document, but other programs cannot search through and identify words and phrases within the PDF document, a critical feature for targeted advertising engines.
  • Google has adopted a variation of this approach with its “Google PDF viewer,” which is integrated into its “Gmail,” “Google Docs” and other programs. While each page of a PDF document is still converted into an “image” under this approach, users can search for individual words within the document by virtue of Google's “thin client” approach, which relies upon frequent interaction between the user's web browser and a remote web server.
  • For example, upon detecting that the user has attempted to select a word by clicking on the portion of the image containing that word, the user's web browser invokes the remote web server, which must parse the page of the PDF document to identify the “text” version of that word (e.g., the individual ASCII characters of the word), which can then be sent to the user's web browser, for example, to highlight the word or permit it to be copied and pasted elsewhere. Moreover, a user can search for words within the document by typing them into the user's web browser, which again must invoke the remote web server to conduct the search on the “text” within the PDF document, and then return the results to the user's web browser.
  • Yet, this “thin client” approach suffers from a number of disadvantages that result from converting the PDF document into an “image” rather than directly into text (along with the fonts that determine the appearance of that text). For example, the “image” of each page of the document is significantly larger than the corresponding text on that page (even apart from other non-text elements on the page), resulting in an additional delay before each page of the document can be delivered to and displayed by the user's web browser.
  • Moreover, the frequent server interaction imposes further delays whenever the user interacts with the document, e.g., by scrolling to a new page or selecting or searching for words within the document. Even though the “image” of each page can be “zoomed” with the user's standard web browser controls, the words of the document become distorted when zoomed (as would any bitmapped image of text), causing Google to include a custom “zoom” control to avoid this distortion, but at the expense of further delay due to additional server interaction.
  • In short, there remains a need for the true integration of PDF and other documents into a web page that preserves the original nature and appearance of the documents (including in particular the original text fonts and the ability to search the text), allows for other components of the web page to coexist within the same web browser window, and enables users to read, interact with and control all components of the web page (including the document) via the controls built into standard web browsers.
  • In addition to reading a PDF or other document as an integral part of a web page, users may also desire to share their reading-related activities (e.g., viewing, annotating, rating, uploading and downloading documents) with friends or other members of their social networks. Yet, actively choosing to share an activity or behavior is burdensome. For this reason, “passive sharing” is more desirable (i.e., setting predefined sharing preferences, with future behavior resulting in the automatic sharing of such behavior in accordance with those preferences).
  • While passive sharing is becoming increasingly more common, it has yet to be integrated into the activities or behavior within a website independent of the sharing process itself. For example, the sharing of activities and behavior on a social networking site, such as Facebook, Twitter and MySpace, is integral to the nature of these sites. Sharing messages, high scores of games played on the site and other activities is the very essence of participation in these social networks.
  • As these social networks have grown exponentially in popularity, even external behavior is now being “passively shared” among members of these social networks. For example, “Blippy” (a service offered via the website, www.blippy.com) enables users to share their “purchasing behavior” (i.e., purchases made anywhere via a credit card, registered at the “Blippy” website) with other members of their social networks. Yet, even Blippy is designed with sharing as an integral component. Users already purchase items with their credit cards, and they already share their activities and behavior on their social networks with other members. Blippy simply connects the two, enabling the passive sharing of this existing external behavior (shopping) with users' existing social networks (e.g., Facebook friends).
  • As the concept of “passive sharing” increases in popularity, there is a desire on the part of many users to enable their activities and behavior on a website (that are otherwise unrelated to their social networks) to be passively shared among their social networks (even beyond that website).
  • II. SUMMARY
  • Various embodiments of the current invention are disclosed herein, including techniques, apparatus, and systems for preserving a document's original nature and appearance when displaying the document within the pages of a website, and automatically sharing users' reading-related activities on that website across their external social networks.
  • While various iterations of the HTML format have included over time a feature allowing for the downloading of custom fonts (“web fonts”) that can be embedded into web pages, web fonts have been employed to enhance the authoring capabilities of HTML documents, rather than to facilitate the integration of PDF and other documents into web pages. For example, the “@font-face” tag has been a component of the “Cascading Style Sheets” (CSS) specification for a number of years. Most recently, the HTML 5.0 specification, which relies upon CSS 3 (which includes the @font-face tag), has been (or soon will be) implemented in most major web browsers (e.g., Firefox, Safari, Internet Explorer, etc.).
  • In one embodiment of the present invention, the @font-face tag is employed in connection with the conversion of a PDF document into HTML to ensure the preservation of the original fonts embedded within that document. These fonts are downloaded and employed to generate the resulting HTML 5 document, which can then be integrated into any desired web page, as well as embedded into other web pages (e.g., by using the standard HTML “iframe” tag). In this manner, the original appearance of the source document (PDF, in this embodiment) is maintained, the text is preserved as searchable text, and the document is integrated into a web page that can be searched, zoomed, scrolled, printed, etc., utilizing standard web browser controls.
  • Moreover, because the PDF document is now an integral component of the resulting HTML 5 web page, a significantly increased “ad inventory” is enabled. Advertisements can be integrated between the individual pages (or even within a page) of the document. Even in the context of a relatively short 20-page document, the ad inventory is at least 20 times larger than it would be if the document were confined to a separately scrolled window within the web browser's window.
  • In addition, the resulting document (independent of its format) can be passively shared with desired members of a reader's external social networks (as well as any social network within the host website), along with other reading-related activities and behavior performed by the reader on the website hosting the document. In one embodiment, a user sets predefined sharing preferences identifying particular social networks (e.g., Twitter, Facebook, MySpace, and the host website's social network) as well as specific activities and behavior on the website to be shared on those social networks (e.g., in this embodiment, which documents have been viewed, downloaded or uploaded, or even how many pages have been viewed, as well as annotations, ratings and various other behavior or extracted analytics).
  • It should be noted that virtually any activities and behaviors within a website can be passively shared with a user's external social networks. In one embodiment discussed in greater detail below, a user's reading-related activities within a host website are automatically shared with desired members of a user's social networks in accordance with the user's predefined sharing preferences. The user simply accesses the host website with the desire to read documents and perform other reading-related activities, with the result that such activities are automatically “passively shared” without any further action by the user.
  • The value of such passive sharing from a host website to members of external social networks should not be underestimated. In addition to the communication and other “community” benefits to users and other members of their social networks, the host websites derive significant potential value from the exponential targeted referral and advertising opportunities. These benefits are described in greater detail below.
  • III. BRIEF DESCRIPTION OF THE DRAWINGS
  • FIG. 1 is a block diagram of one embodiment of the platform and key system components employed by the present invention, including user devices, host websites and key architectural components.
  • FIG. 2 a is a screenshot of a document converted into an HTML 5 document and integrated into a web page in one embodiment of the present invention, illustrating the preservation of fonts from the original document, as well as the integration of the document with other elements on the web page.
  • FIG. 2 b is a screenshot of a document converted into an HTML 5 document and integrated into a web page in one embodiment of the present invention, illustrating the preservation not only of fonts from the original document, but also the page layout of the original document across multiple pages.
  • FIG. 3 is a screenshot of a document converted into an HTML 5 document and integrated into a web page in one embodiment of the present invention, illustrating the preservation not only of fonts from the original document, but also searchable text displayed with its original fonts.
  • FIG. 4 a is a screenshot of a document converted into an HTML 5 document and integrated into a web page in one embodiment of the present invention, illustrating the insertion of advertisements between pages of the original document.
  • FIG. 4 b is a screenshot of a document converted into an HTML 5 document and integrated into a web page in one embodiment of the present invention, illustrating the insertion of advertisements in the “open space” within a page of the original document.
  • FIG. 5 is a flowchart illustrating a process of converting and integrating a document (e.g., a PDF document) into an existing HTML 5 web page in accordance with one embodiment of the present invention.
  • FIG. 6 is a screenshot of an initial “ReadCast” dialog box appearing next to a document displayed on a web page in one embodiment of the present invention, illustrating the initiation of the process of setting a user's “passive sharing” preferences.
  • FIG. 7 a is a screenshot illustrating a user's “ReadCast” settings for a set of “passive sharing” preference controls displayed on a web page in one embodiment of the present invention.
  • FIG. 7 b is a screenshot illustrating alternate “ReadCast” settings (to those illustrated in FIG. 7 a) for a set of “passive sharing” preference controls displayed on a web page in one embodiment of the present invention.
  • FIG. 8 is a screenshot illustrating a Twitter dialog box invoked when a user selects the “ReadCast” setting (in one embodiment of the present invention) to “passively share” selected activities via the user's Twitter account.
  • FIG. 9 is a screenshot of a “ReadCast” dialog box displayed on a web page in one embodiment of the present invention, illustrating the conclusion of the process of setting a user's “passive sharing” preferences.
  • FIG. 10 is a flowchart illustrating a passive sharing process in accordance with one embodiment of the present invention, including the setting of a user's ReadCasting preferences and the automatic sharing (in accordance with those preferences) of the user's actions on a host website with the user's external social networks.
  • IV. DETAILED DESCRIPTION OF THE CURRENT INVENTION
  • A. Integrated Document Viewer
  • In one embodiment 100 of the present invention, illustrated in FIG. 1, the Internet 110 is the platform on which a set of documents (e.g., PDF documents, not shown) is shared between a host server 120, one or more client computers 130 and various members of social networks 140, some of whom are users of client computers 130. In this embodiment, host server 120 converts the original documents into HTML (in accordance with the HTML 5.0 and CSS 3 specifications), employing the @font-face tag to download the original web fonts embedded in the documents, and integrates the document into the desired layout of a web page.
  • In this manner, the appearance of each document within the web page is preserved (as in the original document), including fonts and other page layout attributes. As will be illustrated below, the text remains searchable and the document can be viewed and controlled via standard web browser controls (without the need for any document-specific controls for printing, scrolling, zooming, etc). The remainder of the web page (including areas within the document itself) may contain other web elements, including text, images, advertisements, animation, and video, as well as hyperlinks, buttons and various other static and interactive objects and functionality.
  • When a user of one of client computers 130 accesses (via Internet 110) one of these documents integrated within a web page of a website hosted on host server 120, the user can perform various reading-related actions on that host website with respect to that document, such as reading, annotating, rating or downloading the document (as well as uploading other documents). As will be illustrated below, the user can also set “ReadCasting” preferences which will automatically share such documents and metadata relating to such activities with desired members of the user's external social networks 140 (including the host website's own social network, if any).
  • FIG. 2 a illustrates a web page 200 in which one of such documents 210 is integrated, in accordance with one embodiment of the present invention. As is apparent from this screenshot, custom fonts 220 from the original document have been preserved, and the document is integrated into the web page, with additional static and interactive elements 230 included above and alongside the document (or, in other embodiments, within the document itself).
  • FIG. 2 b illustrates a web page 250 containing a similar document 260 that not only preserves the fonts 270 from the original document, but also the page layout 280 of the original document across multiple pages. Thus, the appearance of the original document has been preserved, and it can be scrolled along with any remaining elements (not shown) on the web page via standard web browser scroll bars 290.
  • FIG. 3 illustrates a web page 300 containing a similar document 310 with preservation of the appearance of the original document, including custom web fonts and various page layout attributes, and further illustrates that the text remains searchable (as opposed to mere images of the text fonts), as is evidenced by the highlighted portions 320 of the text. As noted above, not only can users search this text, which is particularly useful for longer documents, but other programs can search for text, which can then be used for various purposes, such as providing targeted advertisements relating to particular portions of text (e.g., at the level of a document, an individual page or even specific words).
  • Regardless of their source, advertisements can be integrated not only on portions of the web page alongside the document (e.g., outside of the area in which the document is displayed), but also within the document itself. Because a long document is not confined to a separate fixed scrollable window within a web page, but rather extends the web page itself to the full length of the document, the entire length of the document is available for associated advertisements.
  • FIG. 4 a illustrates advertisements 420 inserted in between pages of a document 410. In another embodiment, such advertisements could also be located alongside the document outside of the document's frame. In either case, the advertisements would remain next to the relevant portions of the document as the entire web page is scrolled up and down. Similarly, FIG. 4 b illustrates advertisements 470 inserted into the “open space” within a page of the document 460.
  • One embodiment of the process of converting and integrating a document (e.g., a PDF document) into an existing HTML 5 web page is illustrated in FIG. 5. As noted above, this process 500, unlike the traditional PDF-to-HTML conversion process, not only preserves the original fonts embedded within the document (in one embodiment, using the @font-face tag), but does so in a manner that enables the document to be integrated into an existing web page, as well as embedded into other web pages (e.g., by using the standard HTML “iframe” tag). Thus, the original appearance of the source document (PDF, in this embodiment) is maintained, the text is preserved as searchable text, and the document is integrated into a web page that can be searched, zoomed, scrolled, printed, etc., utilizing standard web browser controls (thereby providing a significantly increased “ad inventory”).
  • In one embodiment, performance is enhanced for long documents by loading dynamically only a few pages before and after the current page being displayed. This decreases substantially the time required to load a document initially, and to scroll from page to page. One tradeoff, however, is that current web browsers may not print a document correctly if all pages are not loaded. In that case, however, users may save a PDF version of the document which can then be printed.
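  • By way of illustration only, the selection of which pages to load around the current page might be sketched as follows. The function name pages_to_load and the radius parameter are hypothetical and do not appear in this specification; the sketch assumes 1-based page numbers and a fixed number of pages kept loaded on either side of the current page.

```python
from typing import List


def pages_to_load(current_page: int, total_pages: int, radius: int = 2) -> List[int]:
    """Return the 1-based page numbers to keep loaded around current_page.

    `radius` (an assumed parameter) is the number of pages loaded before
    and after the page currently being displayed.
    """
    if total_pages < 1:
        return []
    # Clamp the current page into the valid range before building the window.
    current = min(max(current_page, 1), total_pages)
    first = max(1, current - radius)
    last = min(total_pages, current + radius)
    return list(range(first, last + 1))
```

  • In such a sketch, the window would be recomputed as the user scrolls, so that only pages entering the window need to be fetched.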
  • Conversion process 500 begins with the input of a document (a PDF document in this embodiment) in step 510 which is to be converted and integrated into an existing HTML 5 web page and rendered on a client user's web browser. The document is parsed in two passes, the first of which (step 520) identifies various document statistics and layering information for use in the second pass (step 530). During first pass 520, the document is parsed sequentially for distinct document “assets” (e.g., text, fonts and images in this embodiment) until each such asset has been processed. Once no document assets remain to be processed, as determined in step 525, processing proceeds to second pass 530.
  • Otherwise, the identified asset is processed in step 527 (the manner depending upon the type of asset). For “font” assets, various statistics are collected, such as the specific characters of that font actually used in the document (to save space and network bandwidth by ignoring unused characters), as well as the size, color, orientation and number of occurrences of such characters. Of course, various different collections of statistics could be extracted in other embodiments.
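  • The collection of per-font statistics during first pass 520 might, purely as an illustrative sketch, look like the following. The input form (a sequence of font name, text and size tuples) and the names collect_font_statistics and glyph_subset are assumptions made for this example, not a description of any actual parser output.

```python
from collections import Counter, defaultdict


def collect_font_statistics(text_runs):
    """Gather, per font, which characters are used, how often, and at what sizes.

    text_runs: iterable of (font_name, text, size) tuples (a hypothetical
    parsed form of the document's text assets).
    """
    stats = defaultdict(lambda: {"chars": Counter(), "sizes": set()})
    for font_name, text, size in text_runs:
        entry = stats[font_name]
        entry["chars"].update(text)  # counts each character occurrence
        entry["sizes"].add(size)
    return dict(stats)


def glyph_subset(stats, font_name):
    """Characters of font_name actually used, i.e. the only glyphs worth extracting."""
    return sorted(stats[font_name]["chars"])
```

  • Restricting later glyph extraction to the result of glyph_subset is what would save space and bandwidth in this sketch, since unused characters never reach the generated font file.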
  • Because PDF documents store fonts in a myriad of different formats (e.g., Type1, Type3, OpenType, etc.) that are not directly usable as web fonts, and because a font may be used in different places within the document with different encodings and/or transforms, the conversion process 500 uses the @font-face tag to generate a “custom” font that can be used by a web browser as if it were one of the browser's “built-in” fonts. This aspect of process 500 occurs during second pass 530 (explained in greater detail below), utilizing the statistics collected during this first pass 520.
  • For “text” and “image” assets, step 527 identifies and stores the page of the document on which such assets occur, as well as the location of such assets on that page. This information also will be utilized during second pass 530.
  • Finally, in step 529, multi-layer objects are detected, and layering and clipping information is identified and stored for use during second pass 530. Many document formats, including the PDF format, support rich document structures that include multiple layers of objects, such as blocks of text layered on top of vector graphics, which may be layered on top of other text objects that are layered on top of bitmaps, etc. In addition to this complex “z order” of objects, support for vector fills, gradient patterns, semitransparent bitmaps, clip polygons (that mask portions of layers below) and other structural document formatting features results in a complex multi-layer object hierarchy that (to conform to HTML 5 standards) must be converted into a background image with some text on top. This aspect of process 500 occurs during second pass 530 (explained in greater detail below), utilizing the layering and clipping information collected during this first pass 520.
  • Once all document assets have been parsed and processed in first pass 520, conversion process 500 proceeds from step 525 to second pass 530. Here too, each asset (text, font and image assets in this embodiment) is parsed sequentially until no such assets remain, as determined in step 535, at which point the web page elements will be stored on the host server at step 580 for subsequent delivery to and rendering on the client's web browser, as discussed in greater detail below.
  • Otherwise, each asset is identified in step 545 as a text, font or image asset. The parsing of each media asset during second pass 530 will now be discussed. For each “text” asset, word and character spacing information is extracted in step 550 (utilizing the asset statistics generated during first pass 520) to determine the positions of each character and word of the text asset. Words are identified, for example, by detecting additional horizontal “space” between characters.
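  • One possible way of identifying words from additional horizontal “space” between characters is sketched below. The tuple layout, the function name group_into_words and the gap_factor threshold are assumptions chosen for illustration; other embodiments could use a different threshold or a per-font space width.

```python
def group_into_words(chars, gap_factor=0.3):
    """Group the characters of one line into words using horizontal gaps.

    chars: list of (char, x_start, x_end) tuples sorted by x_start.
    A gap wider than gap_factor times the average character width is
    treated as a word boundary (an assumed heuristic).
    """
    if not chars:
        return []
    avg_width = sum(x1 - x0 for _, x0, x1 in chars) / len(chars)
    threshold = gap_factor * avg_width
    words = []
    current = chars[0][0]
    prev_end = chars[0][2]
    for ch, x0, x1 in chars[1:]:
        if x0 - prev_end > threshold:
            words.append(current)  # the gap is large enough to end the word
            current = ch
        else:
            current += ch
        prev_end = x1
    words.append(current)
    return words
```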
  • One embodiment of a paragraphization algorithm is employed, in step 552, to extract “high-level” information regarding text assets, such as lines and paragraphs. The location/position information extracted in first pass 520, including character and word spacing information (from step 550), is utilized to determine where lines and paragraphs begin and end. Various algorithms can be employed to resolve this basic problem, i.e., identifying lines and paragraphs given “absolute location” information (e.g., spatial coordinates of characters and words employed by document formats such as PDF), and generating “relative location” information via line break, paragraph and other tags employed by the HTML 5 format.
  • In step 552, paragraph delimiters are identified to distinguish distinct paragraphs from one another. A typical paragraph “pattern” might consist of an indented first line. Once “lines” having similar “x coordinates” are detected, a line with a consistently higher “x coordinate” indicates an indented line. Similarly, an occasional doubled “y coordinate” differential indicates another common paragraph “pattern” with a blank line delimiting paragraphs.
  • In addition to detecting delimiters to identify distinct paragraphs, paragraph “justifications” (e.g., left, center and right justifications) are also identified in step 552. For example, consistent “x coordinates” at the beginning (but not the end) of each line of a paragraph indicate a “left-justified” paragraph. Conversely, a “right-justified” paragraph exhibits consistent “x coordinates” at the end (but not the beginning) of each line of the paragraph. Finally, a consistent “x coordinate” differential between the beginning and end of each line of the paragraph indicates a “center-justified” paragraph.
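  • The paragraph delimiter and justification heuristics of step 552 might be sketched as follows. This is a minimal illustration under stated assumptions: each line is a hypothetical (x_start, x_end, y) tuple with y increasing down the page, the tolerance tol is arbitrary, and centering is tested here via a consistent line midpoint, which is only one possible reading of the “x coordinate” criterion described above.

```python
from statistics import median


def split_paragraphs(lines, tol=1.0):
    """Split lines into paragraphs at indented lines or enlarged vertical gaps."""
    if not lines:
        return []
    margin = min(x0 for x0, _, _ in lines)  # assumed common left margin
    gaps = [b[2] - a[2] for a, b in zip(lines, lines[1:])]
    typical_gap = median(gaps) if gaps else 0.0
    paragraphs = [[lines[0]]]
    for prev, line in zip(lines, lines[1:]):
        indented = line[0] - margin > tol
        blank_before = typical_gap > 0 and (line[2] - prev[2]) > 1.5 * typical_gap
        if indented or blank_before:
            paragraphs.append([line])  # start a new paragraph
        else:
            paragraphs[-1].append(line)
    return paragraphs


def classify_justification(paragraph, tol=1.0):
    """Return 'left', 'right', 'center' or 'unknown' for one paragraph."""
    def consistent(values):
        return max(values) - min(values) <= tol

    left = consistent([x0 for x0, _, _ in paragraph])
    right = consistent([x1 for _, x1, _ in paragraph])
    centered = consistent([(x0 + x1) / 2 for x0, x1, _ in paragraph])
    if left and not right:
        return "left"
    if right and not left:
        return "right"
    if centered and not left:
        return "center"
    return "unknown"
```

  • In this sketch, an indented first line would make the line starts of a left-justified paragraph inconsistent, so a fuller implementation would likely exclude the first line before classifying.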
  • The line spacing within (as well as between) paragraphs is discerned from “y coordinate” information, which is converted into appropriate HTML tags in step 554 to generate the appropriate line spacing. Lines and paragraphs detected in step 552 are also converted into HTML 5 (and CSS 3) in step 554 using respective line break (“<br>”) and paragraph (“<p>”) tags, among other text and layout-related attributes (such as the text-indent CSS property). In other embodiments, additional line and paragraph attributes can be detected, and additional HTML tags can be employed.
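  • The conversion of detected lines and paragraphs into HTML 5 and CSS 3 in step 554 might be sketched as follows. The dictionary keys ('lines', 'align', 'indent_px') and the function name paragraphs_to_html are hypothetical; the sketch uses only the “<p>” and “<br>” tags and the text-indent, text-align and line-height CSS properties.

```python
from html import escape


def paragraphs_to_html(paragraphs, line_height_px=None):
    """Render paragraphs as <p> elements with <br> between lines.

    paragraphs: list of dicts with 'lines' (list of str) and optional
    'align' and 'indent_px' keys (an assumed intermediate form).
    """
    parts = []
    for para in paragraphs:
        styles = ["text-align: {}".format(para.get("align", "left"))]
        if para.get("indent_px"):
            styles.append("text-indent: {}px".format(para["indent_px"]))
        if line_height_px:
            styles.append("line-height: {}px".format(line_height_px))
        style = "; ".join(styles)
        # Escape the text so characters such as & and < remain valid HTML.
        body = "<br>".join(escape(line) for line in para["lines"])
        parts.append('<p style="{}">{}</p>'.format(style, body))
    return "\n".join(parts)
```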
  • Having extracted the high-level line and paragraph information with respect to the text asset in step 552, and converted this “absolute location” information in step 554 into the “relative location” attributes of the HTML 5 and CSS 3 formats, control is returned to step 535 to determine whether any assets remain to be processed. If not, the converted document elements (along with existing non-document elements on the web page) are stored on the host server in step 580, awaiting access during runtime.
  • Otherwise, if a “font” asset is identified, the glyphs (i.e., “images” of the characters of the font) are extracted in step 560. As noted above, in one embodiment, only those glyphs that actually appear in the document are extracted (to save resources, such as memory and network bandwidth).
  • These glyphs are mapped in a font file to the unicode representations of the characters they represent. To access the font file from an HTML 5 web page, an @font-face CSS declaration is employed in the page style block for the font. This creates a custom font definition that can be used by a web browser as if the font were one of the browser's built-in fonts.
  • In step 562, various geometric transforms are computed, if necessary, for specially formatted text. For example, if diagonal text is employed, each of the characters used in the document is converted, in one embodiment, to a “rotated glyph” (using a simple geometric transform) and stored in a font file as a character of the custom font, mapped to its corresponding unicode representation. In this embodiment, the vertical positions of each character are also stored in the font file (mapped to the rotated glyphs and their unicode representations), reflecting the increasing or decreasing slope of successive characters. In other embodiments, information relating to the slope of the diagonal (and even to the rotation of each individual character) can be maintained independently of the individual characters themselves.
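  • The “simple geometric transform” used to produce a rotated glyph, and the per-character vertical positions along a sloped baseline, might be sketched as follows. Representing a glyph outline as a list of (x, y) points and the names rotate_outline and baseline_offsets are assumptions for illustration only.

```python
import math


def rotate_outline(points, degrees):
    """Rotate glyph outline points counter-clockwise about the origin."""
    theta = math.radians(degrees)
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    return [(x * cos_t - y * sin_t, x * sin_t + y * cos_t) for x, y in points]


def baseline_offsets(horizontal_advances, degrees):
    """Vertical offset of each successive character along a sloped baseline.

    horizontal_advances: horizontal distance from each character to the next.
    """
    slope = math.tan(math.radians(degrees))
    offsets, x = [], 0.0
    for advance in horizontal_advances:
        offsets.append(x * slope)  # rise accumulated up to this character
        x += advance
    return offsets
```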
  • Diagonal text can be detected directly from within a PDF document by virtue of PDF support for rotated text. The presence of diagonal text may also be inferred from the absolute position data (e.g., periodically increasing or decreasing vertical coordinates of adjacent text characters) discerned from the document.
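  • Inferring diagonal text from absolute position data might, as one hedged sketch, proceed by checking whether adjacent characters step along a common non-horizontal direction. The function name infer_baseline_angle and the min_chars and tol_degrees thresholds are hypothetical, and angles near 180 degrees are not handled.

```python
import math


def infer_baseline_angle(positions, min_chars=3, tol_degrees=2.0):
    """Return the baseline angle in degrees if the text runs diagonally, else None.

    positions: list of (x, y) origins of adjacent characters in reading order.
    """
    if len(positions) < min_chars:
        return None
    angles = [
        math.degrees(math.atan2(y1 - y0, x1 - x0))
        for (x0, y0), (x1, y1) in zip(positions, positions[1:])
    ]
    mean = sum(angles) / len(angles)
    if any(abs(a - mean) > tol_degrees for a in angles):
        return None  # the steps do not share a common direction
    if abs(mean) <= tol_degrees:
        return None  # ordinary horizontal text
    return mean
```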
  • For other transforms, analogous adjustments are employed (in one embodiment, on a character-by-character basis). Apart from the information stored in the font file, accessible via the @font-face tag, related attributes can be encoded natively in the HTML 5 web page, such as character spacing, line-height, paragraphs, justification, etc.
  • Before converting (in step 564) these transformed sets of characters into the appropriate web-readable formats, the characters can, in one embodiment, optionally be encrypted, in step 563 (as a form of HTML 5-compliant “digital rights management” or DRM), to prevent users from copying and pasting the “protected” text into other environments. Unlike the convoluted and easily circumvented methods currently employed to prevent the copying and pasting of text from within web pages (e.g., often relying upon custom Javascript), this solution leverages the @font-face mechanism built into HTML 5 to map individual characters to alternative characters (e.g., a “tilde”) that can be displayed in their place when a user attempts a copy and paste operation. In other words, rather than attempting to inhibit the copy and paste operation, it is allowed to proceed, but with substituted “encrypted” versions of the actual characters.
  • Each glyph will still appear in the user's web browser as intended. But, it will also be mapped (on the host server, in one embodiment) to an alternative “gibberish” character (e.g., a tilde), that in turn will be mapped to the actual unicode character itself (e.g., the letter “a”). Thus, the actual unicode character will remain available, for example, if the user desires to conduct a text search. But, if the user attempts to copy and paste a block of text, the alternative characters will be substituted and, upon being pasted, will show up as “gibberish” characters (thus preventing the unauthorized transfer of such text to other environments).
  • It should be noted that, for maximum security, the mapping of the characters is confined to the host server, which can be invoked to generate the alternative characters when the user attempts to copy and paste the “encrypted” text (e.g., using a simple Javascript call in the source web page). In other embodiments, the mapping information can be contained within the files delivered to the user's web browser (avoiding the need to invoke the host for this purpose), though potentially compromising security in the event a third party is able to discern or disable this mapping process.
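  • One possible reading of the character substitution described above is sketched below. It is an assumption of this sketch, not a statement of the specification, that stand-in characters are drawn from the Unicode Private Use Area and assigned by a seeded shuffle; the names build_mapping, protect and recover are likewise hypothetical. The mapping would be retained on the host server, while the custom font would draw the intended glyph for each stand-in character.

```python
import random

PRIVATE_USE_START = 0xE000  # first code point of the Unicode Private Use Area


def build_mapping(used_chars, seed=0):
    """Assign each used character a stand-in code point (assumed scheme)."""
    chars = sorted(set(used_chars))
    slots = list(range(len(chars)))
    random.Random(seed).shuffle(slots)
    return {ch: chr(PRIVATE_USE_START + slot) for ch, slot in zip(chars, slots)}


def protect(text, mapping):
    """Text as it would appear in the page source and in a copy-and-paste."""
    return "".join(mapping[ch] for ch in text)


def recover(protected, mapping):
    """Host-side reversal, e.g. to support text search over the original."""
    reverse = {stand_in: ch for ch, stand_in in mapping.items()}
    return "".join(reverse[ch] for ch in protected)
```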
  • As noted above, PDF documents (among others) store fonts in a myriad of different formats (e.g., Type1, Type3, OpenType, etc.), which, to be usable as a web font, must be converted (e.g., into “eot,” “ttf” and “svg” formats, accommodating different positions, encodings, transforms, etc.). To accommodate differences among individual web browsers (including those on embedded devices, such as mobile phones), multiple font files are employed to ensure @font-face support among the differing formats.
  • For example, in one embodiment, “.eot” formats are utilized for Internet Explorer, “.svg” formats for embedded devices and “.ttf” formats for Firefox, Safari, Chrome, etc. Thus, the @font-face CSS declaration for the “Zapfino” typeface might look like the following:
  • @font-face {
    font-family: 'Zapfino';
    src: url('Zapfino.eot');
    src: url('zapfino/zapfino.svg') format('svg');
    src: local('\u263a'), url('Zapfino.otf') format('truetype');
    }
  • Whether or not geometrically transformed and/or optionally encrypted, the glyphs and the corresponding unicode characters to which they are mapped, are then converted, in step 564, into the various web-readable font file formats (“eot,” “ttf” and “svg”), after which control is returned to step 535 to determine whether any assets remain to be processed. If not, the converted document elements (along with existing non-document elements on the web page) are stored on the host server in step 580, awaiting access during runtime.
  • It should be noted that, in other embodiments, the conversion of fonts into the various web-readable formats in step 564 can be performed at the end of second pass 530 after all text, font and image assets have been parsed (as opposed to converting each font asset as it is parsed).
  • Finally, if an “image” asset is identified, and the image is a “vector graphic” image, then it is rasterized (i.e., converted into a “bitmap” image) in step 570. In other embodiments, vector graphics can be supported directly in HTML. Then, in step 572, graphic layers are merged. As noted above, the “z order” of multi-layer objects (e.g., bitmaps on text on vector graphics, along with vector fills, gradient patterns, clip polygons, etc.) must be preserved while generating a simpler HTML-friendly structure (e.g., text on background image).
  • In one embodiment, a boolean bitmap is maintained to facilitate the determination of whether particular page assets (bitmaps, text, vector graphics, etc.) share display space (in which case, for example, clipping is necessary to generate a merged bitmapped image). The boolean bitmap identifies the regions of a page that have currently been “drawn” (processed), and thus which pixels need to be checked for overlap against the current asset being processed.
  • In one embodiment, two boolean bitmaps are maintained—one for tracking the area currently occupied by the next bitmap (or rasterized vector graphic) being added to the display stack, and the other for tracking the area occupied by text objects. Until there exists overlap between these two boolean bitmaps, the order in which they are drawn makes no difference.
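  • A boolean bitmap of the kind described above might be sketched as a grid of flags with rectangle-level marking and overlap tests. Treating each asset as an axis-aligned rectangle, and the names make_mask, mark and overlaps, are simplifying assumptions of this illustration.

```python
def make_mask(width, height):
    """Create a boolean bitmap with every pixel initially undrawn."""
    return [[False] * width for _ in range(height)]


def _clip(mask, x, y, w, h):
    """Clamp a rectangle to the mask bounds; returns (x0, y0, x1, y1), end exclusive."""
    height = len(mask)
    width = len(mask[0]) if height else 0
    return max(x, 0), max(y, 0), min(x + w, width), min(y + h, height)


def mark(mask, x, y, w, h):
    """Record that the rectangle has been drawn."""
    x0, y0, x1, y1 = _clip(mask, x, y, w, h)
    for row in range(y0, y1):
        for col in range(x0, x1):
            mask[row][col] = True


def overlaps(mask, x, y, w, h):
    """True if any pixel of the rectangle is already marked in the mask."""
    x0, y0, x1, y1 = _clip(mask, x, y, w, h)
    return any(mask[row][col] for row in range(y0, y1) for col in range(x0, x1))
```

  • With one such mask for text objects and another for bitmaps, a new bitmap would need clipping or merging only when overlaps reports shared display space against the text mask.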
  • In this manner, the two boolean bitmaps are refined in step 572 as each asset is processed, until a “final” background image is generated (taking into account any previously overlapping text) on top of which the “final” text layer is placed. It should be noted that, where white space exists between image assets, the image is split into separate files in step 574. And, in step 576, the image may need to be scaled, converted or otherwise reformatted, depending upon its original format and the size and position information previously extracted. In other embodiments, steps 574 and 576 can (like step 564) be performed at the end of second pass 530 after all text, font and image assets have been parsed (as opposed to splitting files and reformatting each image asset as it is parsed).
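  • The splitting of a background image where white space exists (step 574) might be sketched as locating contiguous bands of occupied pixel rows. Splitting only along horizontal rows, and the name split_into_bands, are assumptions made for this illustration.

```python
def split_into_bands(occupied_rows):
    """Return (start, end) row ranges, end exclusive, one per contiguous band.

    occupied_rows: list of bools, True where a pixel row contains image content.
    """
    bands, start = [], None
    for index, occupied in enumerate(occupied_rows):
        if occupied and start is None:
            start = index  # a new band of content begins
        elif not occupied and start is not None:
            bands.append((start, index))  # white space closes the band
            start = None
    if start is not None:
        bands.append((start, len(occupied_rows)))
    return bands
```

  • Each returned band would correspond to one separate image file in this sketch, so that empty regions between image assets are never stored or transmitted.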
  • Finally, control is returned to step 535 to determine whether any assets remain to be processed. Once all text, font and image assets have been processed, the converted document elements (along with existing non-document elements on the web page) are stored on the host server in step 580, awaiting access during runtime.
  • When accessed by a client web browser during runtime, the document and non-document elements (including the insertion of ads that may change dynamically) are loaded on the host server in step 585 and delivered to the client web browser, where they are integrated and rendered, in step 590, on the client computer.
  • B. Automatic Sharing of Reading-Related Activities Across External Social Networks
  • As alluded to above, in one embodiment of the present invention, users of a website engage in various reading-related activities with respect to documents hosted on the website, regardless of whether such documents have been converted so as to retain the appearance of the original document (as discussed in Section A above). These reading-related activities include reading, annotating, rating or downloading (as well as uploading) documents. Note that, in other embodiments, various other activities could be included and shared, such as the particular page or portion within a document that a user is reading, the number of pages read or even the time spent reading a particular document. Moreover, in one embodiment, activities beyond those that are reading-related could be shared with external social networks in a similar fashion to that described herein.
  • FIG. 6 illustrates an embodiment of an initial “ReadCast” dialog box 610 next to a document 620 displayed on a web page 600. While a user's intent in accessing web page 600 is to read document 620 and engage in various other reading-related activities, this dialog box 610 presents the user with an opportunity to set certain “passive sharing” preferences (not shown) that will result in the automatic sharing of the user's future reading-related activities with desired members of the user's external social networks. For example, after setting these preferences, the user might select a particular document, causing the system to automatically notify the user's Facebook friends (in accordance with the user's specified preferences) that the user has elected to read that particular document. In another embodiment (not shown), whenever a user reads a document, a list of all users who have read the document is displayed next to the document.
  • One embodiment of these “ReadCast” settings is illustrated in FIG. 7 a, which includes various preference controls 700 covering activities such as “Reading” 702 a document, “Downloading” 704 the document (or sending it to the user's mobile phone via the “Send to Mobile” activity 706), “Rating” 708 the document and “Scribbling” 710 (i.e., annotating the document). In addition, the user specifies, with respect to various social networks 715 (e.g., Facebook 716, Twitter 717 and the Scribd website's own “internal” social network 718), whether each of the activities is shared (by specifying, for each activity, “always” share, “never” share, or “ask” the user at the time of engaging in the activity whether to share such action with the specified social networks).
  • For example, in FIG. 7 a, the user has enabled all activities 700 and selected the “ask” radio button for each of them (with the exception of the Scribd social network, for which Rating and Scribbling can only be set to always be shared). Thus, when the user reads a particular document (or rates, annotates, downloads or sends the document to the user's mobile phone), the system will automatically ask the user whether to share such information with the user's specified social network (e.g., Facebook friends or Twitter followers).
  • FIG. 7 b illustrates alternative ReadCast settings. For example, the “Send to Mobile” 706 and “Scribbling” 710 activities have been disabled by the user, and the “Reading” 702 activity is set to “always” be shared on Scribd 718 and “never” be shared on Facebook 716, while the Rating activity is set to “always” be shared on Facebook 716 and Twitter 717.
  • FIGS. 7 a and 7 b also include a “Link to Account” button 720 to enable the user to designate and access their particular Facebook or Twitter account. FIG. 8 illustrates a Twitter dialog box 810 that is invoked when the user selects the “Link to Account” button under the Twitter column. This dialog box 810 provides the user with the opportunity (i.e., an additional layer of security provided by the social networking site) to allow or deny the host website access to the user's Twitter account (e.g., to share the user's designated activities on the Scribd website with the user's Twitter account).
  • After completing the designation of the desired ReadCast preferences, the user selects the “Save Changes” button 730 (shown in FIGS. 7 a and 7 b), which results (in one embodiment) in the dialog box 910 illustrated in FIG. 9. This dialog box 910 summarizes the user's selected preferences (e.g., indicating the social network(s) on which the user's activities are shared). In other embodiments, the specific activities that are enabled can be displayed.
  • Once these ReadCast “passive sharing” preference settings have been saved, whenever the user performs one of the designated activities on the host website, a notification indicating that the user has performed that activity will be shared on the user's designated social networks (e.g., Facebook or Twitter, as well as the host Scribd network) without requiring any further action by the user.
  • In another embodiment, a list of a user's “friends” or other contacts on external social networks is identified and maintained, and ReadCast notifications to anyone on that list are forwarded to the user's Scribd friends, thereby further extending such notifications to a social “network of networks” or a “social Internet.” This is accomplished by using the APIs provided by external social networks (e.g., “Facebook Connect”) to copy and retain a portion of the user's “social graph” or a list of friends. Once the user's social graph is copied to the social network within the host website, specific activities can be shared with that user's social network without further interaction with external social networks or services.
  • A more detailed description of one embodiment of the passive sharing process is illustrated by the flowchart in FIG. 10. As discussed above, a user initially encounters on the host website (e.g., via dialog box 610 shown in FIG. 6) an opportunity to set initial ReadCast settings, represented by step 1010 in FIG. 10. The system 1000 then displays, in step 1012, the user's default ReadCast settings. The user then sets desired preferences in step 1014, by associating particular activities with specified social networks, as explained above with respect to FIGS. 7 a and 7 b. Upon initially saving those preferences (which, in one embodiment, the user can revise at any time), system 1000 enables, in step 1020, the ReadCast passive sharing behaviors.
  • As users perform various reading-related activities on the host website, system 1000 detects, in step 1050, a user's performance of one of the predefined actions, and checks, in step 1055, to determine whether that user's ReadCast settings are enabled. If that user's ReadCast settings are not enabled, system 1000 simply permits the user to continue performing the desired reading-related activity (step 1090).
  • Otherwise, system 1000 identifies, in step 1060, the particular activity being performed by the user and accesses, in step 1062, the user's ReadCast preferences to determine, in step 1065, whether the user's ReadCast settings are enabled for that particular activity. If not, system 1000 (as above) permits the user to continue performing the desired reading-related activity (step 1090).
  • If the user's ReadCast settings are enabled for that particular activity, then system 1000 identifies, in step 1067, the conditions under which the activity will be “passively shared” with the user's specified social networks. For example, as noted above with respect to FIGS. 7a and 7b, the user may have enabled that activity to always be shared with certain social networks and never be shared with others (and perhaps to be asked at the time whether to share the activity with certain other social networks). Of course, in other embodiments, additional options and conditions could be specified.
  • Finally, to the extent a particular activity (e.g., reading a particular article on the host website) has been designated to be shared with one or more of the user's social networks, then system 1000 proceeds, in step 1069, to initiate the “passive sharing” of that activity—e.g., to notify one or more of the user's designated social networks that the user has engaged in that particular activity. System 1000 (as above) then permits the user to continue performing the desired reading-related activity (step 1090).
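The decision flow of steps 1050 through 1090 can be sketched roughly as follows. This is a minimal illustration and not the implementation described in the specification: the names `SharePolicy`, `ReadCastSettings`, and `networks_to_notify`, and the `confirm` callback standing in for the "ask at the time" option, are all hypothetical.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Callable, Dict, List


class SharePolicy(Enum):
    """Hypothetical per-network sharing conditions (compare FIGS. 7a and 7b)."""
    ALWAYS = "always"
    NEVER = "never"
    ASK = "ask"


@dataclass
class ReadCastSettings:
    """Hypothetical container for one user's saved preferences (step 1014)."""
    enabled: bool = False
    # activity name -> {social network name -> sharing policy}
    preferences: Dict[str, Dict[str, SharePolicy]] = field(default_factory=dict)


def networks_to_notify(
    settings: ReadCastSettings,
    activity: str,
    confirm: Callable[[str], bool] = lambda network: False,
) -> List[str]:
    """Return the networks that should be notified of a detected activity."""
    # Step 1055: are the user's ReadCast settings enabled at all?
    if not settings.enabled:
        return []
    # Steps 1060-1065: is sharing configured for this particular activity?
    per_network = settings.preferences.get(activity)
    if not per_network:
        return []
    # Step 1067: evaluate the sharing condition for each network.
    selected = []
    for network, policy in per_network.items():
        if policy is SharePolicy.ALWAYS:
            selected.append(network)
        elif policy is SharePolicy.ASK and confirm(network):
            selected.append(network)
    # Step 1069 would notify each selected network; in every case the
    # user simply continues the reading-related activity (step 1090).
    return selected
```

An empty result corresponds to the branches that go straight to step 1090 without any notification.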
  • It should be emphasized that various modifications and combinations of the above-described embodiments can be employed without departing from the spirit of the present invention.

Claims (19)

1. A method for converting and integrating non-HTML documents into HTML web pages on a host server while preserving the original appearance and text searchability of the documents, the method including the following steps:
(a) parsing a document to extract text characters and associated fonts, as well as page layout attributes of the document, each glyph in a font representing the appearance of its associated text character with respect to that font;
(b) integrating the text characters into an HTML web page and generating HTML tags to preserve the document's page layout attributes;
(c) generating one or more font files, accessible from the HTML web page, that map the text characters to their associated glyphs; and
(d) storing the HTML web page and font files on the host server for delivery to and rendering within the window of a client web browser, whereby the original appearance and text searchability of the document is preserved.
2. The method of claim 1 wherein the CSS 3 @font-face tag is employed to link the font files to the HTML web page.
3. The method of claim 1 wherein the HTML web page contains a plurality of web page elements external to the document, and wherein the document and the plurality of web page elements can be displayed within the client's web browser window.
4. The method of claim 1 wherein a user of the client's web browser can select and search for text within the document using the client web browser's standard controls.
5. The method of claim 1 wherein a user of the client's web browser can zoom text within the document and scroll among the pages of the document using the client web browser's standard controls.
6. The method of claim 3 wherein the plurality of web page elements include an advertisement external to the document.
7. The method of claim 6 wherein the advertisement is located to the side of a page of the document, whereby the ad inventory of the web page is proportional to the number of pages of the document.
8. The method of claim 1 wherein the font files include, for each text character, a mismatched character code that does not correspond to the character's associated glyph, and wherein the HTML web page contains the mismatched character codes and instructions directing the web browser to use the font files for displaying the glyphs, whereby the web browser utilizes the font files to display the text characters correctly, but cannot search for or copy the text characters due to the mismatched character codes in the HTML web page.
9. The method of claim 1 wherein the page layout attributes of at least some portion of the document are specified in the HTML web page by the organization of the text characters into words, lines and paragraphs, and wherein the page layout attributes are preserved by:
(a) extracting from the document absolute position information relating to the text characters;
(b) analyzing the absolute position information to identify relative position information, including the beginning and end of individual words, lines of text and paragraphs of text; and
(c) generating HTML tags, from the relative position information, to delineate the beginning and end of individual lines of text and paragraphs of text.
10. The method of claim 1 wherein the page layout attributes of the document include diagonal text, and wherein the page layout attributes are preserved by:
(a) detecting the presence of diagonal text while parsing the document;
(b) generating, via a geometric transformation, a rotated glyph corresponding to each text character of the diagonal text; and
(c) mapping, to each rotated glyph, vertical position information that enables the client's web browser to render the diagonal text.
11. The method of claim 10 wherein the presence of diagonal text is detected by extracting from the document absolute position information relating to the text characters, and identifying periodically increasing or decreasing vertical offsets of adjacent text characters.
12. A method for displaying text in a web page using the built-in functionality of a web browser, while inhibiting the use of that functionality to search for and copy the text, the method including the following steps:
(a) generating a font file containing, for each text character, a corresponding glyph representing the appearance of that character, and a mismatched character code that does not correspond to the glyph; and
(b) generating an HTML document that contains the mismatched character codes and instructions directing the web browser to use the font file for displaying the glyphs,
(c) whereby the web browser utilizes the font file to display the text correctly, but cannot search for or copy the text due to the mismatched character codes in the HTML document.
13. A system that converts and integrates non-HTML documents into HTML web pages on a host server while preserving the original appearance and text searchability of the documents, the system comprising:
(a) a document parser that extracts text characters and associated fonts, as well as page layout attributes of the document, each glyph in a font representing the appearance of its associated text character with respect to that font;
(b) an HTML converter that integrates the text characters into an HTML web page and generates HTML tags to preserve the document's page layout attributes;
(c) a font file generator that generates one or more font files, accessible from the HTML web page, that map the text characters to their associated glyphs; and
(d) a website host on the host server that stores the HTML web page and font files for delivery to and rendering within the window of a client web browser, whereby the original appearance and text searchability of the document is preserved.
14. The system of claim 13 wherein the CSS 3 @font-face tag is employed to link the font files to the HTML web page.
15. The system of claim 13 wherein the HTML web page contains a plurality of web page elements external to the document, and wherein the document and the plurality of web page elements can be displayed within the client's web browser window.
16. The system of claim 13 wherein a user of the client's web browser can select and search for text within the document using the client web browser's standard controls.
17. The system of claim 13 wherein a user of the client's web browser can zoom text within the document and scroll among the pages of the document using the client web browser's standard controls.
18. The system of claim 15 wherein the plurality of web page elements include an advertisement external to the document.
19. The system of claim 18 wherein the advertisement is located to the side of a page of the document, whereby the ad inventory of the web page is proportional to the number of pages of the document.
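The mismatched character codes recited in claims 8 and 12 can be illustrated with a short sketch. It is a simplified model under stated assumptions, not the claimed method itself: the use of Unicode Private Use Area code points, the seeded shuffle, and the function names `build_mismatched_mapping`, `scramble`, and `font_cmap` are choices made for this example, and the font's character map is modeled as a plain dictionary rather than an actual font file.

```python
import random

PUA_START = 0xE000  # assumption: draw mismatched codes from the Private Use Area


def build_mismatched_mapping(text: str, seed: int = 0) -> dict:
    """Assign each distinct character a code that does not identify it."""
    chars = sorted(set(text))
    codes = list(range(PUA_START, PUA_START + len(chars)))
    random.Random(seed).shuffle(codes)
    return {ch: chr(code) for ch, code in zip(chars, codes)}


def scramble(text: str, mapping: dict) -> str:
    """Produce the string that would be placed in the HTML page."""
    return "".join(mapping[ch] for ch in text)


def font_cmap(mapping: dict) -> dict:
    """Model the font's table: stored code point -> character whose glyph is drawn."""
    return {ord(code): ch for ch, code in mapping.items()}


original = "hello world"
mapping = build_mismatched_mapping(original)
html_text = scramble(original, mapping)
cmap = font_cmap(mapping)

# What the browser would draw, given a font carrying this character map.
rendered = "".join(cmap[ord(code)] for code in html_text)
```

Here `rendered` equals the original text, while `html_text` holds only private-use code points, so a search or copy operating on the page's character codes would not see the visible words.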
US13/278,176 2010-04-20 2011-10-20 Integrated document viewer Abandoned US20120042236A1 (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
US13/278,176 US20120042236A1 (en) 2010-04-20 2011-10-20 Integrated document viewer
US13/343,695 US8707164B2 (en) 2010-04-20 2012-01-04 Integrated document viewer

Applications Claiming Priority (5)

Application Number Priority Date Filing Date Title
US32616610P 2010-04-20 2010-04-20
US33016110P 2010-04-30 2010-04-30
US12/912,625 US20110258535A1 (en) 2010-04-20 2010-10-26 Integrated document viewer with automatic sharing of reading-related activities across external social networks
US201113189372A 2011-07-22 2011-07-22
US13/278,176 US20120042236A1 (en) 2010-04-20 2011-10-20 Integrated document viewer

Related Parent Applications (2)

Application Number Title Priority Date Filing Date
US12/912,625 Division US20110258535A1 (en) 2010-04-20 2010-10-26 Integrated document viewer with automatic sharing of reading-related activities across external social networks
US201113235362A Division 2010-04-20 2011-09-17

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US13/343,695 Division US8707164B2 (en) 2010-04-20 2012-01-04 Integrated document viewer

Publications (1)

Publication Number Publication Date
US20120042236A1 (en) 2012-02-16

Family

ID=44789146

Family Applications (3)

Application Number Title Priority Date Filing Date
US12/912,625 Abandoned US20110258535A1 (en) 2010-04-20 2010-10-26 Integrated document viewer with automatic sharing of reading-related activities across external social networks
US13/278,176 Abandoned US20120042236A1 (en) 2010-04-20 2011-10-20 Integrated document viewer
US13/343,695 Expired - Fee Related US8707164B2 (en) 2010-04-20 2012-01-04 Integrated document viewer

Family Applications Before (1)

Application Number Title Priority Date Filing Date
US12/912,625 Abandoned US20110258535A1 (en) 2010-04-20 2010-10-26 Integrated document viewer with automatic sharing of reading-related activities across external social networks

Family Applications After (1)

Application Number Title Priority Date Filing Date
US13/343,695 Expired - Fee Related US8707164B2 (en) 2010-04-20 2012-01-04 Integrated document viewer

Country Status (1)

Country Link
US (3) US20110258535A1 (en)

Cited By (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20110150362A1 (en) * 2009-09-10 2011-06-23 Motorola Mobility, Inc. Method of exchanging photos with interface content provider website
US20120203847A1 (en) * 2007-11-05 2012-08-09 Kendall Timothy A Sponsored Stories and News Stories within a Newsfeed of a Social Networking System
US20130174017A1 (en) * 2011-12-29 2013-07-04 Chegg, Inc. Document Content Reconstruction
US8589516B2 (en) 2009-09-10 2013-11-19 Motorola Mobility Llc Method and system for intermediating content provider website and mobile device
US9037656B2 (en) 2010-12-20 2015-05-19 Google Technology Holdings LLC Method and system for facilitating interaction with multiple content provider websites
US9435801B2 (en) 2012-05-18 2016-09-06 Blackberry Limited Systems and methods to manage zooming
US9542538B2 (en) * 2011-10-04 2017-01-10 Chegg, Inc. Electronic content management and delivery platform
US9569410B2 (en) 2012-08-13 2017-02-14 Chegg, Inc. Multilayered document distribution in multiscreen systems
US9898547B1 (en) 2014-05-02 2018-02-20 Tribune Publishing Company, Llc Online information system with backward continuous scrolling
US9990652B2 (en) 2010-12-15 2018-06-05 Facebook, Inc. Targeting social advertising to friends of users who have interacted with an object associated with the advertising
WO2019169205A1 (en) * 2018-02-28 2019-09-06 Rocky Kahn Document viewer aligning pdf and xml
US10585550B2 (en) 2007-11-05 2020-03-10 Facebook, Inc. Sponsored story creation user interface
WO2020236997A1 (en) * 2019-05-21 2020-11-26 Schlumberger Technology Corporation Process for highlighting text with varied orientation
US20220129618A1 (en) * 2020-10-23 2022-04-28 Saudi Arabian Oil Company Text scrambling/descrambling

Families Citing this family (149)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8326814B2 (en) 2007-12-05 2012-12-04 Box, Inc. Web-based file management system and service
US7930447B2 (en) 2008-10-17 2011-04-19 International Business Machines Corporation Listing windows of active applications of computing devices sharing a keyboard based upon requests for attention
US8347208B2 (en) * 2009-03-04 2013-01-01 Microsoft Corporation Content rendering on a computer
US8615709B2 (en) * 2010-04-29 2013-12-24 Monotype Imaging Inc. Initiating font subsets
US8504423B2 (en) * 2010-08-27 2013-08-06 Snap Services, Llc Social network appreciation platform
WO2012056326A2 (en) * 2010-10-27 2012-05-03 Google Inc. Social discovery of user activity for media content
US9251123B2 (en) * 2010-11-29 2016-02-02 Hewlett-Packard Development Company, L.P. Systems and methods for converting a PDF file
US8977979B2 (en) * 2010-12-06 2015-03-10 International Business Machines Corporation Social network relationship mapping
AU2011352131A1 (en) 2010-12-28 2013-07-11 Google Inc. Targeting based on social updates
US8504910B2 (en) * 2011-01-07 2013-08-06 Facebook, Inc. Mapping a third-party web page to an object in a social networking system
WO2012099617A1 (en) 2011-01-20 2012-07-26 Box.Net, Inc. Real time notification of activities that occur in a web-based collaboration environment
CA3043598C (en) * 2011-01-27 2021-07-20 Google Llc Content access control in social network
US9021113B2 (en) * 2011-06-17 2015-04-28 International Business Machines Corporation Inter-service sharing of content between users from different social networks
US9015601B2 (en) 2011-06-21 2015-04-21 Box, Inc. Batch uploading of content to a web-based collaboration environment
US9063912B2 (en) 2011-06-22 2015-06-23 Box, Inc. Multimedia content preview rendering in a cloud content management system
US9652741B2 (en) 2011-07-08 2017-05-16 Box, Inc. Desktop application for access and interaction with workspaces in a cloud-based content management system and synchronization mechanisms thereof
WO2013009328A2 (en) 2011-07-08 2013-01-17 Box.Net, Inc. Collaboration sessions in a workspace on cloud-based content management system
US9197718B2 (en) 2011-09-23 2015-11-24 Box, Inc. Central management and control of user-contributed content in a web-based collaboration environment and management console thereof
WO2013055804A1 (en) * 2011-10-10 2013-04-18 Brightedge Technologies, Inc. Auditing of webpages
US8515902B2 (en) 2011-10-14 2013-08-20 Box, Inc. Automatic and semi-automatic tagging features of work items in a shared workspace for metadata tracking in a cloud-based content management system with selective or optional user contribution
US10108928B2 (en) 2011-10-18 2018-10-23 Dotloop, Llc Systems, methods and apparatus for form building
US8862602B1 (en) * 2011-10-25 2014-10-14 Google Inc. Systems and methods for improved readability of URLs
US9098474B2 (en) 2011-10-26 2015-08-04 Box, Inc. Preview pre-generation based on heuristics and algorithmic prediction/assessment of predicted user behavior for enhancement of user experience
WO2013062599A1 (en) 2011-10-26 2013-05-02 Box, Inc. Enhanced multimedia content preview rendering in a cloud content management system
US8990307B2 (en) 2011-11-16 2015-03-24 Box, Inc. Resource effective incremental updating of a remote client with events which occurred via a cloud-enabled platform
GB2500152A (en) 2011-11-29 2013-09-11 Box Inc Mobile platform file and folder selection functionalities for offline access and synchronization
EP2602723A1 (en) * 2011-12-08 2013-06-12 ExB Asset Management GmbH Asynchronous, passive knowledge sharing system and method
US9348479B2 (en) 2011-12-08 2016-05-24 Microsoft Technology Licensing, Llc Sentiment aware user interface customization
US9378290B2 (en) 2011-12-20 2016-06-28 Microsoft Technology Licensing, Llc Scenario-adaptive input method editor
US9019123B2 (en) 2011-12-22 2015-04-28 Box, Inc. Health check services for web-based collaboration environments
US9904435B2 (en) 2012-01-06 2018-02-27 Box, Inc. System and method for actionable event generation for task delegation and management via a discussion forum in a web-based collaboration environment
EP2613270A1 (en) * 2012-01-09 2013-07-10 Research In Motion Limited Selective rendering of electronic messages by an electronic device
US11232481B2 (en) 2012-01-30 2022-01-25 Box, Inc. Extended applications of multimedia content previews in the cloud-based content management system
US10049168B2 (en) 2012-01-31 2018-08-14 Openwave Mobility, Inc. Systems and methods for modifying webpage data
US9965745B2 (en) 2012-02-24 2018-05-08 Box, Inc. System and method for promoting enterprise adoption of a web-based collaboration environment
US9195636B2 (en) 2012-03-07 2015-11-24 Box, Inc. Universal file type preview for mobile devices
US9054919B2 (en) 2012-04-05 2015-06-09 Box, Inc. Device pinning capability for enterprise cloud service and storage accounts
US9575981B2 (en) 2012-04-11 2017-02-21 Box, Inc. Cloud service enabled to handle a set of files depicted to a user as a single file in a native operating system
CN103379135B (en) * 2012-04-17 2015-11-11 腾讯科技(深圳)有限公司 A kind of information sharing method and device
US9317623B2 (en) * 2012-04-20 2016-04-19 Yahoo! Inc. Dynamic webpage image
US9195840B2 (en) 2012-04-23 2015-11-24 Google Inc. Application-specific file type generation and use
US9176720B1 (en) 2012-04-23 2015-11-03 Google Inc. Installation of third-party web applications into a container
US9413587B2 (en) 2012-05-02 2016-08-09 Box, Inc. System and method for a third-party application to access content within a cloud-based platform
US20150193386A1 (en) * 2012-05-03 2015-07-09 David Adam Wurtz System and Method of Facilitating Font Selection and Manipulation of Fonts
US9396216B2 (en) 2012-05-04 2016-07-19 Box, Inc. Repository redundancy implementation of a system which incrementally updates clients with events that occurred via a cloud-enabled platform
US9691051B2 (en) 2012-05-21 2017-06-27 Box, Inc. Security enhancement through application access control
US8914900B2 (en) 2012-05-23 2014-12-16 Box, Inc. Methods, architectures and security mechanisms for a third-party application to access content in a cloud-based platform
US9027108B2 (en) 2012-05-23 2015-05-05 Box, Inc. Systems and methods for secure file portability between mobile applications on a mobile device
US9372644B2 (en) 2012-05-29 2016-06-21 Hewlett-Packard Development Company, L.P. Sending a job processing notice to a social network contact
US20130339830A1 (en) * 2012-06-15 2013-12-19 Microsoft Corporation Optimized document views for mobile device interfaces
CN104428734A (en) 2012-06-25 2015-03-18 微软公司 Input method editor application platform
US9317709B2 (en) 2012-06-26 2016-04-19 Google Inc. System and method for detecting and integrating with native applications enabled for web-based storage
US9021099B2 (en) 2012-07-03 2015-04-28 Box, Inc. Load balancing secure FTP connections among multiple FTP servers
GB2505072A (en) 2012-07-06 2014-02-19 Box Inc Identifying users and collaborators as search results in a cloud-based system
US9712510B2 (en) 2012-07-06 2017-07-18 Box, Inc. Systems and methods for securely submitting comments among users via external messaging applications in a cloud-based platform
US9792320B2 (en) 2012-07-06 2017-10-17 Box, Inc. System and method for performing shard migration to support functions of a cloud-based service
US9473532B2 (en) 2012-07-19 2016-10-18 Box, Inc. Data loss prevention (DLP) methods by a cloud service including third party integration architectures
US8868574B2 (en) 2012-07-30 2014-10-21 Box, Inc. System and method for advanced search and filtering mechanisms for enterprise administrators in a cloud-based environment
US9794256B2 (en) 2012-07-30 2017-10-17 Box, Inc. System and method for advanced control tools for administrators in a cloud-based service
US8959109B2 (en) 2012-08-06 2015-02-17 Microsoft Corporation Business intelligent in-document suggestions
US9369520B2 (en) 2012-08-19 2016-06-14 Box, Inc. Enhancement of upload and/or download performance based on client and/or server feedback information
US8745267B2 (en) 2012-08-19 2014-06-03 Box, Inc. Enhancement of upload and/or download performance based on client and/or server feedback information
US9558202B2 (en) 2012-08-27 2017-01-31 Box, Inc. Server side techniques for reducing database workload in implementing selective subfolder synchronization in a cloud-based environment
US9135462B2 (en) 2012-08-29 2015-09-15 Box, Inc. Upload and download streaming encryption to/from a cloud-based platform
US9767156B2 (en) 2012-08-30 2017-09-19 Microsoft Technology Licensing, Llc Feature-based candidate selection
US9311071B2 (en) 2012-09-06 2016-04-12 Box, Inc. Force upgrade of a mobile application via a server side configuration file
US9195519B2 (en) 2012-09-06 2015-11-24 Box, Inc. Disabling the self-referential appearance of a mobile application in an intent via a background registration
US9117087B2 (en) 2012-09-06 2015-08-25 Box, Inc. System and method for creating a secure channel for inter-application communication based on intents
US9292833B2 (en) 2012-09-14 2016-03-22 Box, Inc. Batching notifications of activities that occur in a web-based collaboration environment
US10200256B2 (en) 2012-09-17 2019-02-05 Box, Inc. System and method of a manipulative handle in an interactive mobile user interface
US9553758B2 (en) 2012-09-18 2017-01-24 Box, Inc. Sandboxing individual applications to specific user folders in a cloud-based service
US10915492B2 (en) 2012-09-19 2021-02-09 Box, Inc. Cloud-based platform enabled with media content indexed for text-based searches and/or metadata extraction
US9959420B2 (en) 2012-10-02 2018-05-01 Box, Inc. System and method for enhanced security and management mechanisms for enterprise administrators in a cloud-based environment
US9705967B2 (en) 2012-10-04 2017-07-11 Box, Inc. Corporate user discovery and identification of recommended collaborators in a cloud platform
US9495364B2 (en) 2012-10-04 2016-11-15 Box, Inc. Enhanced quick search features, low-barrier commenting/interactive features in a collaboration platform
US9665349B2 (en) 2012-10-05 2017-05-30 Box, Inc. System and method for generating embeddable widgets which enable access to a cloud-based collaboration platform
US9471550B2 (en) * 2012-10-16 2016-10-18 Linkedin Corporation Method and apparatus for document conversion with font metrics adjustment for format compatibility
JP5982343B2 (en) 2012-10-17 2016-08-31 ボックス インコーポレイテッド (Box, Inc.) Remote key management in a cloud-based environment
US9756022B2 (en) 2014-08-29 2017-09-05 Box, Inc. Enhanced remote key management for an enterprise in a cloud-based environment
US9507491B2 (en) 2012-12-14 2016-11-29 International Business Machines Corporation Search engine optimization utilizing scrolling fixation
US10235383B2 (en) 2012-12-19 2019-03-19 Box, Inc. Method and apparatus for synchronization of items with read-only permissions in a cloud-based environment
US9396245B2 (en) 2013-01-02 2016-07-19 Box, Inc. Race condition handling in a system which incrementally updates clients with events that occurred in a cloud-based collaboration platform
US9953036B2 (en) 2013-01-09 2018-04-24 Box, Inc. File system monitoring in a system which incrementally updates clients with events that occurred in a cloud-based collaboration platform
EP2755151A3 (en) 2013-01-11 2014-09-24 Box, Inc. Functionalities, features and user interface of a synchronization client to a cloud-based environment
EP2757491A1 (en) 2013-01-17 2014-07-23 Box, Inc. Conflict resolution, retry condition management, and handling of problem files for the synchronization client to a cloud-based platform
EP2763051B1 (en) * 2013-01-31 2019-08-14 Google LLC Serving font glyphs
US10826951B2 (en) 2013-02-11 2020-11-03 Dotloop, Llc Electronic content sharing
US9484006B2 (en) * 2013-02-13 2016-11-01 Documill Oy Manipulation of textual content data for layered presentation
US20140281928A1 (en) * 2013-03-12 2014-09-18 Sap Portals Israel Ltd. Content-driven layout
US8762836B1 (en) 2013-03-13 2014-06-24 Axure Software Solutions, Inc. Application of a system font mapping to a design
US9430578B2 (en) 2013-03-15 2016-08-30 Google Inc. System and method for anchoring third party metadata in a document
US9727577B2 (en) 2013-03-28 2017-08-08 Google Inc. System and method to store third-party metadata in a cloud storage system
US9575622B1 (en) 2013-04-02 2017-02-21 Dotloop, Llc Systems and methods for electronic signature
US10725968B2 (en) 2013-05-10 2020-07-28 Box, Inc. Top down delete or unsynchronization on delete of and depiction of item synchronization with a synchronization client to a cloud-based platform
US10846074B2 (en) 2013-05-10 2020-11-24 Box, Inc. Identification and handling of items to be ignored for synchronization with a cloud-based platform by a synchronization client
US9633037B2 (en) 2013-06-13 2017-04-25 Box, Inc Systems and methods for synchronization event building and/or collapsing by a synchronization component of a cloud-based platform
US9805050B2 (en) 2013-06-21 2017-10-31 Box, Inc. Maintaining and updating file system shadows on a local device by a synchronization client of a cloud-based platform
US10110656B2 (en) 2013-06-25 2018-10-23 Box, Inc. Systems and methods for providing shell communication in a cloud-based platform
US10229134B2 (en) 2013-06-25 2019-03-12 Box, Inc. Systems and methods for managing upgrades, migration of user data and improving performance of a cloud-based platform
US9317489B2 (en) * 2013-06-27 2016-04-19 Adobe Systems Incorporated Vector graphic conversion into fonts
US9535924B2 (en) 2013-07-30 2017-01-03 Box, Inc. Scalability improvement in a system which incrementally updates clients with events that occurred in a cloud-based collaboration platform
EP3030982A4 (en) 2013-08-09 2016-08-03 Microsoft Technology Licensing Llc Input method editor providing language assistance
US8892679B1 (en) 2013-09-13 2014-11-18 Box, Inc. Mobile device, methods and user interfaces thereof in a mobile device platform featuring multifunctional access and engagement in a collaborative environment provided by a cloud-based platform
US9213684B2 (en) 2013-09-13 2015-12-15 Box, Inc. System and method for rendering document in web browser or mobile device regardless of third-party plug-in software
US10509527B2 (en) 2013-09-13 2019-12-17 Box, Inc. Systems and methods for configuring event-based automation in cloud-based collaboration platforms
GB2518298A (en) 2013-09-13 2015-03-18 Box Inc High-availability architecture for a cloud-based concurrent-access collaboration platform
US9704137B2 (en) 2013-09-13 2017-07-11 Box, Inc. Simultaneous editing/accessing of content by collaborator invitation through a web-based or mobile application to a cloud-based collaboration platform
US9535909B2 (en) 2013-09-13 2017-01-03 Box, Inc. Configurable event-based automation architecture for cloud-based collaboration platforms
US10866931B2 (en) 2013-10-22 2020-12-15 Box, Inc. Desktop application for accessing a cloud collaboration platform
US20150131126A1 (en) * 2013-11-11 2015-05-14 Zsunami, Inc. Print management system and method
US10552525B1 (en) 2014-02-12 2020-02-04 Dotloop, Llc Systems, methods and apparatuses for automated form templating
US10075484B1 (en) * 2014-03-13 2018-09-11 Issuu, Inc. Sharable clips for digital publications
RU2648636C2 (en) * 2014-03-31 2018-03-26 Общество с ограниченной ответственностью "Аби Девелопмент" Storage of the content in converted documents
US10055386B2 (en) * 2014-04-18 2018-08-21 Emc Corporation Using server side font preparation to achieve WYSIWYG and cross platform fidelity on web based word processor
CN105022616B (en) * 2014-04-23 2019-12-03 腾讯科技(北京)有限公司 A kind of method and device generating Webpage
US10530854B2 (en) 2014-05-30 2020-01-07 Box, Inc. Synchronization of permissioned content in cloud-based environments
US9602514B2 (en) 2014-06-16 2017-03-21 Box, Inc. Enterprise mobility management and verification of a managed application by a content provider
US20160012024A1 (en) * 2014-07-08 2016-01-14 Cognizant Technology Solutions India Pvt. Ltd. Method and system for automatic generation and validation of html5 compliant scripts
US9148494B1 (en) 2014-07-15 2015-09-29 Workiva Inc. Font loading system and method in a client-server architecture
TW201608384A (en) 2014-08-29 2016-03-01 萬國商業機器公司 Computer-implemented method for remotely providing fonts for an electronic document
US9894119B2 (en) 2014-08-29 2018-02-13 Box, Inc. Configurable metadata-based automation and content classification architecture for cloud-based collaboration platforms
US10038731B2 (en) 2014-08-29 2018-07-31 Box, Inc. Managing flow-based interactions with cloud-based shared content
US10574442B2 (en) 2014-08-29 2020-02-25 Box, Inc. Enhanced remote key management for an enterprise in a cloud-based environment
US10733364B1 (en) 2014-09-02 2020-08-04 Dotloop, Llc Simplified form interface system and method
US9639511B2 (en) 2014-11-24 2017-05-02 Google Inc. Systems and methods for editing a file in a non-native application using an application engine
US9703762B1 (en) * 2014-12-30 2017-07-11 Open Text Corporation Method and system for processing a style sheet defining a reusable theme for a web page and specifying a relative location of content
US10115215B2 (en) 2015-04-17 2018-10-30 Monotype Imaging Inc. Pairing fonts for presentation
US11537262B1 (en) 2015-07-21 2022-12-27 Monotype Imaging Inc. Using attributes for font recommendations
US10839149B2 (en) 2016-02-01 2020-11-17 Microsoft Technology Licensing, Llc. Generating templates from user's past documents
US9922022B2 (en) * 2016-02-01 2018-03-20 Microsoft Technology Licensing, Llc. Automatic template generation based on previous documents
JP6736440B2 (en) * 2016-09-27 2020-08-05 キヤノン株式会社 Device, method and program for providing document file generation service in cloud system
CN106502968A (en) * 2016-10-12 2017-03-15 北京奇虎科技有限公司 The method and device of data processing
US10614152B2 (en) * 2016-10-13 2020-04-07 Microsoft Technology Licensing, Llc Exposing formatting properties of content for accessibility
IL248651A0 (en) * 2016-10-31 2017-02-28 Doubledu Ltd System and method for on-the-fly conversion of non-accessible online documents to accessible documents
AU2016266083A1 (en) * 2016-12-02 2018-06-21 Canon Kabushiki Kaisha Method, system and apparatus for displaying an electronic document
CN109032917B (en) * 2017-06-09 2021-06-18 北京金山云网络技术有限公司 Page debugging method and system, mobile terminal and computer terminal
CN107862729B (en) * 2017-08-24 2021-07-02 平安普惠企业管理有限公司 Hierarchical animation generation method, terminal and readable storage medium
US11334750B2 (en) 2017-09-07 2022-05-17 Monotype Imaging Inc. Using attributes for predicting imagery performance
US10909429B2 (en) 2017-09-27 2021-02-02 Monotype Imaging Inc. Using attributes for identifying imagery for selection
US11657602B2 (en) 2017-10-30 2023-05-23 Monotype Imaging Inc. Font identification from imagery
US11295060B2 (en) * 2017-12-12 2022-04-05 Google Llc Managing comments on binary files preview view in a cloud-based environment
CN109033466B (en) * 2018-08-31 2019-12-03 掌阅科技股份有限公司 Page sharing method calculates equipment and computer storage medium
US11599325B2 (en) * 2019-01-03 2023-03-07 Bluebeam, Inc. Systems and methods for synchronizing graphical displays across devices
US11538123B1 (en) 2019-01-23 2022-12-27 Wells Fargo Bank, N.A. Document review and execution on mobile devices
US11223663B1 (en) * 2020-07-01 2022-01-11 Adobe Inc. Providing personalized chat communications within portable document format documents
CN112818273A (en) * 2021-02-05 2021-05-18 深圳市世强元件网络有限公司 Method for converting PDF file into HTML embedded picture and computer equipment
CN112818274B (en) * 2021-02-05 2024-03-19 深圳市世强元件网络有限公司 Method for converting PDF file into paging HTML file and computer equipment
CN115033313A (en) * 2021-02-24 2022-09-09 华为技术有限公司 Terminal application control method, terminal equipment and chip system
CN116702747A (en) * 2023-05-30 2023-09-05 珠海盈米基金销售有限公司 PDF online reader design method, device, computer equipment and medium

Citations (17)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6288726B1 (en) * 1997-06-27 2001-09-11 Microsoft Corporation Method for rendering glyphs using a layout services library
US6336124B1 (en) * 1998-10-01 2002-01-01 Bcl Computers, Inc. Conversion data representing a document to other formats for manipulation and display
US20030028801A1 (en) * 2001-04-12 2003-02-06 Copyseal Pty Ltd., An Australian Corporation System and method for preventing unauthorized copying of electronic documents
US20040025118A1 (en) * 2002-07-31 2004-02-05 Renner John S. Glyphlets
US20040181746A1 (en) * 2003-03-14 2004-09-16 Mclure Petra Method and expert system for document conversion
US20040205508A1 (en) * 2002-03-05 2004-10-14 Microsoft Corporation Content replacement in electronically-provided archived material
US6973618B2 (en) * 2000-12-29 2005-12-06 International Business Machines Corporation Method and system for importing MS office forms
US6993662B2 (en) * 1998-06-14 2006-01-31 Finjan Software Ltd. Method and system for copy protection of displayed data content
US7100069B1 (en) * 1996-02-16 2006-08-29 G&H Nevada-Tek Method and apparatus for controlling a computer over a wide area network
US20070055934A1 (en) * 2001-07-16 2007-03-08 Adamson Robert G Iii Allowing operating system access to non-standard fonts in a network document
US20070055933A1 (en) * 2005-09-02 2007-03-08 Xerox Corporation Text correction for PDF converters
US7228501B2 (en) * 2002-11-01 2007-06-05 Microsoft Corporation Method for selecting a font
US20080115046A1 (en) * 2006-11-15 2008-05-15 Fujitsu Limited Program, copy and paste processing method, apparatus, and storage medium
US20080301431A1 (en) * 2007-06-01 2008-12-04 Hea Young Sun Text security method
US20100077320A1 (en) * 2008-09-19 2010-03-25 United States Government As Represented By The Secretary Of The Navy SGML/XML to HTML conversion system and method for frame-based viewer
US20110093565A1 (en) * 2009-10-16 2011-04-21 Extensis Inc. Serving Font Files in Varying Formats Based on User Agent Type
US20110213655A1 (en) * 2009-01-24 2011-09-01 Kontera Technologies, Inc. Hybrid contextual advertising and related content analysis and display techniques

Family Cites Families (21)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6167409A (en) * 1996-03-01 2000-12-26 Enigma Information Systems Ltd. Computer system and method for customizing context information sent with document fragments across a computer network
GB2315140A (en) * 1996-07-11 1998-01-21 Ibm Multi-layered HTML documents
US6182092B1 (en) * 1997-07-14 2001-01-30 Microsoft Corporation Method and system for converting between structured language elements and objects embeddable in a document
US6393442B1 (en) * 1998-05-08 2002-05-21 International Business Machines Corporation Document format transformations for converting plurality of documents which are consistent with each other
US20010029582A1 (en) 1999-05-17 2001-10-11 Goodman Daniel Isaac Method and system for copy protection of data content
US7359944B2 (en) * 2001-02-07 2008-04-15 Lg Electronics Inc. Method of providing digital electronic book
US20050177784A1 (en) * 2002-06-19 2005-08-11 Andrews Richard L. Creating an html document from a source document
US7653876B2 (en) * 2003-04-07 2010-01-26 Adobe Systems Incorporated Reversible document format
US20050132305A1 (en) * 2003-12-12 2005-06-16 Guichard Robert D. Electronic information access systems, methods for creation and related commercial models
US20050188311A1 (en) * 2003-12-31 2005-08-25 Automatic E-Learning, Llc System and method for implementing an electronic presentation
US7620902B2 (en) * 2005-04-20 2009-11-17 Microsoft Corporation Collaboration spaces
US20110313899A1 (en) * 2006-01-05 2011-12-22 Drey Leonard L Method of Governing Content Presentation
US8166061B2 (en) * 2006-01-10 2012-04-24 Aol Inc. Searching recent content publication activity
US7886226B1 (en) * 2006-10-03 2011-02-08 Adobe Systems Incorporated Content based Ad display control
US8612847B2 (en) * 2006-10-03 2013-12-17 Adobe Systems Incorporated Embedding rendering interface
US9602605B2 (en) * 2007-10-26 2017-03-21 Facebook, Inc. Sharing digital content on a social network
US7941535B2 (en) * 2008-05-07 2011-05-10 Doug Sherrets System for targeting third party content to users based on social networks
US7958193B2 (en) * 2008-06-27 2011-06-07 Microsoft Corporation Social network notifications for external updates
US8856647B2 (en) * 2009-02-20 2014-10-07 Microsoft Corporation Font handling for viewing documents on the web
US20110161791A1 (en) * 2009-12-31 2011-06-30 Travis Amy D Method and system for notification of recent activity on a website
US8438648B2 (en) * 2010-02-16 2013-05-07 Celartem, Inc. Preventing unauthorized font linking

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Hardmeier, Sandi, "Zoom - Internet Explorer 7", 12/17/2008, IE-Vista, p1-2; http://www.ie-vista.com/zoom.html *
v-render, "Using css3 @font-face", 2/27/2010, v-render studio, p1-5 + 1 page from Wayback; http://web.archive.org/web/20100227050531/http://www.v-render.co.in/2010/02/24/using-css3-font-face-property/ *

Cited By (27)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9984391B2 (en) * 2007-11-05 2018-05-29 Facebook, Inc. Social advertisements and other informational messages on a social networking website, and advertising model for same
US20120203847A1 (en) * 2007-11-05 2012-08-09 Kendall Timothy A Sponsored Stories and News Stories within a Newsfeed of a Social Networking System
US10585550B2 (en) 2007-11-05 2020-03-10 Facebook, Inc. Sponsored story creation user interface
US10068258B2 (en) * 2007-11-05 2018-09-04 Facebook, Inc. Sponsored stories and news stories within a newsfeed of a social networking system
US9984392B2 (en) 2007-11-05 2018-05-29 Facebook, Inc. Social advertisements and other informational messages on a social networking website, and advertising model for same
US8589516B2 (en) 2009-09-10 2013-11-19 Motorola Mobility Llc Method and system for intermediating content provider website and mobile device
US8990338B2 (en) * 2009-09-10 2015-03-24 Google Technology Holdings LLC Method of exchanging photos with interface content provider website
US9026581B2 (en) 2009-09-10 2015-05-05 Google Technology Holdings LLC Mobile device and method of operating same to interface content provider website
US20110150362A1 (en) * 2009-09-10 2011-06-23 Motorola Mobility, Inc. Method of exchanging photos with interface content provider website
US9450994B2 (en) 2009-09-10 2016-09-20 Google Technology Holdings LLC Mobile device and method of operating same to interface content provider website
US9990652B2 (en) 2010-12-15 2018-06-05 Facebook, Inc. Targeting social advertising to friends of users who have interacted with an object associated with the advertising
US9037656B2 (en) 2010-12-20 2015-05-19 Google Technology Holdings LLC Method and system for facilitating interaction with multiple content provider websites
US9542538B2 (en) * 2011-10-04 2017-01-10 Chegg, Inc. Electronic content management and delivery platform
US20130174017A1 (en) * 2011-12-29 2013-07-04 Chegg, Inc. Document Content Reconstruction
US9098471B2 (en) * 2011-12-29 2015-08-04 Chegg, Inc. Document content reconstruction
US9435801B2 (en) 2012-05-18 2016-09-06 Blackberry Limited Systems and methods to manage zooming
US9569410B2 (en) 2012-08-13 2017-02-14 Chegg, Inc. Multilayered document distribution in multiscreen systems
US9898547B1 (en) 2014-05-02 2018-02-20 Tribune Publishing Company, Llc Online information system with backward continuous scrolling
US9934207B1 (en) * 2014-05-02 2018-04-03 Tribune Publishing Company, Llc Online information system with continuous scrolling and previous section removal
US10146421B1 (en) 2014-05-02 2018-12-04 Tribune Publishing Company, Llc Online information system with per-document selectable items
US9971846B1 (en) 2014-05-02 2018-05-15 Tribune Publishing Company, Llc Online information system with continuous scrolling and user-controlled content
WO2019169205A1 (en) * 2018-02-28 2019-09-06 Rocky Kahn Document viewer aligning pdf and xml
GB2587923A (en) * 2018-02-28 2021-04-14 Kahn Rocky Document viewer aligning PDF and XML
WO2020236997A1 (en) * 2019-05-21 2020-11-26 Schlumberger Technology Corporation Process for highlighting text with varied orientation
US11727191B2 (en) 2019-05-21 2023-08-15 Schlumberger Technology Corporation Process for highlighting text with varied orientation
US20220129618A1 (en) * 2020-10-23 2022-04-28 Saudi Arabian Oil Company Text scrambling/descrambling
US11886794B2 (en) * 2020-10-23 2024-01-30 Saudi Arabian Oil Company Text scrambling/descrambling

Also Published As

Publication number Publication date
US20110258535A1 (en) 2011-10-20
US20120110436A1 (en) 2012-05-03
US8707164B2 (en) 2014-04-22

Similar Documents

Publication Publication Date Title
US8707164B2 (en) Integrated document viewer
US8910036B1 (en) Web based copy protection
US9864482B2 (en) Method of navigating through digital content
CN102439588B (en) Serving font glyphs
US8494287B2 (en) Character identification through glyph data matching
EP2687997A1 (en) Method for rearranging web page
US20210149842A1 (en) System and method for display of document comparisons on a remote device
US9870484B2 (en) Document redaction
JP4248411B2 (en) Method, system, computer program and storage device for displaying a document
US8453051B1 (en) Dynamic display dependent markup language interface
US20190073342A1 (en) Presentation of electronic information
US10902193B2 (en) Automated generation of web forms using fillable electronic documents
CN103608770A (en) Embedded web viewer for presentation applications
US9749440B2 (en) Systems and methods for hosted application marketplaces
CN103336794B (en) Method and apparatus for providing corresponding presentation information in target pages
CN105005472B (en) Method and device for displaying Uyghur characters on the Web
US20150317406A1 (en) Re-Use of Web Page Thematic Elements
Schafer, HTML, XHTML, and CSS Bible
KR20050052421A (en) Creative method and active viewing method for an electronic document
Sikos, Web Standards: Mastering HTML5, CSS3, and XML
US9965446B1 (en) Formatting a content item having a scalable object
KR102087274B1 (en) Web electric document editing apparatus for rendering object and operating method thereof
US9116643B2 (en) Retrieval of electronic document using hardcopy document
JP2006526190A (en) System and method for providing multiple renditions of document content
US20100017708A1 (en) Information output apparatus, information output method, and recording medium

Legal Events

Date Code Title Description
STCB Information on status: application discontinuation Free format text: ABANDONED -- FAILURE TO RESPOND TO AN OFFICE ACTION