
- NancyLombardo, 26 May 2010

Steps for preparing JNO archives

The North American Neuro-Ophthalmology Society (NANOS) worked with its publisher, Lippincott, Williams & Wilkins (LWW), to obtain rights to the archives of the Journal of Neuro-Ophthalmology so that they could become part of NOVEL. As a result, we have received the archives for 1994-2007 and will continue to receive files once they have passed their 12-month embargo. LWW delivered the archives in PDF format; we will place them in their own CONTENTdm collection, which will be linked from NOVEL. Each item also needs a description, which will be based on the work already done by those at PubMed. It will be easiest to manage the preparation and uploading issue by issue: gather the data from PubMed, put it into the required CONTENTdm format, rename and OCR the files, and then upload them into CONTENTdm using the Acquisitions Station.

  1. Get the data from PubMed
    1. Go to pubmed.gov
    2. Click on Single Citation Matcher
    3. Type in the following:
      1. Journal: 'Journal of Neuro-ophthalmology'
      2. Date: start with 1994 (only one issue has been completed)
      3. Volume: start with 14 (this, of course, will increase as each issue is completed)
      4. Issue: start with 2
    4. Click Go
    5. Click on the Send to link on the right side, select File
    6. Choose XML from the Format drop down menu.
    7. Rename the downloaded file with a name that will help you remember the year/vol/issue you're working with (e.g. jno2008-28-2.xml)
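For anyone who would rather script the lookup in step 1, the Single Citation Matcher fields can be approximated with a PubMed search term. The sketch below only builds a search URL for NCBI's E-utilities `esearch` service; it does not download anything. Using E-utilities at all is an assumption (this page describes only the web form), and the function name `issue_search_url` is hypothetical. The field tags `[ta]`, `[dp]`, `[vi]` and `[ip]` are PubMed's tags for journal, publication date, volume and issue.

```python
from urllib.parse import urlencode

# NCBI E-utilities search endpoint (returns matching PubMed IDs).
ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def issue_search_url(journal, year, volume, issue):
    """Build an esearch URL for one journal issue (hypothetical helper)."""
    term = f'"{journal}"[ta] AND {year}[dp] AND {volume}[vi] AND {issue}[ip]'
    # retmax=200 is an assumed upper bound on articles per issue.
    query = urlencode({"db": "pubmed", "term": term, "retmax": 200})
    return f"{ESEARCH}?{query}"


# Same values as the Single Citation Matcher example in step 1.
print(issue_search_url("Journal of Neuro-ophthalmology", 1994, 14, 2))
```

The search result is only a list of PubMed IDs; the records themselves would still have to be fetched as XML (for example with the `efetch` service) before step 2.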
  2. Import the PubMed data into an Excel file
    1. Open Excel
    2. Go to the Data tab. In the “Get External Data” panel, click From Other Sources (with down arrow) > From XML Data Import
    3. Locate the file just created > Import > OK
      1. You may see a notice stating that “The specified XML source does not refer to a schema. Excel will create schema based on the XML source data.” Click OK.
      2. Where do you want to put the data? > Select XML table in existing worksheet with $A$1 identified as start cell > Click OK
      3. Save after data import
    4. Columns will go from A to BN. You won't need all of these.
    5. You can delete all of them except the following:
      1. Article title
      2. Abstract text
      3. Affiliation
      4. Last name
      5. Fore name
      6. Descriptor name
    6. Many of the rows will be duplicated. This happens when a record has multiple authors. Work within the duplication rather than deleting rows, because you will need the information in those rows to get author names and initials.
    7. Save the file to the desktop
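The duplicated rows in step 2.6 come from flattening nested XML into a flat table. As a rough alternative to untangling them by hand, the sketch below reads the XML directly and produces one record per article, joining authors and descriptors with semicolons as steps 4.8 and 4.10 require. The sample XML is a made-up, cut-down record that uses PubMed element names; the "Last, Fore" name order and the function names are assumptions, so check them against the screen shot before relying on this.

```python
import xml.etree.ElementTree as ET

# Made-up, minimal record for illustration only (not real PubMed data).
SAMPLE_XML = """<PubmedArticleSet>
  <PubmedArticle>
    <MedlineCitation>
      <Article>
        <ArticleTitle>Example title.</ArticleTitle>
        <Abstract><AbstractText>Example abstract.</AbstractText></Abstract>
        <AuthorList>
          <Author><LastName>Doe</LastName><ForeName>Jane</ForeName></Author>
          <Author><LastName>Roe</LastName><ForeName>Richard</ForeName></Author>
        </AuthorList>
      </Article>
      <MeshHeadingList>
        <MeshHeading><DescriptorName>Humans</DescriptorName></MeshHeading>
        <MeshHeading><DescriptorName>Optic Neuritis</DescriptorName></MeshHeading>
      </MeshHeadingList>
    </MedlineCitation>
  </PubmedArticle>
</PubmedArticleSet>"""


def text_of(parent, tag):
    """Return the text of the first descendant <tag>, or '' if absent."""
    node = parent.find(f".//{tag}")
    return "".join(node.itertext()).strip() if node is not None else ""


def article_records(xml_text):
    """Return one dict per article, with multi-valued fields joined by '; '."""
    records = []
    for article in ET.fromstring(xml_text).iter("PubmedArticle"):
        authors = []
        for author in article.iter("Author"):
            parts = (text_of(author, "LastName"), text_of(author, "ForeName"))
            name = ", ".join(p for p in parts if p)  # assumed name order
            if name:
                authors.append(name)
        descriptors = ["".join(d.itertext()).strip()
                       for d in article.iter("DescriptorName")]
        records.append({
            "Title": text_of(article, "ArticleTitle"),
            "Author": "; ".join(authors),
            "Affiliation": text_of(article, "Affiliation"),
            "Abstract": text_of(article, "AbstractText"),
            "Subject MeSH": "; ".join(descriptors),
        })
    return records


for record in article_records(SAMPLE_XML):
    print(record)
```

Only the first affiliation and the first abstract section are kept here, which may not be enough for records with structured abstracts.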
  3. Create a separate Excel file for metadata importing (you will copy and paste data from xml file into this one which will be imported into CONTENTdm)
    1. Use Import Template.xls – It should be prepared this way:
      1. In row 1, create columns with the following headings:
        1. Title
        2. Author
        3. Affiliation
        4. Date (do auto fill)
        5. Abstract
        6. Subject MeSH
        7. Publisher (auto fill)
        8. Type (auto fill)
        9. Rights Management (auto fill)
        10. Publication Type (auto fill)
        11. File Name
      2. In row 2, type in Journal of Neuro-Ophthalmology, Month/Year, Volume, Issue (see attached screen shot for example)
      3. In row 3, type 'Table of Contents'. You must have a table of contents (TOC) in PDF format; add its file name in the File Name column
    2. You will fill in the rest of the rows according to the article titles in page order (see screen shot)
    3. Since this will become the file that is imported into CONTENTdm, save it as a tab delimited file and place it in the archive folder specific to the issue you're working on.
    4. Go to File > Save as > browse to folder on common drive
    5. Name it roster.txt
    6. Under Save as type: choose Text (tab delimited)
    7. Click OK and then Yes on the two messages that appear
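The tab-delimited file from step 3 can also be produced without Excel. The following is a minimal sketch using Python's `csv` module with the column headings listed in step 3.1.1; the function name `write_roster` and the file name `toc.pdf` are placeholders, and the example row is not real data.

```python
import csv
import io

# Column headings from step 3.1.1, in the same order.
COLUMNS = [
    "Title", "Author", "Affiliation", "Date", "Abstract", "Subject MeSH",
    "Publisher", "Type", "Rights Management", "Publication Type", "File Name",
]


def write_roster(rows, handle):
    """Write a header row plus one tab-delimited line per dict in rows."""
    writer = csv.DictWriter(handle, fieldnames=COLUMNS, delimiter="\t",
                            lineterminator="\n", restval="")
    writer.writeheader()
    writer.writerows(rows)  # columns missing from a row are left empty


# Demonstration with an in-memory buffer; "toc.pdf" is a placeholder name.
buffer = io.StringIO()
write_roster([{"Title": "Table of Contents", "File Name": "toc.pdf"}], buffer)
print(buffer.getvalue())
```

To write the real file, open it with `open("roster.txt", "w", newline="", encoding="utf-8")` and pass that handle instead of the buffer. Whether CONTENTdm expects a particular text encoding is not stated on this page, so treat UTF-8 as an assumption.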
  4. Copy/paste data into roster.txt for importing and rename the PDF files
    1. Go to commonfolder > NOVEL > JNO Archives > the issues you're working on
    2. Start at the top of the list and open the file
    3. Look at the author name and title
    4. Close the file
    5. Go to the Excel file with the XML data from PubMed and look for the author and/or title (you can sort alphabetically by title if you prefer)
    6. You will copy data from here into the roster.txt file
    7. Copy the title and paste it into the title column of roster.txt
    8. Copy the author names and paste them into the author column on roster.txt
      1. If there are multiple authors, separate them with semi-colons (see screen shot)
    9. Copy the affiliation and paste it into the affiliation column
      1. Repeat for the abstract
    10. Copy Descriptors and paste into subject.mesh column
      1. They will each go into separate rows and you need them to be in one row
      2. You will have to cut and paste them (or re-type with careful attention paid to any typos) separated by semi-colons into one row
    11. For the file name, copy the title from the title field and paste it. Take out any dashes, periods, or other punctuation. Spaces are OK. Then add a leading number and an underscore before the title and .pdf at the end (see screen shot)
    12. Copy the result to use as the new file name for the PDF
    13. Go to the folder on the common drive and click once on the old file name (you don't need to open it if you don't want to). Paste in the new file name to replace the old.
    14. Repeat these steps for the remaining files in the folder.
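The file-naming rule in step 4.11 is mechanical enough to sketch in code. The example below strips everything except letters, digits and spaces, then adds the leading number, underscore and `.pdf`. The function name `make_file_name` is hypothetical, and the plain (unpadded) leading number is an assumption, since the exact format is only shown in the screen shot.

```python
import re


def make_file_name(title, number):
    """Build '<number>_<title without punctuation>.pdf' from an article title."""
    cleaned = re.sub(r"[^A-Za-z0-9 ]", "", title)      # drop dashes, periods, etc.
    cleaned = re.sub(r" {2,}", " ", cleaned).strip()   # tidy gaps left behind
    return f"{number}_{cleaned}.pdf"


print(make_file_name("Optic neuritis: a follow-up study.", 3))
```

Note that this also drops accented and other non-ASCII characters, so titles containing them should be checked by hand.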
  5. Prepare embedded-text PDF files for importing
    1. Open ABBYY FineReader
      1. Go to Tools > Options
        1. Read tab
          1. Recognition mode = thorough
          2. PDF recognition = recognize PDF as image
          3. Check ‘highlight hyperlinks’; training = do not use user patterns
        2. Save tab
          1. Formats Setting > PDF
          2. Check ‘keep original image size’
          3. Save mode = text under page image
      2. File > Open PDF/Image > Browse to PDF, double click
      3. Process > Read > Read all pages
      4. Process > Save results > send all pages to > Adobe Reader/Acrobat
      5. Document opens in Adobe Acrobat Pro
      6. Replace existing file by going to File > Save as
