TreeLike CSV Data Browser

2013-12-27T00:06:00-05:00

TreeLike is an experimental tool allowing rapid exploration of the contents and inter-relationships across the columns of spreadsheet or CSV data. Use it to load any local CSV file. It immediately shows the distributions of values in all columns, and by laying out the columns in some nesting order, can show hierarchical relationships between values across columns. When two columns have a many-to-many relationship, the merge feature shows connections in both directions, not just from parent to child. And parent/child relationships can be reordered with one or two mouse clicks.
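The nesting idea above can be sketched in a few lines. This is not TreeLike's own code. It is a minimal Perl illustration, chosen to match the language used elsewhere on this page. It assumes the CSV rows have already been parsed into hashes keyed by column name. The function name nest_rows, the node layout, and the sample rows are all hypothetical.

```perl
use strict;
use warnings;

# Hypothetical sketch: group rows into a tree of value counts, one level
# per column, following the column order given. Each node looks like
# { count => N, children => { value => child_node, ... } }.
sub nest_rows {
    my ($rows, $order) = @_;    # arrayref of row hashes, arrayref of column names
    my $root = { count => 0, children => {} };
    for my $row (@$rows) {
        my $node = $root;
        $node->{count}++;
        for my $col (@$order) {
            my $val = $row->{$col} // '';    # treat missing cells as empty strings
            $node->{children}{$val} //= { count => 0, children => {} };
            $node = $node->{children}{$val};
            $node->{count}++;
        }
    }
    return $root;
}

my @rows = (
    { Continent => 'Asia',   Country => 'Japan'  },
    { Continent => 'Asia',   Country => 'China'  },
    { Continent => 'Europe', Country => 'France' },
);
my $tree = nest_rows(\@rows, [qw(Continent Country)]);
printf "%s: %d\n", $_, $tree->{children}{$_}{count} for sort keys %{ $tree->{children} };
```

Nesting on a single column gives that column's value distribution. Passing the columns in a different order, such as Country before Continent, corresponds to reordering the parent/child relationship.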

Demo (you’ll need a data file; try the MySQL world sample data)

To understand what it does, you need to see it in action. This rough demo video will explain:

TreeLike will be most helpful with files of between about 4 and 20 columns. It should be fine with at least tens of thousands of rows, but it hasn’t been well tested.

Sequence-viz

2013-12-27T00:04:00-05:00

A collection of visualization tools for exploring sequences of events over time. It offers the first open-source JavaScript implementation of Krist Wongsuphasawat’s LifeFlow for temporal summary visualization, as well as extensions and related techniques.

After cloning, remember to run

git submodule init
git submodule update

LifeFlow demo

LifeFlow stripped down demo

Merki

2013-12-27T00:03:00-05:00

Medication Extraction and Reconciliation Knowledge Instrument. A tool for extracting structured medication information from discharge summaries and other free-text narrative sources, described in this award-winning paper: Extracting Structured Medication Event Information from Discharge Summaries by Sigfried Gold, Noémie Elhadad, Xinxin Zhu, James J. Cimino, and George Hripcsak.

The papers that cite this one are a good place to find more effective parsers built more recently. If you learn of any that are also open source, let me know!

MERKI Parser, Public Version, Documentation

Files:

ParseMeds.pm            parser code module
drugParseRules.yaml     parser rules; can be edited if you understand regular expressions.
druglist.tsv            list of drug names, CUIs and TTY for ingredients and brand names from RxNorm
gpl.txt                 GPL3 license text
parseFromPerl.pl        example of how to call the parser from a Perl script
parseFromShell.pl       command-line version; run like:
                            echo "...tylenol 250mg po daily..." | perl parseFromShell.pl

MERKI was a sprawling, ambitious application I worked on during my time as a student of Biomedical Informatics at Columbia University. Its purpose was to extract medication information from structured and free-text patient data, standardize and condense it, and produce a complete and concise listing of all medications mentioned in each patient’s electronic medical record. The larger project was never finished. The current files are a portable subset that allows the parsing of narrative clinical text for the extraction of structured medication information.

The following two lines from parseFromShell.pl show how to use the parser:

my $drugs = $parser->twoLevelParse($input, 
    ['drug', 'possibleDrug', 'context'], 
    ['dose', 'route', 'freq', 'prn', 'date']);  
print $parser->drugsToXML($drugs);

$parser->twoLevelParse goes over its input twice: once to extract drugs, possible drugs, and contexts, and a second time to find, within each drug or possible drug, the dose, route, frequency, prn, and dates. twoLevelParse returns a Perl data structure, which can then be passed to $parser->drugsToXML or $parser->drugsToHTMLTable to turn it into something more directly usable. Here is an example (assembled from bits of random clinical text, and not meant to be clinically plausible):

unixshell$ echo "Discharge medications: Procardia XL 60 mg p.o. prn for severe wheezing, ferros sulfate 300 mg p.o. b.i.d., Cipro 250 mg p.o. q12hQ" | perl parseFromShell.pl

    
        Procardia XL
        60 mg
        p.o.
        prn for severe wheezing
        23
        69
        47
        after discharge
        Discharge meds
        discharge medications: [Procardia XL 60 mg p.o. prn for severe wheezing], ferros sulfate 300 mg p.o. b
    
    
        ferros sulfate 
        300 mg
        p.o.
        b.i.d.
        72
        104
        33
        after discharge
        Discharge meds
        p.o. prn for severe wheezing, [ferros sulfate 300 mg p.o. b.i.d.], D1DDD 250 mg p.o. q12hq
    
    
        Cipro
        250 mg
        p.o.
        q12
        107
        127
        21
        after discharge
        Discharge meds
        s sulfate 300 mg p.o. b.i.d., [Cipro 250 mg p.o. q12]hq
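The two-pass idea behind twoLevelParse can be illustrated with a toy version. This is a sketch, not the code in ParseMeds.pm. The regexes are hypothetical stand-ins for the rules in drugParseRules.yaml. The names two_level_parse, $drug_re, and %attr_re are made up. It also assumes, for simplicity, that a drug phrase ends at the next comma.

```perl
use strict;
use warnings;

# Toy regexes: hypothetical stand-ins, not the rules in drugParseRules.yaml.
my $drug_re = qr/\b(?:procardia xl|cipro)\b/i;
my %attr_re = (
    dose  => qr/\b\d+\s*mg\b/i,
    route => qr/\bp\.o\./i,
);

# Pass 1: find each drug plus the rest of its phrase (up to the next comma).
# Pass 2: search only inside that phrase for attributes such as dose and route.
sub two_level_parse {
    my ($text) = @_;
    my @drugs;
    while ($text =~ /($drug_re)([^,]*)/g) {
        my %hit = (drug => $1, phrase => "$1$2");
        for my $attr (sort keys %attr_re) {
            my $re = $attr_re{$attr};
            $hit{$attr} = $1 if $hit{phrase} =~ /($re)/;
        }
        push @drugs, \%hit;
    }
    return \@drugs;
}

my $text = 'Discharge medications: Procardia XL 60 mg p.o. prn for severe wheezing, '
         . 'ferros sulfate 300 mg p.o. b.i.d., Cipro 250 mg p.o. q12hQ';
for my $d (@{ two_level_parse($text) }) {
    print "$d->{drug}: dose=$d->{dose} route=$d->{route}\n";
}
```

Because the toy lexicon only knows exact drug names, it misses the misspelled "ferros sulfate" entirely. Catching phrases like that is the job of the possibleDrug rule.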

To understand how the parser decides what counts as a drug, a possible drug, a context, a dose, a route, etc., look at the definitions of these tokens in drugParseRules.yaml. The parser itself (ParseMeds.pm) treats context tokens differently from drugs and possible drugs: any context token found becomes the context attribute of all drugs and possible drugs following it, until another context is found.
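The context rule amounts to carrying a "current context" forward through the tokens in text order. This minimal sketch assumes the tokens have already been found and arrive as hypothetical [type, text] pairs. The function assign_contexts and the "Admission meds" label are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical sketch: tokens arrive in text order as [type, text] pairs.
# Each context token becomes the context of every drug or possible drug
# after it, until the next context token replaces it.
sub assign_contexts {
    my @tokens = @_;
    my $context;    # undef until the first context token is seen
    my @out;
    for my $tok (@tokens) {
        my ($type, $text) = @$tok;
        if ($type eq 'context') {
            $context = $text;
        }
        elsif ($type eq 'drug' || $type eq 'possibleDrug') {
            push @out, { type => $type, text => $text, context => $context };
        }
    }
    return @out;
}

my @found = assign_contexts(
    [context      => 'Admission meds'],
    [drug         => 'Cipro'],
    [context      => 'Discharge meds'],
    [drug         => 'Procardia XL'],
    [possibleDrug => 'ferros sulfate'],
);
print "$_->{text}: $_->{context}\n" for @found;
```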

Notice that “ferros sulfate” (“ferrous sulfate” misspelled) appears as a possible drug rather than as a drug. Since it is misspelled, it is not found in the drug lexicon, but it is still identified as a possible drug because it appears before a dose, route, and frequency. (Look at the definition of possibleDrug in drugParseRules.yaml.)
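A possibleDrug-style rule can be approximated with a lookahead. The pattern accepts one or two words only if a dose, route, and frequency follow them, and it does not consume those attributes. This toy version is hypothetical: the real definition is in drugParseRules.yaml, and the names $dose, $route, $freq, and $possible_drug are made up here.

```perl
use strict;
use warnings;

# Toy building blocks (hypothetical; the real ones live in drugParseRules.yaml).
my $dose  = qr/\d+\s*mg/i;
my $route = qr/p\.o\./i;
my $freq  = qr/b\.i\.d\.|q\d+h?|daily/i;

# A possible drug: one or two words immediately followed by a dose, route,
# and frequency. The lookahead checks what follows without consuming it.
my $possible_drug = qr/\b([a-z]+(?:\s+[a-z]+)?)\s+(?=$dose\s+$route\s+$freq)/i;

if ('ferros sulfate 300 mg p.o. b.i.d.' =~ $possible_drug) {
    print "possible drug: $1\n";    # the misspelled name is still caught
}
```

In this toy form, a known drug such as "Cipro 250 mg p.o. q12h" would match too. A rule set like this therefore needs lexicon matches to take priority over possible drugs.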

This application is far from perfect, and if you do find it worth using, there is a good chance you will want to modify it for your own uses.

Changing the drug lexicon should be fairly straightforward. You can add, delete, or change entries as you like, or use an entirely different lexicon. If you change the format of the lexicon, you may need to change aspects of the parser that load and look up drugs.
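If you swap in your own lexicon, the loading step can be as simple as reading a tab-separated file into a hash keyed by lowercased name. This sketch is hypothetical and does not reproduce what ParseMeds.pm does. It assumes each line holds name, CUI, and TTY in that order, so check the real column order in druglist.tsv before relying on it. BN and IN are RxNorm's term types for brand names and ingredients, and the CUIs below are placeholders.

```perl
use strict;
use warnings;

# Hypothetical loader. Assumes one entry per line, tab-separated as
# name, CUI, TTY; verify the actual column order in druglist.tsv.
sub load_lexicon {
    my ($fh) = @_;
    my %lexicon;
    while (my $line = <$fh>) {
        chomp $line;
        next if $line =~ /^\s*$/;    # skip blank lines
        my ($name, $cui, $tty) = split /\t/, $line;
        $lexicon{ lc $name } = { cui => $cui, tty => $tty };    # lowercased key for lookup
    }
    return \%lexicon;
}

# Placeholder CUIs, not real RxNorm identifiers.
open my $fh, '<', \"Cipro\tCUI-1\tBN\nciprofloxacin\tCUI-2\tIN\n" or die "open: $!";
my $lexicon = load_lexicon($fh);
print "cipro is a $lexicon->{cipro}{tty}\n";
```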

You may want to change the parsing rules to catch drug phrases that the current set of rules won’t catch, or, alternatively, to make the rules more conservative to prevent false positives. You’ll need to understand the basics of the YAML format (or just follow the example of the current drugParseRules.yaml file), and, more importantly, you’ll need to understand Perl regular expressions and the special way the parsing rules are processed, which I’ll explain now.

Tokens are divided into terminals and non-terminals. The parser transforms tokens of either sort into single Perl regular expressions. The difference is that non-terminals can include terminals and other non-terminals in their definitions. You’ll also notice that the two are written slightly differently, but that is just to make them more readable. Also, terminals can include literal text, but non-terminals cannot, because a non-terminal interprets any literal text as a reference to another token.

Terminals are made up of a name followed by a list of expressions or pieces of literal text. Take, for instance, the terminal cond:

cond:   [ud, ut dict, prn-breakthrough, '(were|was) held', discontinued, "dc'd"]

This will be converted into approximately the following regular expression:

/(ud|ut dict|prn-breakthrough|(were|was) held|discontinued|dc\'d)/

Actually, two other options will be added to the list of strings it will match: “u.d.” and “ut dict.” This is because the convenience rule dotsAfterLtrOk includes “ud” and the convenience rule dotsAtEndOk includes “ut dict”.
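The conversion of a terminal, including the two dotted variants, can be sketched as follows. This mimics dotsAfterLtrOk and dotsAtEndOk but is not their implementation. The function compile_terminal and its two flag hashes are hypothetical. Putting the dotted variants first follows the "longer, more specific first" ordering described below.

```perl
use strict;
use warnings;

# Hypothetical sketch of compiling a terminal into one regex. Entries flagged in
# %$dots_after_ltr also match with a dot after every letter (ud -> u.d.);
# entries flagged in %$dots_at_end also match with a trailing dot
# (ut dict -> ut dict.). Other entries are used as regex fragments unchanged.
sub compile_terminal {
    my ($alts, $dots_after_ltr, $dots_at_end) = @_;
    my @variants;
    for my $alt (@$alts) {
        if ($dots_after_ltr->{$alt}) {
            (my $dotted = $alt) =~ s/([a-z])/$1\\./gi;    # escaped dots: u\.d\.
            push @variants, $dotted;
        }
        push @variants, "$alt\\." if $dots_at_end->{$alt};    # ut dict\.
    }
    # Dotted variants go first so the longer form wins at a given position.
    my $alternation = join '|', @variants, @$alts;
    return qr/($alternation)/;
}

my $cond = compile_terminal(
    ['ud', 'ut dict', 'prn-breakthrough', '(were|was) held', 'discontinued', "dc'd"],
    { 'ud'      => 1 },
    { 'ut dict' => 1 },
);
print "$1\n" if 'take u.d. with food' =~ $cond;
```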

The terminal cond is used in the non-terminal prn:

- name: prn
  patterns:
    - 'asNeeded\s*qualifier'
    - '(cond|asNeeded)'

This translates into “prn” or “as needed” (that’s how “asNeeded” is defined) followed by a qualifier (something like “for severe pain”), OR something that matches the cond token, OR something that matches the asNeeded token. Generally you will see the parsing rules composed such that more specific, longer expressions appear before less specific, shorter expressions. This is because the first expression matched will be kept and subsequent expressions will not be tried.
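Both steps, expanding token names inside a non-terminal and keeping the first pattern that matches, can be illustrated with a simplified token table. The table, expand, and match_first are hypothetical and much smaller than the real rules. The last two lines show why order matters: with the patterns reversed, the bare "prn" wins and the qualifier is lost.

```perl
use strict;
use warnings;

# Simplified, hypothetical token table; the real definitions are in drugParseRules.yaml.
my %tokens = (
    asNeeded  => qr/prn|as needed/,
    qualifier => qr/for [a-z ]+/,
    cond      => qr/ud|u\.d\.|discontinued/,
);

# Expand a non-terminal pattern: every bare word that names a token is replaced
# by that token's regex; anything else (like the \s* escape) is left alone.
sub expand {
    my ($pattern) = @_;
    $pattern =~ s/([A-Za-z]+)/exists $tokens{$1} ? "(?:$tokens{$1})" : $1/ge;
    return qr/$pattern/;
}

# Try the patterns in order and keep the first one that matches.
sub match_first {
    my ($text, @patterns) = @_;
    for my $re (@patterns) {
        return $1 if $text =~ /($re)/;
    }
    return;
}

my @prn = map { expand($_) } ('asNeeded\s*qualifier', '(cond|asNeeded)');
print match_first('prn for severe wheezing', @prn), "\n";            # longer pattern wins
print match_first('prn for severe wheezing', reverse @prn), "\n";    # shorter pattern wins
```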

Finally, you may also want to modify the parsing code itself, but that code is not documented and may be hard to understand.

If you do make changes, or even if you use the code at all, I would very much appreciate hearing from you. I may be able to offer assistance, and I may be able to make your improvements available to others.

Contact Sigfried Gold with questions.

Supergroup

2013-12-27T00:00:00-05:00

D3 visualizations, Underscore mixins, etc.
