Second Provenance Challenge: PASS: queries

Notes

  • PASS has no explicit notion of workflows or stages, so Q3 ends up being identical to Q2. Thus, we have no separate material for Q3. See our first challenge page for further discussion.
  • Q4a is like Q4, but omits the day-of-week test on the timestamps, so data sets whose execution did not happen on a Monday still return nonempty results, which is important for testing.

  • Q5a is like Q5, but omits the global_maximum clause, so data sets that don't come with this annotation (which, judging from the comments on the pages I've seen, is most of them) still return nonempty results.

  • Q7a and Q7b are the two queries whose results we diff to answer query 7. We do not have any direct support for diff queries or extracting graph differences, though we intend to add it at some point. See our first challenge page for further discussion. 

  • Q9a is like Q9, but matches the files that actually carry the annotations specified for the second challenge (the anatomy images rather than the atlas graphics), so as to get nonempty results.

Also note that the query syntax used here is merely what we are currently using and is likely to change in the future, possibly drastically. The most important elements are:

  • ** - all objects
  • -> - transit one step in a path
  • ->* - transit across zero or more objects in a path
  • ! - extracted element of a path

A path represents a tuple of data, with one slot in the tuple for each path element marked with a !. The meaning of the rest should be fairly clear from context.
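
To make the path notation more concrete, here is a small Python sketch of how we read a query of the form `**! ->* (target)`. It is only an illustration, not how PASS evaluates queries: the `EDGES` table, the object names, and the helper functions are all hypothetical. It also assumes that `->*` takes at least one step (crossing zero or more intermediate objects), so the target itself is not part of the result.

```python
# Toy provenance graph: an edge a -> b means "a was an input to b".
# All object names are made up for illustration.
EDGES = {
    "anatomy1.img": ["align_warp"],
    "align_warp": ["warp1.warp"],
    "warp1.warp": ["softmean"],
    "softmean": ["atlas.img"],
    "atlas.img": ["atlas-x.gif"],
    "unrelated.txt": [],
}


def reachable_from(start):
    """Objects reachable from `start` in one or more `->` steps."""
    seen = set()
    stack = list(EDGES.get(start, []))
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(EDGES.get(node, []))
    return seen


def sources_reaching(target):
    """Model of `**! ->* (target)`: one 1-tuple per object marked with `!`."""
    return {(obj,) for obj in EDGES if target in reachable_from(obj)}


result = sources_reaching("atlas-x.gif")
print(sorted(result))
```

Only the first path element carries a `!`, so each result is a 1-tuple; marking a second element would add a second slot.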

Only some of the queries work on the mindswap-mindswap-mindswap combination (MINDSWAP data for all three phases). These are Q1, Q2, Q4a, Q7a, and Q7b. Q5/5a and Q6 require pattern matching on names; this should be possible even on the MINDSWAP names but would be ugly to arrange in our framework. Q8 and Q9/9a fail in any event because the annotations they require are missing from the data; they should in principle work if the annotations were to appear. Q4 fails on a minor data incompatibility (fractional timestamps), which is readily fixable but not worth worrying about.

Per-system query prefixes

To allow the same queries to be used across all combinations of data sets, we prepend certain definitions to the query text based on the data sets in use. The definitions for each system and phase are listed below.

For MINDSWAP phase 1 data:

    let alignwarpName = "align_warp" in
    let alignwarpArgv = ("-m", "-12") in

For MINDSWAP phase 2 data:

    let softmeanName = "softmean" in

For MINDSWAP phase 3 data:

    let atlasXName = "ConvertExecution-1155932184.62982.owl#Graphic1155932184.62982" in
    let atlasXJpgName = "ConvertExecution-1157476801.92867.owl#Graphic1157476801.92867" in

For PASOA phase 1 data:

    let alignwarpName = "AlignWarpImpl" in
    let alignwarpArgv = ("-m", "12") in

For PASOA phase 2 data:

    let softmeanName = "SoftmeanImpl" in

For PASOA phase 3 data:

    let atlasXName = "atlas-x.gif" in
    let atlasXJpgName = "atlas-x.jpeg" in

For PASS phase 1 data:

    let alignwarpName = "align_warp" in
    let alignwarpArgv = ("-m", "12") in

For PASS phase 2 data:

    let softmeanName = "softmean" in

For PASS phase 3 data:

    let atlasXName = "atlas-x.gif" in
    let atlasXJpgName = "atlas-x.jpg" in
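
The prepending step itself is plain text concatenation. The following Python sketch shows one way it could be arranged; the `PREFIXES` table and the `build_query` function are hypothetical names, not part of our tools, and only three of the nine prefixes above are reproduced.

```python
# Hypothetical lookup table keyed by (system, phase); the strings are
# copied from the definitions listed above.
PREFIXES = {
    ("PASS", 1): 'let alignwarpName = "align_warp" in\n'
                 'let alignwarpArgv = ("-m", "12") in\n',
    ("PASOA", 2): 'let softmeanName = "SoftmeanImpl" in\n',
    ("PASS", 3): 'let atlasXName = "atlas-x.gif" in\n'
                 'let atlasXJpgName = "atlas-x.jpg" in\n',
}


def build_query(systems, query_text):
    """Prepend the definitions for each phase's data set to the query.

    `systems` names the system whose data is used for phases 1, 2 and 3,
    in that order.
    """
    prefix = "".join(
        PREFIXES[(system, phase)]
        for phase, system in enumerate(systems, start=1)
    )
    return prefix + query_text


Q4A = "** where\n  @NAME == alignwarpName && @ARGV contains alignwarpArgv\n"
full_text = build_query(("PASS", "PASOA", "PASS"), Q4A)
print(full_text)
```

Because every definition ends with `in`, the prefixes chain onto each other and onto the query without any further glue.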

Query text

Query 1 (Q1)

    **! ->* (** where @NAME == atlasXName && @TYPE == "FILE")

Query 2 (Q2)

    let all = **! ->* (** where @NAME == atlasXName && @TYPE == "FILE")
    in
    let no = **! ->* (** where @NAME == softmeanName && @TYPE == "PROC") in
    let no2 = (** where @NAME == softmeanName && @TYPE == "PROC") in
    let result = (all - no) - no2
    in
      result
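
The subtraction in Q2 can be pictured with ordinary sets. The Python sketch below is an illustration under assumptions: the `PARENTS` table and its object names are invented, and `ancestors` stands in for `**! ->* (...)` on the assumption that the target itself is excluded, which is why Q2 subtracts the softmean process separately as `no2`.

```python
# Hypothetical table: PARENTS[x] lists the direct inputs of x.
PARENTS = {
    "atlas-x.gif": ["convert"],
    "convert": ["atlas-x.pgm"],
    "atlas-x.pgm": ["slicer"],
    "slicer": ["atlas.img"],
    "atlas.img": ["softmean"],
    "softmean": ["resliced1.img"],
    "resliced1.img": ["reslice"],
    "reslice": ["warp1.warp"],
}


def ancestors(name):
    """Everything that leads to `name`, not including `name` itself."""
    found = set()
    todo = list(PARENTS.get(name, []))
    while todo:
        obj = todo.pop()
        if obj not in found:
            found.add(obj)
            todo.extend(PARENTS.get(obj, []))
    return found


all_objs = ancestors("atlas-x.gif")   # `all` in Q2
no = ancestors("softmean")            # everything upstream of softmean
no2 = {"softmean"}                    # the softmean process itself
result = (all_objs - no) - no2
print(sorted(result))
```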

Query 4 (Q4)

    ** where 
      @NAME == alignwarpName && @ARGV contains alignwarpArgv && 
      (ctime(@FREEZETIME) ~ "*Mon*" || ctime(@EXECTIME) ~ "*Mon*")
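
As an illustration of the day-of-week test, the following Python sketch applies the same kind of glob match to a ctime-style string. It does not describe our implementation: the `ran_on_monday` helper and the sample timestamps are hypothetical, and UTC is used only to keep the example reproducible. Python's `time.gmtime` accepts fractional timestamps, so the incompatibility mentioned in the notes does not arise here.

```python
import fnmatch
import time


def ran_on_monday(timestamp):
    """Mimic `ctime(t) ~ "*Mon*"` with a glob match on a ctime-style string."""
    # gmtime accepts fractional seconds and drops the fraction.
    text = time.asctime(time.gmtime(timestamp))
    return fnmatch.fnmatchcase(text, "*Mon*")


# Hypothetical timestamps: 345600.5 falls on Monday 5 January 1970 (UTC),
# 0 falls on Thursday 1 January 1970 (UTC).
print(ran_on_monday(345600.5), ran_on_monday(0))
```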

Query 4a (Q4a)

    ** where
      @NAME == alignwarpName && @ARGV contains alignwarpArgv

Query 5 (Q5)

    let aw = (** where @TYPE == "PROC" && @NAME == alignwarpName)
    in
    let hdrs = (** where @NAME ~ "*.hdr" && @global_maximum == 4095)! -> aw
    in
      hdrs ->* (** where
                @NAME ~ "atlas*.gif" ||
                @NAME ~ "atlas*.jpg" ||
                @NAME ~ "atlas*.jpeg")!

Query 5a (Q5a)

    let aw = (** where @TYPE == "PROC" && @NAME == alignwarpName)
    in
    let hdrs = (** where @NAME ~ "*.hdr")! -> aw
    in
      hdrs ->* (** where
                @NAME ~ "atlas*.gif" ||
                @NAME ~ "atlas*.jpg" ||
                @NAME ~ "atlas*.jpeg")!

Query 6 (Q6)

    let aw = ** where (@TYPE == "PROC" &&
                       @NAME == alignwarpName &&
                       @ARGV contains alignwarpArgv)
    in
    let sm = aw ->* (** where @TYPE == "PROC" && @NAME == softmeanName)!
    in
    let results = sm ->* (** where @TYPE == "FILE" && @NAME ~ "*.img")!
    in
      (for r = results in r.NAME)

Query 7a (Q7a)

    **! ->* (** where @NAME == atlasXName && @TYPE == "FILE")

Query 7b (Q7b)

    **! ->* (** where @NAME == atlasXJpgName && @TYPE == "FILE")
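
Since we have no built-in diff support, the comparison of the Q7a and Q7b results happens outside the query engine. A minimal Python sketch of such a comparison follows; `diff_results` is a hypothetical helper and the two result lists are invented sample data, not actual query output.

```python
def diff_results(result_a, result_b):
    """Split two query results into what is unique to each and what is shared."""
    a, b = set(result_a), set(result_b)
    return {
        "only_a": sorted(a - b),
        "only_b": sorted(b - a),
        "common": sorted(a & b),
    }


# Invented sample results standing in for the output of Q7a and Q7b.
q7a = ["softmean", "slicer", "atlas-x.pgm", "convert"]
q7b = ["softmean", "slicer", "atlas-x.pgm", "pgmtoppm", "pnmtojpeg"]

diff = diff_results(q7a, q7b)
print(diff)
```

A comparison by name like this only works when both runs use the same names for the shared objects; it says nothing about differences in graph structure.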

Query 8 (Q8)

    let warps =
      (** where @center == "UChicago") ->
      (** where @TYPE == "PROC" && @NAME == alignwarpName)!
    in
    let all = warps ->* (** where @TYPE == "FILE")!
    in
    let no = warps ->* (** where @TYPE == "FILE") ->* (** where @TYPE == "FILE")!
    in
      all-no

Query 9 (Q9)

    ** where
      (@NAME ~ "atlas*.gif" ||
       @NAME ~ "atlas*.jpg" ||
       @NAME ~ "atlas*.jpeg")
      &&
      (@studyModality == "speech" ||
       @studyModality == "visual" ||
       @studyModality == "audio")
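
The Q9 predicate can be read as a filter over annotated objects. The Python sketch below illustrates that reading; the object records, the `matches_q9` helper, and the assumption that a missing annotation simply fails the comparison are all ours for the sake of the example.

```python
from fnmatch import fnmatchcase

NAME_PATTERNS = ("atlas*.gif", "atlas*.jpg", "atlas*.jpeg")
MODALITIES = {"speech", "visual", "audio"}


def matches_q9(obj):
    """True if the name matches a pattern and studyModality is one of the three."""
    name_ok = any(fnmatchcase(obj.get("NAME", ""), p) for p in NAME_PATTERNS)
    # A missing annotation yields None, which is not in MODALITIES.
    return name_ok and obj.get("studyModality") in MODALITIES


# Invented sample objects.
objects = [
    {"NAME": "atlas-x.gif", "studyModality": "speech"},
    {"NAME": "atlas-y.gif"},
    {"NAME": "anatomy1.img", "studyModality": "visual"},
]
hits = [o["NAME"] for o in objects if matches_q9(o)]
print(hits)
```

The second object shows why data sets without the annotations give empty results, and the third is the kind of object that Q9a picks up instead.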

Query 9a (Q9a)

    ** where (@NAME ~ "anatomy*.img") &&
      (@studyModality == "speech" ||
       @studyModality == "visual" ||
       @studyModality == "audio")