ENCODE_get_fields.py
#!/usr/bin/env python3
# -*- coding: latin-1 -*-
import os.path
import argparse
import encodedcc
EPILOG = '''
To get multiple objects use the '--infile' argument
and provide a file with the list of object identifiers

    %(prog)s --infile filenames.txt

this can take accessions, uuids, @ids, or aliases

To get a single object use the '--infile' argument
and use the object's identifier; it will also take a comma separated list

    %(prog)s --infile ENCSR000AAA
    %(prog)s --infile 3e6-some-uuid-here-e45
    %(prog)s --infile this-is:an-alias
    %(prog)s --infile ENCSR000AAA,ENCSR000AAB

To get multiple fields use the '--field' argument
and feed it a file with the list of fieldnames

    %(prog)s --field fieldnames.txt

this should be a single column file

To get a single field use the '--field' argument:

    %(prog)s --field status
    %(prog)s --field status,target.title

where field is a string containing the field name
or a comma separated list of fieldnames
(this can be combined with the embedded values)

To get embedded field values (such as target name from an experiment):

    %(prog)s --field target.title

    accession      target.title
    ENCSR087PLZ    H3K9ac (Mus musculus)

this can also get embedded values from lists

    %(prog)s --field files.status

*more about this feature is listed below*

To use a custom query for your object list:

    %(prog)s --query www.my/custom/url

this can be used with either usage of the '--field' option

Output prints in the format fieldname:object_type for non-strings

    Ex: accession      read_length:int    documents:list
        ENCSR000AAA    31                 [document1,document2]

    integers  ':int'
    lists     ':list'
    strings are the default and do not have an identifier

***please note that list type fields will show only unique items***

    %(prog)s --field files.status --infile ENCSR000AAA

    accession      files.status:list
    ENCSR000AAA    ['released']

is a possible output even if multiple files exist in the experiment

To show all possible outputs from a list type field
use the '--listfull' argument

    %(prog)s --field files.status --listfull

    accession      files.status:list
    ENCSR000AAA    ['released', 'released', 'released']

*** ENCODE_collection usage and functionality ***

%(prog)s has ported over some functions of ENCODE_collection
and now supports the '--collection' and '--allfields' options

Usage for '--allfields':

    %(prog)s --infile ENCSR000AAA --allfields

    accession      status      files          award ...
    ENCSR000AAA    released    [/files/...]   /awards/...

The '--allfields' option can be used with any of the commands,
it returns all fields at the frame=object level,
it also overrides any other '--field' option

Usage for '--collection':

    %(prog)s --collection Experiment --field status

    accession      status
    ENCSR000AAA    released

The '--collection' option can be used with or without the '--es' option
the '--es' option allows the script to search using elastic search,
which is slightly faster than the normal table view used
However, it may not possess the latest updates to the data and so may not be
preferable for your application

'--collection' also overrides any '--infile' option, but it
can be combined with any of the '--field' or '--allfields' options

NOTE: while '--collection' should work with the '--field' field.embeddedfield
functionality, I cannot guarantee speed when running because embedded
objects have to be extracted
'''


def getArgs():
    """Build the argument parser and return the parsed command-line arguments."""
    parser = argparse.ArgumentParser(
        description=__doc__, epilog=EPILOG,
        formatter_class=argparse.RawDescriptionHelpFormatter,
    )
    parser.add_argument('--infile',
                        help="Either a file containing a list of ENCs as a column, "
                             "or a single accession by itself")
    parser.add_argument('--query',
                        help="A custom query to get accessions.")
    parser.add_argument('--field',
                        help="Either a file containing a single column of fieldnames, "
                             "or the name of a single field")
    parser.add_argument('--listfull',
                        help="Normal list-type output shows only unique items; "
                             "select this to list all values, even repeats. "
                             "Default is False",
                        default=False,
                        action='store_true')
    parser.add_argument('--allfields',
                        help="Overrides other field options and gets all fields "
                             "from the frame=object level. Default is False",
                        default=False,
                        action='store_true')
    parser.add_argument('--collection',
                        help="Overrides other object options and returns all "
                             "objects from the selected collection")
    parser.add_argument('--es',
                        help="Used for collections, uses elastic search instead of table view",
                        default=False,
                        action='store_true')
    parser.add_argument('--key',
                        default='default',
                        help="The keypair identifier from the keyfile. "
                             "Default is --key=default")
    parser.add_argument('--keyfile',
                        default=os.path.expanduser("~/keypairs.json"),
                        help="The keypair file. Default is --keyfile=%(default)s")
    parser.add_argument('--debug',
                        default=False,
                        action='store_true',
                        help="Print debug messages. Default is False.")
    args = parser.parse_args()
    return args


def main():
    args = getArgs()
    key = encodedcc.ENC_Key(args.keyfile, args.key)
    connection = encodedcc.ENC_Connection(key)
    output = encodedcc.GetFields(connection, args)
    output.get_fields()


if __name__ == '__main__':
    main()
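
The help text above describes three behaviours that happen inside `encodedcc.GetFields`, which is not shown in this file: `--infile` accepts either a file or a comma separated list, non-string columns get a `:int` or `:list` suffix, and list fields are reduced to unique items unless `--listfull` is set. The sketch below illustrates those rules in isolation. `resolve_identifiers` and `format_column` are hypothetical names invented for this example and are not part of `encodedcc`; the real implementation may differ, for example in how it orders unique items.

```python
import os.path


def resolve_identifiers(infile):
    """Hypothetical helper: expand an '--infile' value into a list of identifiers.

    Assumption: a value naming an existing file is read as a single column,
    anything else is treated as a comma separated list of identifiers.
    """
    if os.path.isfile(infile):
        with open(infile) as handle:
            return [line.strip() for line in handle if line.strip()]
    return [item.strip() for item in infile.split(",") if item.strip()]


def format_column(name, value, listfull=False):
    """Hypothetical helper: return (header, cell) for one field of one object."""
    if isinstance(value, list):
        # dict.fromkeys drops repeats while keeping first-seen order;
        # it assumes the list items are hashable (for example strings).
        cell = list(value) if listfull else list(dict.fromkeys(value))
        return name + ":list", cell
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(value, int) and not isinstance(value, bool):
        return name + ":int", value
    # Strings (and anything else) are the default and get no suffix.
    return name, value


if __name__ == "__main__":
    print(resolve_identifiers("ENCSR000AAA,ENCSR000AAB"))
    print(format_column("files.status", ["released"] * 3))
    print(format_column("files.status", ["released"] * 3, listfull=True))
    print(format_column("read_length", 31))
```

Keeping the unique-item reduction in one place, controlled by a single `listfull` flag, mirrors how the `--listfull` switch is described in the help text.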