You cannot select more than 25 topics
Topics must start with a letter or number, can include dashes ('-') and can be up to 35 characters long.
80 lines
2.8 KiB
Markdown
80 lines
2.8 KiB
Markdown
### SF Bay Area Bike Share
|
|
|
|
https://www.kaggle.com/benhamner/sf-bay-area-bike-share
|
|
|
|
stations.csv
|
|
схема:
|
|
```
|
|
id: station ID number
|
|
name: name of station
|
|
lat: latitude
|
|
long: longitude
|
|
dock_count: number of total docks at station
|
|
city: city (San Francisco, Redwood City, Palo Alto, Mountain View, San Jose)
|
|
installation_date: original date that station was installed. If station was moved, it is noted below.
|
|
```
|
|
|
|
trips.csv
|
|
схема:
|
|
```
|
|
id: numeric ID of bike trip
|
|
duration: time of trip in seconds
|
|
start_date: start date of trip with date and time, in PST
|
|
start_station_name: station name of start station
|
|
start_station_id: numeric reference for start station
|
|
end_date: end date of trip with date and time, in PST
|
|
end_station_name: station name for end station
|
|
end_station_id: numeric reference for end station
|
|
bike_id: ID of bike used
|
|
subscription_type: Subscriber = annual or 30-day member; Customer = 24-hour or 3-day member
|
|
zip_code: Home zip code of subscriber (customers can choose to manually enter zip at kiosk however data is unreliable)
|
|
```
|
|
### Stack Overflow Data Dump
|
|
|
|
https://archive.org/details/stackexchange
|
|
|
|
posts_sample.xml
|
|
|
|
```
|
|
sc.textFile("posts.xml").mapPartitions(_.take(1000)).repartition(1).saveAsTextFile("posts_sample.xml")
|
|
```
|
|
|
|
### New York City Taxi Data(2010-2013)
|
|
|
|
https://databank.illinois.edu/datasets/IDB-9610843 или https://uofi.app.box.com/v/NYCtaxidata
|
|
|
|
nyctaxi.csv
|
|
схема: https://uofi.app.box.com/v/NYCtaxidata/file/33670345557
|
|
```
|
|
_id: primary key
|
|
_rev: unknown attribute.
|
|
medallion: a permit to operate a yellow taxicab in New York City, it is effectively a (randomly assigned) car ID. See also medallions.
|
|
hack_license: a license to drive the vehicle, it is effectively a (randomly assigned) driver ID. See also hack license.
|
|
vendor_id: e.g.,Verifone Transportation Systems(VTS), or Mobile KnowledgeSystems Inc(CMT), implemented as part of theTechnology Passenger Enhancements Project.
|
|
rate_code: taximeter rate, see NYCT&L description.
|
|
store_and_fwd_flag: unknown attribute.
|
|
pickup_datetime: start time of the trip, mm-dd-yyyy hh24:mm:ss EDT.
|
|
dropoff_datetime: end time of the trip, mm-dd-yyyy hh24:mm:ss EDT.
|
|
passenger_count: number of passengers on the trip, default value is one.
|
|
trip_time in secs: trip time measured by the taximeter in seconds.
|
|
trip_distance: trip distance measured by the taximeter in miles.
|
|
pickup_longitude and pickup_latitude: GPS coordinates at the start of the trip.
|
|
dropoff_longitude and dropoff_latitude: GPS coordinates at the end of the trip.
|
|
```
|
|
|
|
|
|
nycTaxiFares.gz
|
|
nycTaxiRides.gz
|
|
схема: https://github.com/apache/flink-training/blob/master/README.md#schema-of-taxi-ride-events
|
|
|
|
### List of programming languages
|
|
|
|
https://en.wikipedia.org/wiki/List_of_programming_languages
|
|
|
|
programming-languages.csv
|
|
|
|
|
|
### Другие источники данных
|
|
|
|
https://github.com/infoculture/awesome-opendata-rus
|