丁香五月激情视频,少妇夫妻性生活的视频一级,成人黄色毛片网站,人人妻人人操裸贷,五月丁香激情啪啪,岳乳丰满一区二区三区,午夜三级免费福利影院,国产人妻精品

文章來源：blog.csdn.net/weixin_44169614?type=blog
推薦閱讀： 終于來了，【第三期】彭濤Python 爬蟲特訓(xùn)營！!

一、安裝環(huán)境

gym是用于開發(fā)和比較強化學(xué)習(xí)算法的工具包，在python中安裝gym庫和其中子場景都較為簡便。

安裝gym：

      
      pip?install?gym

安裝自動駕駛模塊，這里使用Edouard Leurent發(fā)布在github上的包highway-env（鏈接：https://github.com/eleurent/highway-env）：

      
      pip?install?--user?git+https://github.com/eleurent/highway-env

其中包含6個場景：

高速公路——“highway-v0”
匯入——“merge-v0”
環(huán)島——“roundabout-v0”
泊車——“parking-v0”
十字路口——“intersection-v0”
賽車道——“racetrack-v0”

詳細(xì)文檔可以參考這里：

https://highway-env.readthedocs.io/en/latest/

二、配置環(huán)境

安裝好后即可在代碼中進(jìn)行實驗（以高速公路場景為例）：

      
      import?gym
import?highway_env
%matplotlib?inline

env?=?gym.make('highway-v0')
env.reset()
for?_?in?range(3):
????action?=?env.action_type.actions_indexes["IDLE"]
????obs,?reward,?done,?info?=?env.step(action)
????env.render()

運行后會在模擬器中生成如下場景：

綠色為ego vehicle env類有很多參數(shù)可以配置，具體可以參考原文檔。

三、訓(xùn)練模型

1、數(shù)據(jù)處理

(1)state

highway-env包中沒有定義傳感器，車輛所有的state (observations) 都從底層代碼讀取，節(jié)省了許多前期的工作量。根據(jù)文檔介紹，state (ovservations) 有三種輸出方式：Kinematics，Grayscale Image和Occupancy grid。

Kinematics

輸出V*F的矩陣，V代表需要觀測的車輛數(shù)量（包括ego vehicle本身），F(xiàn)代表需要統(tǒng)計的特征數(shù)量。例：

數(shù)據(jù)生成時會默認(rèn)歸一化，取值范圍：[100, 100, 20, 20]，也可以設(shè)置ego vehicle以外的車輛屬性是地圖的絕對坐標(biāo)還是對ego vehicle的相對坐標(biāo)。

在定義環(huán)境時需要對特征的參數(shù)進(jìn)行設(shè)定：

      
      config?=?\
????{
????"observation":?
?????????{
????????"type":?"Kinematics",
????????#選取5輛車進(jìn)行觀察（包括ego?vehicle）
????????"vehicles_count":?5,??
????????#共7個特征
????????"features":?["presence",?"x",?"y",?"vx",?"vy",?"cos_h",?"sin_h"],?
????????"features_range":?
????????????{
????????????"x":?[-100,?100],
????????????"y":?[-100,?100],
????????????"vx":?[-20,?20],
????????????"vy":?[-20,?20]
????????????},
????????"absolute":?False,
????????"order":?"sorted"
????????},
????"simulation_frequency":?8,??#?[Hz]
????"policy_frequency":?2,??#?[Hz]
????}

Grayscale Image

生成一張W*H的灰度圖像，W代表圖像寬度，H代表圖像高度

Occupancy grid

生成一個WHF的三維矩陣，用W*H的表格表示ego vehicle周圍的車輛情況，每個格子包含F(xiàn)個特征。

(2) action

highway-env包中的action分為連續(xù)和離散兩種。連續(xù)型action可以直接定義throttle和steering angle的值，離散型包含5個meta actions：

      
      ACTIONS_ALL?=?{
????????0:?'LANE_LEFT',
????????1:?'IDLE',
????????2:?'LANE_RIGHT',
????????3:?'FASTER',
????????4:?'SLOWER'
????}

(3) reward

highway-env包中除了泊車場景外都采用同一個reward function：

這個function只能在其源碼中更改，在外層只能調(diào)整權(quán)重。（泊車場景的reward function原文檔里有，懶得打公式了……）

2、搭建模型

DQN網(wǎng)絡(luò)的結(jié)構(gòu)和搭建過程已經(jīng)在我另一篇文章中討論過，所以這里不再詳細(xì)解釋。我采用第一種state表示方式——Kinematics進(jìn)行示范。

由于state數(shù)據(jù)量較?。?輛車*7個特征），可以不考慮使用CNN，直接把二維數(shù)據(jù)的size[5,7]轉(zhuǎn)成[1,35]即可，模型的輸入就是35，輸出是離散action數(shù)量，共5個。

      
      import?torch
import?torch.nn?as?nn
from?torch.autograd?import?Variable
import?torch.nn.functional?as?F
import?torch.optim?as?optim
import?torchvision.transforms?as?T
from?torch?import?FloatTensor,?LongTensor,?ByteTensor
from?collections?import?namedtuple
import?random?
Tensor?=?FloatTensor

EPSILON?=?0????#?epsilon?used?for?epsilon?greedy?approach
GAMMA?=?0.9
TARGET_NETWORK_REPLACE_FREQ?=?40???????#?How?frequently?target?netowrk?updates
MEMORY_CAPACITY?=?100
BATCH_SIZE?=?80
LR?=?0.01?????????#?learning?rate

class?DQNNet(nn.Module):
????def?__init__(self):
????????super(DQNNet,self).__init__()??????????????????
????????self.linear1?=?nn.Linear(35,35)
????????self.linear2?=?nn.Linear(35,5)???????????????
????def?forward(self,s):
????????s=torch.FloatTensor(s)????????
????????s?=?s.view(s.size(0),1,35)????????
????????s?=?self.linear1(s)
????????s?=?self.linear2(s)
????????return?s???????????
?????????????????????????
class?DQN(object):
????def?__init__(self):
????????self.net,self.target_net?=?DQNNet(),DQNNet()????????
????????self.learn_step_counter?=?0??????
????????self.memory?=?[]
????????self.position?=?0?
????????self.capacity?=?MEMORY_CAPACITY???????
????????self.optimizer?=?torch.optim.Adam(self.net.parameters(),?lr=LR)
????????self.loss_func?=?nn.MSELoss()

????def?choose_action(self,s,e):
????????x=np.expand_dims(s,?axis=0)
????????if?np.random.uniform()?<?1-e:??
????????????actions_value?=?self.net.forward(x)????????????
????????????action?=?torch.max(actions_value,-1)[1].data.numpy()
????????????action?=?action.max()???????????
????????else:?
????????????action?=?np.random.randint(0,?5)
????????return?action

????def?push_memory(self,?s,?a,?r,?s_):
????????if?len(self.memory)?<?self.capacity:
????????????self.memory.append(None)
????????self.memory[self.position]?=?Transition(torch.unsqueeze(torch.FloatTensor(s),?0),torch.unsqueeze(torch.FloatTensor(s_),?0),\
????????????????????????????????????????????????torch.from_numpy(np.array([a])),torch.from_numpy(np.array([r],dtype='float32')))#
????????self.position?=?(self.position?+?1)?%?self.capacity
???????
????def?get_sample(self,batch_size):
????????sample?=?random.sample(self.memory,batch_size)
????????return?sample
??????
????def?learn(self):
????????if?self.learn_step_counter?%?TARGET_NETWORK_REPLACE_FREQ?==?0:
????????????self.target_net.load_state_dict(self.net.state_dict())
????????self.learn_step_counter?+=?1
????????
????????transitions?=?self.get_sample(BATCH_SIZE)
????????batch?=?Transition(*zip(*transitions))

????????b_s?=?Variable(torch.cat(batch.state))
????????b_s_?=?Variable(torch.cat(batch.next_state))
????????b_a?=?Variable(torch.cat(batch.action))
????????b_r?=?Variable(torch.cat(batch.reward))????
?????????????
????????q_eval?=?self.net.forward(b_s).squeeze(1).gather(1,b_a.unsqueeze(1).to(torch.int64))?
????????q_next?=?self.target_net.forward(b_s_).detach()?#
????????q_target?=?b_r?+?GAMMA?*?q_next.squeeze(1).max(1)[0].view(BATCH_SIZE,?1).t()???????????
????????loss?=?self.loss_func(q_eval,?q_target.t())????????
????????self.optimizer.zero_grad()?#?reset?the?gradient?to?zero????????
????????loss.backward()
????????self.optimizer.step()?#?execute?back?propagation?for?one?step???????
????????return?loss
Transition?=?namedtuple('Transition',('state',?'next_state','action',?'reward'))

3、運行結(jié)果

各個部分都完成之后就可以組合在一起訓(xùn)練模型了，流程和用CARLA差不多，就不細(xì)說了。

初始化環(huán)境（DQN的類加進(jìn)去就行了）：

      
      import?gym
import?highway_env
from?matplotlib?import?pyplot?as?plt
import?numpy?as?np
import?time
config?=?\
????{
????"observation":?
?????????{
????????"type":?"Kinematics",
????????"vehicles_count":?5,
????????"features":?["presence",?"x",?"y",?"vx",?"vy",?"cos_h",?"sin_h"],
????????"features_range":?
????????????{
????????????"x":?[-100,?100],
????????????"y":?[-100,?100],
????????????"vx":?[-20,?20],
????????????"vy":?[-20,?20]
????????????},
????????"absolute":?False,
????????"order":?"sorted"
????????},
????"simulation_frequency":?8,??#?[Hz]
????"policy_frequency":?2,??#?[Hz]
????}
????
env?=?gym.make("highway-v0")
env.configure(config)

訓(xùn)練模型：

      
      dqn=DQN()
count=0

reward=[]
avg_reward=0
all_reward=[]

time_=[]
all_time=[]

collision_his=[]
all_collision=[]
while?True:
????done?=?False????
????start_time=time.time()
????s?=?env.reset()
????
????while?not?done:
????????e?=?np.exp(-count/300)??#隨機選擇action的概率，隨著訓(xùn)練次數(shù)增多逐漸降低
????????a?=?dqn.choose_action(s,e)
????????s_,?r,?done,?info?=?env.step(a)
????????env.render()
????????
????????dqn.push_memory(s,?a,?r,?s_)
????????
????????if?((dqn.position?!=0)&(dqn.position?%?99==0)):
????????????loss_=dqn.learn()
????????????count+=1
????????????print('trained?times:',count)
????????????if?(count%40==0):
????????????????avg_reward=np.mean(reward)
????????????????avg_time=np.mean(time_)
????????????????collision_rate=np.mean(collision_his)
????????????????????????????????
????????????????all_reward.append(avg_reward)
????????????????all_time.append(avg_time)
????????????????all_collision.append(collision_rate)
????????????????????????????????
????????????????plt.plot(all_reward)
????????????????plt.show()
????????????????plt.plot(all_time)
????????????????plt.show()
????????????????plt.plot(all_collision)
????????????????plt.show()
????????????????
????????????????reward=[]
????????????????time_=[]
????????????????collision_his=[]
????????????????
????????s?=?s_
????????reward.append(r)??????
????
????end_time=time.time()
????episode_time=end_time-start_time
????time_.append(episode_time)
????????
????is_collision=1?if?info['crashed']==True?else?0
????collision_his.append(is_collision)

我在代碼中添加了一些畫圖的函數(shù)，在運行過程中就可以掌握一些關(guān)鍵的指標(biāo)，每訓(xùn)練40次統(tǒng)計一次平均值。

平均碰撞發(fā)生率：

epoch平均時長(s)：

平均reward：

可以看出平均碰撞發(fā)生率會隨訓(xùn)練次數(shù)增多逐漸降低，每個epoch持續(xù)的時間會逐漸延長（如果發(fā)生碰撞epoch會立刻結(jié)束）

四、總結(jié)

相比于我在之前文章中使用過的模擬器CARLA，highway-env環(huán)境包明顯更加抽象化，用類似游戲的表示方式，使得算法可以在一個理想的虛擬環(huán)境中得到訓(xùn)練，而不用考慮數(shù)據(jù)獲取方式、傳感器精度、運算時長等現(xiàn)實問題。對于端到端的算法設(shè)計和測試非常友好，但從自動控制的角度來看，可以入手的方面較少，研究起來不太靈活。

我們爬蟲第三期來了，加入我們，學(xué)更實用，更值錢的 Python 技術(shù)！

從0到1系統(tǒng)掌握Python 技術(shù)（入門進(jìn)階）

2個企業(yè)實戰(zhàn)項目，4大常用工具

掌握24種反爬策略手段，成為真正爬蟲高手

能抓取市面上90%的網(wǎng)站

掌握主流爬蟲技術(shù)，就業(yè)找工作真正全方位幫助大家從0到1，從 Python 入門到進(jìn)階，轉(zhuǎn)行找爬蟲工作。

天哪!用Python實現(xiàn)自動駕駛!

一、安裝環(huán)境

二、配置環(huán)境

三、訓(xùn)練模型

1、數(shù)據(jù)處理